Work Storage Migration and New Scratch Workflow

We have now enabled general access to a new storage location, $WORK, located at /work/users/$USER, where $USER is your ManeFrame II (M2) username. This space is designed for long-term data storage on M2 in lieu of $SCRATCH (/scratch/users/$USER), which is now commonly being used for long-term storage. Your $WORK space has an 8 TB quota with no time limits.

In the near term we encourage you to move general data storage from $SCRATCH to $WORK and then use $SCRATCH just for scratch space during calculations. Below are some helpful commands for determining what to move to $WORK.

Note

  • $SCRATCH is a temporary storage space for the duration of a job or a set of jobs.

  • $WORK is storage for the duration of a project or critical research output that would be difficult to reproduce. Files needed after a job completes should be moved from $SCRATCH to $WORK.

  • Neither $SCRATCH nor $WORK is backed up. Both have built-in redundancies, but are otherwise not protected.

As noted above, your $SCRATCH space should be used as scratch space for running jobs. Files that are not needed after the job has completed should be removed and those files that are needed should be moved to your $HOME or $WORK spaces. Examples of this workflow are given below.

Migrating Data from $SCRATCH to $WORK

The commands given must be run through shell access to M2, which can be the HPC OnDemand Web Portal or any SSH-capable client.

Each command that submits a job to the queue may take a while to complete. You can view the status of the job via squeue -u $USER | grep wrap.
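If you would rather track a job by its ID than filter by name, sbatch's --parsable option prints only the job ID (followed by ;cluster on multi-cluster setups), and that ID can be passed to squeue -j. The following is a minimal sketch: parse_job_id is a hypothetical helper name, and the commented usage lines assume the migration module is already loaded.

```shell
# parse_job_id is a hypothetical helper, not part of Slurm or the migration assistant.
# `sbatch --parsable` prints "<jobid>" or "<jobid>;<cluster>"; keep only the ID.
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

# Usage on M2 (left commented out so that reading the sketch submits nothing):
#   raw=$(sbatch --parsable -p htc -c 1 --mem=6G -o "$HOME/scratch_usage_%j.out" \
#         --wrap 'work_migration_assistant report_scratch_size')
#   job_id=$(parse_job_id "$raw")
#   squeue -j "$job_id"                        # status of just this job
#   cat "$HOME/scratch_usage_${job_id}.out"    # output once the job has finished
```

Because %j in the -o path expands to the same job ID, the ID also identifies which output file belongs to which submission.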

Before the following commands can be run, the migration assistant module must be loaded into your environment.

module load migration

The migration assistant’s help can be found by running work_migration_assistant.

Determine the Size of Files in Your $SCRATCH Space

sbatch -p htc -c 1 --mem=6G -o "$HOME/scratch_usage_%j.out" --wrap 'work_migration_assistant report_scratch_size'

This will submit a job that will show the size of each directory in $SCRATCH and the total. The output for the job will be in $HOME/scratch_usage_%j.out where %j is a job ID number.

Determine the Number of Files in Your $SCRATCH Space

sbatch -p htc -c 1 --mem=6G -o "$HOME/scratch_file_count_%j.out" --wrap 'work_migration_assistant report_scratch_number_files'

This will submit a job that will show the number of files for each directory in $SCRATCH. The output for the job will be in $HOME/scratch_file_count_%j.out where %j is a job ID number.
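For a quick spot check of one small directory, standard tools give similar numbers without the migration assistant. This is a sketch only: count_files is a hypothetical helper, and scanning large directory trees this way is still best done inside a submitted job rather than on a login node.

```shell
# count_files is a hypothetical helper, not part of the migration assistant.
# Counts regular files under a directory (file names containing newlines
# would be over-counted, since the count is taken per output line).
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Usage (commented out; <directory> is a placeholder for a directory in $SCRATCH):
#   du -sh "$SCRATCH/<directory>"        # total size, human readable
#   count_files "$SCRATCH/<directory>"   # number of regular files
```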

Move All Data from $SCRATCH to $WORK if the Total Size is Less than 8 TB

Warning

These commands will copy all data from $SCRATCH and then delete the data in $SCRATCH. Also, this is best done while you have no other jobs running on M2.

sbatch -p htc -c 1 --mem=6G -o "$HOME/scratch_work_migration_%j.out" --wrap 'work_migration_assistant migrate'

This will submit a job that will copy data from $SCRATCH to $WORK. The output of the transfer will be in file $HOME/scratch_work_migration_%j.out, where %j is a job ID number.

Verify that all the data has been transferred from $SCRATCH to $WORK.

sbatch -p htc -c 1 --mem=6G -o "$HOME/scratch_work_migration_%j.out" --wrap 'work_migration_assistant verify'

After you have verified that the transfer has been completed you may delete the data from $SCRATCH. In order to use the migration assistant to remove the data, the following conditions must be met:

  1. The path must exist.

  2. An absolute (full, not relative) path must be given.

  3. The path must be the real path, e.g. /scratch/users/$USER and not ~/scratch.

  4. The path must be /scratch/users/$USER or a directory within /scratch/users/$USER.

The migration assistant will verify with you if you really want to remove the specified directory.

work_migration_assistant remove $SCRATCH
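To see why a path might be rejected, the four conditions above can be approximated with standard tools. This is an illustrative sketch and not the migration assistant's actual implementation: check_removal_path is a hypothetical name, the optional second argument (the allowed root) exists only to make the sketch easy to try elsewhere, and realpath from GNU coreutils is assumed to be available.

```shell
# check_removal_path PATH [ROOT]
# Hypothetical helper that mirrors the four documented conditions; it removes nothing.
check_removal_path() {
    candidate="$1"
    root="${2:-/scratch/users/${USER:-}}"

    # 1. The path must exist.
    [ -e "$candidate" ] || { echo "does not exist: $candidate" >&2; return 1; }

    # 2. The path must be absolute.
    case "$candidate" in
        /*) ;;
        *) echo "not an absolute path: $candidate" >&2; return 1 ;;
    esac

    # 3. The path must be the real path (no symlinks, "..", or trailing slash).
    resolved=$(realpath "$candidate") || return 1
    [ "$resolved" = "$candidate" ] || {
        echo "not the real path: $candidate resolves to $resolved" >&2
        return 1
    }

    # 4. The path must be the root itself or lie inside it.
    case "$resolved" in
        "$root"|"$root"/*) ;;
        *) echo "outside $root: $candidate" >&2; return 1 ;;
    esac
}
```

Running check_removal_path on a path before passing it to the remove command shows which condition, if any, it fails; the function prints the reason to standard error and returns a nonzero status.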

Move Select Directory from $SCRATCH to $WORK if the Total Size is Less than 8 TB

Warning

These commands will copy a specific directory from $SCRATCH and then delete that directory from $SCRATCH. Also, this is best done while you have no other jobs reading or writing data to this directory.

sbatch -p htc -c 1 --mem=6G -o "$HOME/scratch_work_migration_%j.out" --wrap 'work_migration_assistant migrate <directory_to_move>'

Here, <directory_to_move> is the directory that you would like to move. The command will submit a job that will copy data from $SCRATCH to $WORK. The output of the transfer will be in file $HOME/scratch_work_migration_%j.out, where %j is a job ID number.

Verify that all the data has been transferred from $SCRATCH to $WORK.

sbatch -p htc -c 1 --mem=6G -o "$HOME/scratch_work_migration_%j.out" --wrap 'work_migration_assistant verify <directory_to_move>'

If verification reports that the transfer has not completed, run the migration command above again and then re-verify.
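As an independent spot check before deleting anything, the source and destination trees can also be compared directly with diff. This is a sketch under assumptions: trees_match is a hypothetical helper, it is not what work_migration_assistant verify does internally, and the destination path is a placeholder because the location of the migrated copy under $WORK is not stated here. It reads every file, so for large directories it belongs in a submitted job.

```shell
# trees_match SRC DST
# Hypothetical helper; succeeds only if both trees hold the same files with
# identical contents. Permissions and timestamps are not compared.
trees_match() {
    diff -rq "$1" "$2" > /dev/null
}

# Usage (commented out; both paths are placeholders):
#   if trees_match "$SCRATCH/<directory_to_move>" "$WORK/<migrated_directory>"; then
#       echo "contents match"
#   else
#       echo "differences found"
#   fi
```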

After you have verified that the transfer has been completed you may delete the data from $SCRATCH. In order to use the migration assistant to remove the data, the following conditions must be met:

  1. The path must exist.

  2. An absolute (full, not relative) path must be given.

  3. The path must be the real path, e.g. /scratch/users/$USER and not ~/scratch.

  4. The path must be /scratch/users/$USER or a directory within /scratch/users/$USER.

The migration assistant will verify with you if you really want to remove the specified directory.

work_migration_assistant remove $SCRATCH/<directory_to_move>

Using $SCRATCH as Scratch, a Workflow for $WORK

The general workflow should be:

  1. Prepare job files and associated scripts in $HOME or $WORK.

  2. During the job, i.e. in the job script itself, copy needed files from $HOME or $WORK to a temporary directory in $SCRATCH.

  3. Once the calculations in the job have finished, copy the needed files back from $SCRATCH.

  4. Lastly, clean up your temporary directory in $SCRATCH by deleting it.

Example Workflow When Using sbatch

The following example job script demonstrates the workflow of copying data to $SCRATCH, running the job, and then copying the needed data back from $SCRATCH. The files for this example can be found on M2 in /hpc/examples/workflow.

In this example:

  1. One file, water.dat, is defined to be copied to the temporary scratch directory.

  2. Two files, water.out and water.fchk, are defined to be copied back from the temporary scratch directory after the calculation has completed.

  3. A temporary scratch directory is made.

  4. This example uses Psi4, which has a special variable that points the calculation to a scratch space. Thus, we set that variable to point to the temporary scratch directory as well.

  5. All of the files initially defined to be copied to the temporary scratch directory are then copied there.

  6. The script then goes to the temporary scratch directory.

  7. The calculation is run.

  8. Once the calculation has finished, all the files defined to be copied back to the directory from which the job was submitted are then copied to a job specific directory there.

  9. The temporary scratch directory is then deleted, thereby cleaning up all other files produced by the calculation that are no longer needed.

#!/bin/bash
#SBATCH -J water
#SBATCH -o water_%j.out
#SBATCH -p htc
#SBATCH -c 1
#SBATCH --mem=6G

module purge
module load psi4

#Files to copy to $SCRATCH for the job
start_files=("water.dat") 

#Files to copy from $SCRATCH after the job
end_files=("water.out" "water.fchk")

#Setup temporary job directory in $SCRATCH
job_directory="${SCRATCH}/${SLURM_JOB_NAME}_${SLURM_JOB_ID}"
mkdir "${job_directory}"
lfs setstripe -c 1 "${job_directory}"

#Set application-specific scratch location
export PSI_SCRATCH=${job_directory}

#Copy files to the temporary job directory
cp -a "${start_files[@]}" "${job_directory}/"

#Change to the temporary job directory; stop if it is unavailable
cd "${job_directory}" || exit 1

#Run the calculation
psi4 water.dat

#Copy needed files back to directory from which the job was submitted
job_files_directory="${SLURM_SUBMIT_DIR}/${SLURM_JOB_NAME}_${SLURM_JOB_ID}"
mkdir "${job_files_directory}"
cp -a "${end_files[@]}" "${job_files_directory}/"

#Delete the temporary job directory
rm -rf "${job_directory}"
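If a job script exits early, for example because it uses set -e or an explicit exit after a failed step, a cleanup command on the last line is never reached. One possible hardening, shown here as a sketch rather than as part of the original example, is to register the cleanup with trap so that it runs whenever the working shell exits. run_in_scratch is a hypothetical helper, and it names the directory with mktemp instead of the job name and ID.

```shell
# run_in_scratch ROOT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND inside a fresh directory under ROOT and
# always removes that directory afterwards. The ( ) body runs in a subshell,
# so the EXIT trap fires when the function returns, not when the job ends.
run_in_scratch() (
    root="$1"
    shift
    job_directory=$(mktemp -d "${root}/job_XXXXXX") || exit 1
    trap 'rm -rf "${job_directory}"' EXIT
    cd "${job_directory}" || exit 1
    "$@"
    status=$?
    exit "$status"   # pass the command's exit status back to the caller
)

# Usage inside a job script (commented out; my_calculation.sh is a placeholder
# for a script that copies inputs in, runs the calculation, and copies the
# needed outputs back before returning):
#   run_in_scratch "$SCRATCH" bash "$SLURM_SUBMIT_DIR/my_calculation.sh"
```

Any files that must survive still have to be copied out by the wrapped command itself, since everything left in the temporary directory is removed as soon as that command returns.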