Running STATA on ManeFrame II

Types of Nodes on Which STATA is to Run

First, identify the type of compute resource your calculation needs. In the following table, the compute resources are listed by resource type and maximum job duration. The duration and memory allocations are hard limits; jobs that exceed them will fail. Once you have identified an appropriate resource, use its partition and Slurm flags from the table in the examples that follow.

Partition                                    Duration  Cores    Memory [GB]  Example Slurm Flags
-------------------------------------------  --------  -------  -----------  -------------------------------------
development (1 standard-mem, 1 MIC, 1 P100)  2 hours   various  various      -p development
htc                                          1 day     1        6            -p htc --mem=6G
standard-mem-s                               1 day     36       256          -p standard-mem-s --mem=250G
standard-mem-m                               1 week    36       256          -p standard-mem-m --mem=250G
standard-mem-l                               1 month   36       256          -p standard-mem-l --mem=250G
medium-mem-1-s                               1 day     36       768          -p medium-mem-1-s --mem=750G
medium-mem-1-m                               1 week    36       768          -p medium-mem-1-m --mem=750G
medium-mem-1-l                               1 month   36       768          -p medium-mem-1-l --mem=750G
medium-mem-2                                 2 weeks   24       768          -p medium-mem-2 --mem=750G
high-mem-1                                   2 weeks   36       1538         -p high-mem-1 --mem=1500G
high-mem-2                                   2 weeks   40       1538         -p high-mem-2 --mem=1500G
mic                                          1 week    64       384          -p mic --mem=374G
gpgpu-1                                      1 week    36       256          -p gpgpu-1 --gres=gpu:1 --mem=250G
v100x8                                       1 week    1        20           -p v100x8 --gres=gpu:1 --mem=20G
fp-gpgpu-2                                   various   24       128          -p fp-gpgpu-2 --gres=gpu:8 --mem=120G
fp-gpgpu-3                                   various   40       384          -p fp-gpgpu-3 --gres=gpu:2 --mem=370G
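
Choosing among the standard-memory partitions in the table can be scripted. The following is a minimal sketch, not part of the M2 examples: the function name pick_standard_mem is hypothetical, and it assumes that "1 day", "1 week", and "1 month" in the table mean 24, 168, and 720 hours respectively.

```shell
#!/bin/bash
# Hypothetical helper: map an expected run time in hours to the
# standard-mem partition flags listed in the table.
# Assumption: 1 day = 24 h, 1 week = 168 h, 1 month = 720 h (30 days).
pick_standard_mem() {
    local hours=$1
    if [ "$hours" -le 24 ]; then
        echo "-p standard-mem-s --mem=250G"
    elif [ "$hours" -le 168 ]; then
        echo "-p standard-mem-m --mem=250G"
    elif [ "$hours" -le 720 ]; then
        echo "-p standard-mem-l --mem=250G"
    else
        echo "no standard-mem partition allows ${hours} hours" >&2
        return 1
    fi
}

pick_standard_mem 36   # prints: -p standard-mem-m --mem=250G
```

Because the limits are hard, it is safer to round the estimate up before choosing a partition.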

Note

The STATA installation on ManeFrame II provides serial and parallel versions. The commands that run the parallel versions are the same as the serial ones with “-mp” appended, e.g. xstata-mp instead of xstata. Please do not run the parallel version via the “htc” queue (see the table above). In the examples below, any serial STATA command can be replaced with its parallel counterpart, provided an appropriate queue is used.
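
The rule in this note can be expressed as a small guard. This is a sketch only: stata_binary is a hypothetical function name, and the name stata-mp is derived from the "-mp" naming rule stated in the note rather than confirmed against the M2 installation.

```shell
#!/bin/bash
# Hypothetical helper: print the STATA executable to use on a partition.
# Per the note, the parallel build is the serial name plus "-mp"
# and must not be used on the htc partition.
stata_binary() {
    local partition=$1 parallel=${2:-no}
    if [ "$parallel" = "yes" ]; then
        if [ "$partition" = "htc" ]; then
            echo "parallel STATA must not run on htc" >&2
            return 1
        fi
        echo "stata-mp"
    else
        echo "stata"
    fi
}

stata_binary standard-mem-s yes   # prints: stata-mp
```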

Running STATA Interactively with the Graphical User Interface

The STATA graphical user interface can be run directly on ManeFrame II (M2) compute nodes using the HPC OnDemand Web Portal.

Running STATA Non-Interactively in Batch Mode

STATA scripts can be executed non-interactively in batch mode in several ways, depending on the type of compute resource needed for the calculation, the number of calculations to be submitted, and user preference. The types of compute resources are outlined in the table above, where each partition denotes a specific type of compute resource and the maximum duration of the calculation. Each of the following methods requires SSH access. Examples can be found at /hpc/examples/stata on M2.

Submitting a STATA Job to a Queue Using Wrap

A STATA script can be executed non-interactively in batch mode directly with sbatch’s --wrap option.

  1. Log into the cluster using SSH and run the following commands at the command prompt.

  2. module load stata to enable access to STATA.

  3. cd to the directory with the STATA script.

  4. sbatch -p <partition and options> --wrap "stata <stata script file name>" where <partition and options> is the partition and its associated Slurm flags from the table above, and <stata script file name> is the STATA script to be run.

  5. squeue -u $USER to verify that the job has been submitted to the queue.

Example:

module load stata
sbatch -p standard-mem-s --exclusive --mem=250G --wrap "stata example.do"
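
The submit-and-verify steps above can be combined so that the job ID is captured for later use. The sketch below is illustrative: submit_wrap is a hypothetical function name, and it relies on sbatch's --parsable option, which prints only the job ID (followed by ";" and the cluster name on multi-cluster setups).

```shell
#!/bin/bash
# Hypothetical helper: submit one do-file with --wrap and print the job ID.
# Usage: submit_wrap "<partition and options>" <stata script file name>
submit_wrap() {
    local flags=$1 dofile=$2
    if [ ! -f "$dofile" ]; then
        echo "do-file not found: $dofile" >&2
        return 1
    fi
    local jobid
    # $flags is intentionally unquoted so it splits into separate options.
    jobid=$(sbatch --parsable $flags --wrap "stata $dofile") || return 1
    # Strip the optional ";cluster" suffix that --parsable may append.
    echo "${jobid%%;*}"
}
```

After module load stata, calling submit_wrap "-p standard-mem-s --exclusive --mem=250G" example.do prints the job ID, which can then be passed to squeue -j to check on that job alone.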

Submitting a STATA Job to a Queue Using an sbatch Script

A STATA script can be executed non-interactively in batch mode by creating an sbatch script. The sbatch script gives the Slurm resource scheduler information about what compute resources your calculation requires to run and also how to run the STATA script when the job is executed by Slurm.

  1. Log into the cluster using SSH and run the following commands at the command prompt.

  2. cd to the directory with the STATA script.

  3. cp /hpc/examples/stata/stata_example_htc.sbatch <descriptive file name> where <descriptive file name> is meaningful for the calculation being done. Avoid spaces in the file name and end it with .sbatch for clarity.

  4. Edit the sbatch file using your preferred text editor. Change the partition, flags, and STATA script file name as required for your specific calculation.

#!/bin/bash
#SBATCH --job-name=stata_example
#SBATCH --output=stata_example_%j.out
#SBATCH --error=stata_example_%j.err
#SBATCH -p htc

module purge
module load stata

stata -b example.do

  5. sbatch <descriptive file name> where <descriptive file name> is the sbatch script name chosen previously.

  6. squeue -u $USER to verify that the job has been submitted to the queue.
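
Once the job has left the queue, it is worth confirming that the STATA script itself succeeded, since a Slurm job can finish even when the do-file stopped on an error. The following is a sketch under stated assumptions: check_stata_log is a hypothetical function name, the log is assumed to be the <do-file name>.log file that STATA's batch mode writes to the working directory (example.log for example.do), and errors are assumed to appear as a return-code line of the form r(NNN);.

```shell
#!/bin/bash
# Hypothetical helper: report whether a STATA batch log contains an error.
# Assumption: STATA errors show up in the log as a line like "r(601);".
check_stata_log() {
    local log=$1
    if [ ! -f "$log" ]; then
        echo "log not found: $log" >&2
        return 2
    fi
    if grep -Eq '^r\([0-9]+\);' "$log"; then
        echo "STATA error in $log:" >&2
        # Show the return code with two lines of context before it.
        grep -E -B 2 '^r\([0-9]+\);' "$log" >&2
        return 1
    fi
    echo "no STATA error codes in $log"
}
```

Running check_stata_log example.log from the submission directory returns 0 when no error code is found, 1 when one is, and 2 when the log is missing.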

Submitting Multiple STATA Jobs to a Queue Using a Single sbatch Script

Multiple STATA scripts can be executed non-interactively in batch mode by creating a single sbatch script. The sbatch script tells the Slurm resource scheduler what compute resources your calculations require and how to run the STATA script for each job when Slurm executes it.

  1. Log into the cluster using SSH and run the following commands at the command prompt.

  2. cd to the directory with the STATA script or scripts.

  3. cp /hpc/examples/stata/stata_array_example.sbatch <descriptive file name> where <descriptive file name> is meaningful for the calculations being done. Avoid spaces in the file name and end it with .sbatch for clarity. To run this specific example you will also need additional files, which can be copied into the current directory with the command cp /hpc/examples/stata/{array_example_1.do,array_example_2.do,example.dta} . (note the trailing dot).

  4. Edit the sbatch file using your preferred text editor. Change the partition and flags, STATA script file name, and number of jobs that will be executed as required for your specific calculation.

#!/bin/bash
#SBATCH -J stata_example                 # Job name
#SBATCH -p standard-mem-s                # Partition (queue)
#SBATCH --exclusive                      # Exclusivity 
#SBATCH --mem=250G                       # Total memory required per node
#SBATCH -o stata_array_example_%A-%a.out # Job output; %A is job ID and
                                         # %a is array index
#SBATCH --array=1-2                      # Range of indices to be executed

module purge                             # Unload all modules
module load stata                        # Load STATA, change version as needed

stata -b array_example_${SLURM_ARRAY_TASK_ID}.do
# Edit STATA script name as needed; ${SLURM_ARRAY_TASK_ID} is array index

  5. sbatch <descriptive file name> where <descriptive file name> is the sbatch script name chosen previously.

  6. squeue -u $USER to verify that the job has been submitted to the queue.
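
The array example above works because the do-files are numbered. When the scripts have unrelated names, one option is to look the name up by array index. This is a sketch, not part of the M2 examples: dofile_for_task is a hypothetical function, and dofiles.txt is a made-up file listing one do-file per line, with the job submitted using --array=1-N where N is the number of lines.

```shell
#!/bin/bash
# Hypothetical helper: print the do-file for the current array task.
# Usage: dofile_for_task <list file> [index]
# The index defaults to SLURM_ARRAY_TASK_ID, which Slurm sets per task.
dofile_for_task() {
    local list=$1 index=${2:-${SLURM_ARRAY_TASK_ID:-}}
    case $index in
        ''|*[!0-9]*)
            echo "array index is missing or not a number" >&2
            return 1
            ;;
    esac
    local dofile
    # Print only line number $index of the list file.
    dofile=$(sed -n "${index}p" "$list")
    if [ -z "$dofile" ]; then
        echo "no entry $index in $list" >&2
        return 1
    fi
    echo "$dofile"
}
```

Inside the sbatch script, the last line would then become stata -b "$(dofile_for_task dofiles.txt)" with the function defined earlier in the same file.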