Job Submission Overview
- In order to run a job with the scheduler, you must submit your job to an appropriate queue.
- This will allow the scheduler to find and allocate the most appropriate nodes in order to run your job as quickly as possible.
- Jobs should be submitted with the qsub command.
Step 1: Prepare a PBS Script
- For interactive submissions, users can pass job requirements to the scheduler directly, without using a script. See the next section for more details on interactive job submission.
- For batch submissions, users must prepare a PBS script, which passes all the required information about a task to the scheduler so that it can allocate suitable nodes.
- The PBS script can be regarded as a list of user 'demands', such as which queue the task will be assigned to, how many nodes/cores are required, and how much memory is needed.
- The file helloexample.pbs (name and extension are arbitrary, although using .pbs or .sh is the convention) is the PBS script, as annotated below:
    # This is an example PBS script
    #PBS -N hello
    #PBS -l nodes=7:ppn=4
    #PBS -l mem=2gb
    #PBS -l walltime=15:00:00
    #PBS -q paceib
    #PBS -k oe
    #PBS -m abe
    #PBS -M firstname.lastname@example.org

    cd $PBS_O_WORKDIR
    echo "Started on `/bin/hostname`"
    echo "Nodes chosen are:"
    cat $PBS_NODEFILE

    module load gcc/4.9.0
    module load mvapich2/2.1

    mpirun -v -np 28 -machinefile $PBS_NODEFILE ~/mpi/hello
- Lines beginning with '#PBS' are instructions to the scheduler.
- Unlike other lines beginning with '#' (such as the first line), they are not comments but commands, so they must NOT be removed.
- All other lines are passed to the user's default shell (usually bash) for execution
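To see which lines the scheduler will act on, one can list the '#PBS' lines of a script separately from its ordinary comments. The following is a minimal sketch, assuming a shell with grep and mktemp available; the temporary file holds a shortened copy of the example script and exists only for illustration.

```shell
#!/bin/sh
# Write a shortened copy of the example script to a temporary file
# (illustration only; a real script would already exist on disk).
script=$(mktemp)
cat > "$script" <<'EOF'
# This is an example PBS script
#PBS -N hello
#PBS -l nodes=7:ppn=4
#PBS -q paceib
cd $PBS_O_WORKDIR
EOF

# Lines starting with '#PBS' are scheduler directives.
directives=$(grep '^#PBS' "$script")
# Lines starting with '#' but not '#PBS' are ordinary comments.
comments=$(grep '^#' "$script" | grep -v '^#PBS')

echo "Directives:"
echo "$directives"
echo "Comments:"
echo "$comments"

rm -f "$script"
```

Running the same two grep commands against your own script is a quick way to confirm that every resource request is spelled as a directive rather than as a plain comment.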
PBS Script Explained
-N hello
- Gives the job the name "hello".
- This name will be used to prefix all job output files (both standard output and errors) in your home directory.
- The name also appears in the queue listing once submitted.
-l nodes=7:ppn=4
- Tells the scheduler that you wish your job to run on 7 nodes, with 4 processors (the same thing as 'cores') per node.
- Note that, at present, this request is only advice to the scheduler.
- The scheduler may instead provide the 28 processors (7 nodes, 4 processors each) using any other combination of available nodes and processors.
-l mem=2gb
- Tells the scheduler that this code may use up to a total of 2GB of memory. To specify memory per core, use '-l pmem' instead.
-l walltime=15:00:00
- Tells the scheduler you expect this job to require no more than 15 hours of wall clock time to run once started.
- This provides a good mechanism for catching jobs that enter an infinite loop or show other unexpected behavior.
- Once this time limit is reached, the scheduler may terminate the job and release its resources.
- The format is HH:MM:SS or, if expressed as a single integer, a number of seconds.
-q paceib
- Tells the scheduler you wish this job to run in the paceib queue, which uses Infiniband-connected hosts.
- You can get a list of the queues that you have access to by running the "pace-whoami" command anywhere on the cluster.
-k oe
- Tells the scheduler you wish to retain both the standard output and the error output of the job, and to place them in your home directory in files named after the job, with the suffix .oJOBID for output and .eJOBID for errors.
- In both cases, JOBID is the job ID assigned to your job when it is submitted to the queue.
-m abe
- Tells the scheduler to send you email based upon:
- a: mail is sent when the job is aborted by the batch system.
- b: mail is sent when the job begins execution.
- e: mail is sent when the job terminates.
-M email@example.com (optional)
- Allows you to specify an alternative email address. The scheduler will use your default email address if -M is not given.
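Because the walltime request described above accepts either HH:MM:SS or a plain number of seconds, it can be handy to convert between the two when sizing a request. The sketch below does the conversion with awk; the function name hms_to_seconds is hypothetical and is not part of PBS.

```shell
#!/bin/sh
# hms_to_seconds is a hypothetical helper, not a PBS command.
# It converts a walltime string in HH:MM:SS form to a number of seconds.
hms_to_seconds() {
    # awk splits the argument on ':' and treats each field as a decimal number,
    # so zero-padded fields such as "08" are handled correctly.
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

hms_to_seconds 15:00:00   # the walltime from the example script; prints 54000
```

Under this sketch, "#PBS -l walltime=15:00:00" and "#PBS -l walltime=54000" would describe the same limit.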
Unless otherwise noted, all queues have the following defaults:
- default memory is 1GB (this is used if you do not provide a "#PBS -l mem=" hint to the scheduler)
- default processors is 1 CPU (this is used if you do not provide a "#PBS -l nodes=" hint to the scheduler)
- default wall clock time is 1 hour (this is used if you do not provide a "#PBS -l walltime=" hint to the scheduler)
- maximum wall clock time is 60 days unless otherwise configured on a per-queue basis
- scheduling priority is configured on a per-queue basis
- a higher value for priority means that jobs will tend to be scheduled sooner than jobs with lower values for priority
- as jobs wait in the queue for available processors, their priority will increase
- large multi-processor jobs may cause "holes" of available processors, which may be filled with smaller jobs. This may cause a lower priority job to be scheduled before a higher priority job simply because it would fit. In general, the scheduler will try to keep CPUs busy rather than preserve a strict "this job runs before that job" ordering.
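Since the defaults listed above apply silently whenever a request is omitted, it may be worth checking a script for missing resource requests before submitting it. The sketch below is one way to do that with grep. The function name check_defaults is hypothetical, the default values are the ones quoted above, and the memory pattern is assumed to accept '-l pmem=' as well as '-l mem='.

```shell
#!/bin/sh
# check_defaults is a hypothetical helper, not part of PBS.
# It reports which of the queue defaults listed above would apply to a script
# because the corresponding '#PBS -l' request is missing.
# Note: the 'mem=' pattern also matches 'pmem=' (an assumption of this sketch).
check_defaults() {
    grep -q '^#PBS -l.*mem='      "$1" || echo "mem not set: default 1GB applies"
    grep -q '^#PBS -l.*nodes='    "$1" || echo "nodes not set: default 1 CPU applies"
    grep -q '^#PBS -l.*walltime=' "$1" || echo "walltime not set: default 1 hour applies"
}

# Demonstration on a small temporary script that only requests nodes.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#PBS -N hello
#PBS -l nodes=7:ppn=4
EOF

report=$(check_defaults "$demo")
echo "$report"
rm -f "$demo"
```

For the demonstration script, the report names memory and walltime, which are the two requests it leaves out.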
The remainder of the file consists of shell commands that run the program. This example will:
    cd $PBS_O_WORKDIR                  # Change to the directory from which the job was submitted.
    echo "Started on `/bin/hostname`"  # Show the host on which the job begins its execution.
    echo "Nodes chosen are:"           # These two lines show the nodes selected by the scheduler.
    cat $PBS_NODEFILE
    mpirun -v -np 28 -machinefile $PBS_NODEFILE ~/mpi/hello  # Begin execution of the MPI-based application.
- The last step, for MPI-based applications, requests that the ~/mpi/hello application be started on 28 processors, using the list of machines chosen by the scheduler.
- Note that the number of processors passed to mpirun (using -np) should be less than or equal to nodes*ppn.
- The job does not need to be MPI. Many other parallel packages/applications (MATLAB, R, COMSOL, etc.), and even sequential jobs, can be submitted using a PBS script.
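The constraint on -np can be checked with a little shell arithmetic before the job is submitted. The following sketch assumes the resource request has the simple form nodes=N:ppn=P used in the example script; the function name max_procs is hypothetical.

```shell
#!/bin/sh
# max_procs is a hypothetical helper, not part of PBS.
# Given a request of the form "nodes=N:ppn=P", it prints N * P.
max_procs() {
    nodes=$(echo "$1" | sed -n 's/.*nodes=\([0-9][0-9]*\).*/\1/p')
    ppn=$(echo "$1" | sed -n 's/.*ppn=\([0-9][0-9]*\).*/\1/p')
    echo $(( nodes * ppn ))
}

np=28                               # value passed to mpirun -np in the example
limit=$(max_procs "nodes=7:ppn=4")  # 7 * 4 = 28

if [ "$np" -le "$limit" ]; then
    echo "OK: -np $np fits within $limit processors"
else
    echo "Too many: -np $np exceeds $limit processors"
fi
```

With the values from the example script, the request and the mpirun call agree exactly, so the first branch is taken.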
Step 2: Submit PBS Script
- The second and final step is to submit the script to the scheduler using the qsub command.
- Once this step is complete, the job has been successfully submitted to the scheduler.
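For concreteness, a submission from the command line might look like the sketch below. It assumes the standard PBS qsub command and a script named helloexample.pbs in the current directory, as in Step 1. The checks around qsub are additions for illustration, so that the snippet reports a clear message on a machine without the scheduler or without the script.

```shell
#!/bin/sh
script=helloexample.pbs   # script name from Step 1

if ! command -v qsub >/dev/null 2>&1; then
    status="qsub not found: run this on a cluster login node"
elif [ ! -f "$script" ]; then
    status="$script not found in $(pwd)"
elif jobid=$(qsub "$script"); then
    # On success, qsub prints the ID assigned to the new job.
    status="submitted as job $jobid"
else
    status="qsub reported an error for $script"
fi

echo "$status"
```

The job ID printed on success is the same ID that appears in the names of the job's output and error files.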