Updated 2019-06-02

Run Comsol on the Cluster - Batch Mode

Overview

  • The Comsol workflow has three general parts:
    • Build the model in the GUI, or import an existing model
    • Solve it interactively (GUI) or in batch mode
    • Analyze the results
  • This guide focuses on solving models in batch mode, which is especially helpful when you want to solve several models at once.
  • Important: models imported from Windows may contain binaries that don't work on the cluster. Make sure models are saved as .mph, not .mphbin, or build them on the cluster using Comsol interactively.

Set up Storage for Comsol

Danger

IMPORTANT: Comsol uses a hidden directory ~/.comsol that resides in your home directory to store configuration and temporary files.

  • Since the home directory quota is only 5 GB, this storage can cause a "quota exceeded" error. Because .comsol is hidden (its name starts with a dot), your home directory may appear to contain nothing.
  • Solution: move the ~/.comsol directory to your ~/data directory and create a symbolic link to it at the old location.
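cd ~
# Move Comsol's hidden directory to the larger data space
mv .comsol ~/data
# Create a symbolic link named .comsol in ~ that points to the moved directory
ln -s ~/data/.comsol

# Check to make sure everything worked
ls -ld ~/.comsol

# Should display something like:
# lrwxrwxrwx 1 <username> <group> <size> <date> /nv/hp16/<username>/.comsol -> /nv/hp16/<username>/data/.comsol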
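The ls check above only shows the link itself. A slightly stricter check, sketched below as a bash function, also confirms that the target directory exists and that no recursive .comsol link sits inside it (the problem described at the end of this guide). The function name check_comsol_link is hypothetical, and it assumes readlink -f from GNU coreutils; treat it as a sketch, not an official PACE tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a PACE tool): verify that ~/.comsol is a symlink
# into ~/data and that no recursive link was created inside the target.
# Assumes GNU coreutils readlink -f. Optional argument: home dir to check.
check_comsol_link() {
    local home="${1:-$HOME}"
    local link="$home/.comsol"
    local target="$home/data/.comsol"

    # ~/.comsol must be a symbolic link, not a real directory
    if [ ! -L "$link" ]; then
        echo "ERROR: $link is not a symbolic link" >&2
        return 1
    fi
    # The moved directory must actually exist
    if [ ! -d "$target" ]; then
        echo "ERROR: $target does not exist" >&2
        return 1
    fi
    # The link must resolve to the same place as ~/data/.comsol
    if [ "$(readlink -f "$link")" != "$(readlink -f "$target")" ]; then
        echo "ERROR: $link does not point to $target" >&2
        return 1
    fi
    # A stray .comsol symlink inside the target is a recursive link
    if [ -L "$target/.comsol" ]; then
        echo "ERROR: recursive link $target/.comsol found" >&2
        return 1
    fi
    echo "OK: $link -> $(readlink -f "$link")"
}
```

Run it as check_comsol_link after the ln -s step; it prints OK and returns 0 when the setup looks right.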

Warning

We strongly recommend that you create models on the cluster. If a model is created on Windows, its binary may not work on the cluster, which runs Linux.

  • If you are a researcher, use the -research versions of Comsol; for coursework and similar uses, use the non-research versions.
  • Make sure your PBS script loads MATLAB and then Comsol, using
    module load <matlab version> <comsol version>. Find the available versions with module avail comsol.

Run Multithreaded Batch Job

  • Add the following line to your PBS script (after you have loaded the comsol module) to run Comsol on multiple cores:
comsol batch -np 8 -inputfile <input.mph> -outputfile <output_name.mph>

Important

The number after the -np flag (number of processors) must equal the number of processors per node you requested in the PBS script, i.e. the ppn value in a directive such as #PBS -l nodes=1:ppn=8.
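Rather than typing the core count in two places, one option is to read it from the node file the scheduler creates for each job. This sketch assumes a Torque/PBS scheduler, where $PBS_NODEFILE lists one line per allocated core, and a single-node job as in this guide (the count covers all nodes). The helper name comsol_np is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of cores allocated to this job.
# Assumes Torque/PBS, where $PBS_NODEFILE holds one line per allocated core
# (e.g. 8 lines for "#PBS -l nodes=1:ppn=8"). A file can be passed as the
# first argument to test it outside a job.
comsol_np() {
    local nodefile="${1:-${PBS_NODEFILE:-}}"
    if [ -z "$nodefile" ] || [ ! -r "$nodefile" ]; then
        echo "comsol_np: no readable node file (not inside a PBS job?)" >&2
        return 1
    fi
    # Count lines; awk prints a bare number on both GNU and BSD systems
    awk 'END { print NR }' "$nodefile"
}

# Inside the PBS script, after "module load ...":
#   comsol batch -np "$(comsol_np)" -inputfile input.mph -outputfile output.mph
```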

Run Multiple Models

  • One option is to use a job array. Please see the array guide for more information.
  • Another option is to supply a file that lists several comsol batch commands to run, as explained below.
  • When logged into the cluster, create a plain text file called COMSOL_BATCH_COMMANDS.bat (you can name it whatever you want; just make sure it ends in .bat). Open the file in a text editor such as vim (vim COMSOL_BATCH_COMMANDS.bat).
  • With the file open, list the run command from above once for every model:
# Contents of the .bat file (give each model its own output name so results are not overwritten)
comsol batch -np 8 -inputfile <model1.mph> -outputfile <output1.mph>
comsol batch -np 8 -inputfile <model2.mph> -outputfile <output2.mph>
comsol batch -np 8 -inputfile <model3.mph> -outputfile <output3.mph>
  • Then, in the PBS script, replace the single comsol batch line with a command that runs the .bat file, for example bash COMSOL_BATCH_COMMANDS.bat
  • If your script uses cd $PBS_O_WORKDIR, make sure the .bat file and the model files are in the same directory as your PBS script.
  • Because the models run one after another within a single job, increase the walltime so it covers all of them.
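With many models, the .bat file can be generated instead of typed by hand. A minimal sketch, assuming all models sit in one directory, have no spaces in their names, and should each produce <name>_solved.mph (the naming used in the walkthrough below); make_batch_file is a hypothetical name and 8 cores is only a default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one "comsol batch" line per .mph model in a
# directory, giving every model its own output file (<name>_solved.mph)
# so later runs do not overwrite earlier results.
# Assumptions: no spaces in file names; files already ending in
# _solved.mph are outputs, not inputs.
make_batch_file() {
    local dir="${1:-.}"
    local np="${2:-8}"
    local out="${3:-COMSOL_BATCH_COMMANDS.bat}"
    local model name

    : > "$dir/$out"                      # start with an empty command file
    for model in "$dir"/*.mph; do
        [ -e "$model" ] || continue      # no .mph files: glob stays literal
        name=$(basename "$model" .mph)
        case "$name" in *_solved) continue ;; esac   # skip earlier outputs
        printf 'comsol batch -np %s -inputfile %s.mph -outputfile %s_solved.mph\n' \
            "$np" "$name" "$name" >> "$dir/$out"
    done
}
```

For example, make_batch_file . 8 writes COMSOL_BATCH_COMMANDS.bat in the current directory. The file names it writes are relative, so the file should be run from that same directory.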

Walkthrough: Run Comsol in Batch Mode

  • This walkthrough will use an example .mph model, cold_water_glass.mph. The model is an example provided by Comsol, and more detail on the model can be found on their website here
  • Model file can be found here
  • PBS Script can be found here
  • After logging in, you can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.

Part 1: The PBS Script

#PBS -N comsolTest
#PBS -l nodes=1:ppn=8
#PBS -l pmem=8gb
#PBS -l walltime=10:00
#PBS -q force-6
#PBS -j oe
#PBS -o comsolTest.out

cd $PBS_O_WORKDIR
module load comsol/5.3a-research
comsol batch -np 8 -inputfile cold_water_glass.mph -outputfile cold_water_glass_solved.mph
  • The #PBS directives are standard: they request 10 minutes of walltime (walltime=10:00) and 1 node with 8 cores (nodes=1:ppn=8). More on #PBS directives can be found in the PBS guide.
  • $PBS_O_WORKDIR is simply a variable that holds the directory you submitted the PBS script from. Make sure the Comsol model you want to run (in this case, cold_water_glass.mph) is in the same directory as the PBS script.
  • module load comsol/5.3a-research loads version 5.3a of Comsol. To see which Comsol versions are available, run module avail comsol, and load the one you want.
  • comsol batch runs Comsol without the GUI: it solves the model given by -inputfile and saves the solved model to the file given by -outputfile.
  • For multiple CPUs (parallel runs), make sure the number of processors you request in the directives at the top of the script (ppn) equals the number you pass to -np on the comsol batch line.
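Because a mismatch between ppn and -np is easy to miss, a quick pre-submit check can compare the two. This is a sketch that assumes the script has a single nodes=N:ppn=M directive and literal numbers after -np, as in the example above; check_np_matches_ppn is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-submit check: compare the ppn value in the
# "#PBS -l nodes=N:ppn=M" directive with every literal -np value on
# "comsol batch" lines. Non-numeric -np values (e.g. "$NP") are skipped.
check_np_matches_ppn() {
    local script="$1" ppn np status=0
    ppn=$(sed -n 's/^#PBS -l nodes=[0-9]*:ppn=\([0-9]*\).*/\1/p' "$script" | head -n 1)
    if [ -z "$ppn" ]; then
        echo "No nodes=...:ppn=... directive found in $script" >&2
        return 1
    fi
    for np in $(sed -n 's/.*comsol batch.*-np \([0-9]*\).*/\1/p' "$script"); do
        if [ "$np" != "$ppn" ]; then
            echo "Mismatch: ppn=$ppn but comsol batch -np $np" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, check_np_matches_ppn comsol.pbs && qsub comsol.pbs submits only when the values agree.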

Part 2: Submit Job and Check Status

  • Make sure you're in the directory that contains the PBS script.
  • Submit as normal with qsub <pbs script name>; in this case, qsub comsol.pbs.
  • Check the job status with qstat -t 22182721, replacing the number with the job ID returned by qsub.
  • You can delete the job with qdel 22182721, again replacing the number with the job ID returned by qsub.
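qsub prints the full job ID (for example 22409572.shared-sched.pace.gatech.edu, the form that appears in the job output in Part 3). A small sketch for keeping that ID in a shell variable and reducing it to the numeric part used in the commands above; short_jobid is a hypothetical helper, and the qsub/qstat/qdel lines are left as comments because they only work on the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the server suffix from a Torque job ID,
# e.g. "22409572.shared-sched.pace.gatech.edu" -> "22409572".
short_jobid() {
    local full="$1"
    echo "${full%%.*}"     # drop everything from the first dot onward
}

# Typical use on the login node:
#   JOBID=$(qsub comsol.pbs)
#   qstat -t "$(short_jobid "$JOBID")"
#   qdel "$(short_jobid "$JOBID")"    # only if you need to cancel the job
```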

Part 3: Collecting Results

  • In the directory where you submitted the PBS script, you should see all the generated output files, including the solved model .mph file. Use cat comsolTest.out to view information on the completed job; the end of the output should look like:
---------- Current Progress: 100 % - Assembling matrices
Memory: 1046/1118 7077/7081
1726      119.82    0.087925     3567 1732 3567     1     1      2  3.4e-11  5.4e-16
1727       119.9    0.087925     3569 1733 3569     1     1      2  3.3e-11  6.8e-16
---------- Current Progress: 100 % - Assembling sparsity pattern
Memory: 957/1118 7046/7081
1728      119.99    0.087925     3571 1734 3571     1     1      2  2.8e-11  6.3e-16
---------- Current Progress: 100 % - Assembling matrices
Memory: 989/1118 7081/7081
   -         120           - out
1729      120.08    0.087925     3573 1735 3573     1     1      2    6e-11  5.3e-16
Time-stepping completed.
---------- Current Progress: 100 % -
Memory: 957/1118 7047/7081
Solution time: 559 s. (9 minutes, 19 seconds)
Physical memory: 1.12 GB
Virtual memory: 7.08 GB
Ended at 14-Sep-2018 10:42:08.
----- Time-Dependent Solver 1 in Study 1/Solution 1 (sol1) -------------------->
Run time: 563 s.
Saving model: /gpfs/pace2/project/pf1/shollister7/comsol/cold_water_glass_solved.mph
Save time: 0 s.
Total time: 572 s.
---------- Current Progress: 100 % - Done
---------------------------------------
Begin PBS Epilogue Fri Sep 14 10:42:09 EDT 2018
Job ID:     22409572.shared-sched.pace.gatech.edu
User ID:    shollister7
Job name:   comsolTest
Resources:  neednodes=1:ppn=8,nodes=1:ppn=8,pmem=8gb,walltime=00:16:00
Rsrc Used:  cput=01:04:15,energy_used=0,mem=1080752kb,vmem=7572812kb,walltime=00:09:41
Queue:      force-6

  • After the result files are produced, you can move the files off the cluster. Refer to the file transfer guide for help.
  • To open the solved model in the Comsol postprocessor, see the Comsol interactive guide
  • Congratulations! You successfully ran Comsol in batch mode on the cluster.
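To compare how long the job ran with what it requested, which helps when adjusting #PBS -l walltime, the relevant lines can be pulled out of the output file. This sketch assumes the line formats shown in the sample output above; summarize_comsol_out is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the timing summary from a finished job's
# output file (comsolTest.out in this walkthrough). Relies only on the
# "Total time:", "Resources:" and "Rsrc Used:" lines of the sample output.
summarize_comsol_out() {
    local out="$1"
    grep '^Total time:' "$out" | head -n 1
    # Requested vs. used walltime, useful when tuning "#PBS -l walltime"
    sed -n 's/^Resources:.*walltime=\([0-9:]*\).*/Walltime requested: \1/p' "$out"
    sed -n 's/^Rsrc Used:.*walltime=\([0-9:]*\).*/Walltime used: \1/p' "$out"
}
```

For example, summarize_comsol_out comsolTest.out prints the total Comsol time followed by the requested and used walltime.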

Common mistakes

  • If you get strange Java errors when running Comsol on PACE, check that ~/.comsol points to ~/data/.comsol and that no recursive symbolic link exists (for example, a .comsol link inside ~/data/.comsol).

  • "Why does the same input file produce results on Windows but not on Linux?" One possible answer is that the results need to be initialized: right-click "Study" and select "Get Initial Value".