Updated 2022-09-16
Run R and RStudio on the Cluster
Overview
This guide covers how to use R on the cluster in two scenarios:
- Through RStudio in Open OnDemand (OOD)
- In a batch job from the terminal
It also covers how to install your own R packages on the cluster.
The R Modules
In both Open OnDemand and the terminal, we recommend using one of the following R modules.
These include many useful pre-installed packages and provide a stable environment
for installing new packages:
- r/4.2.1-tidy includes R 4.2.1 with tidyverse and devtools. It is built from the Rocker r/tidyverse Docker image.
- r/4.2.1-cuda includes R 4.2.1 with CUDA 11.1, tidyverse, and devtools. It is built from the Rocker r/ml Docker image.
- r/4.2.1-bio includes R 4.2.1 with BiocManager and many Bioconductor packages. It is built from the Bioconductor Docker image.
With any of the modules, you may install additional packages for yourself. See "Installing Your Own R Packages" below.
Note
For reproducibility, you can also run these containers on your own computer using Docker. See the Rocker or Bioconductor documentation for details.
Using RStudio in Open OnDemand
Open OnDemand (OOD) allows you to run interactive graphical programs (such as RStudio, Jupyter, and MATLAB) on the cluster directly from your browser. An OOD RStudio session will let you access the same files and hardware resources (including GPUs) as a terminal-based cluster job. RStudio sessions persist when your browser closes (within the time limit requested for the job). See the OnDemand RStudio guide for information about starting an RStudio session.
Using R in a Batch Job
For a detailed introduction to batch jobs, see "Job Submission Overview". In this section, we demonstrate a batch session with a simple R script.
To use R in a batch job, you should:
1. Write an R script with your desired R commands.
2. Write a PBS script that does the following:
   a. Loads your desired R module with module load. See the module descriptions above.
   b. Runs the R script with either the Rscript or R CMD BATCH command.
3. Submit the PBS script with qsub.
As an example, we will create this script named my_add.R:
my_add <- function(x,y) {
  return(x + y)
}
print(my_add(1,1))
Then we will create a PBS script named my_add.pbs. You can substitute another R module for r/4.2.1-tidy in the module load line. For details about job resources (such as walltime and memory), see "How To Choose Job Resources".
#PBS -N R_test
#PBS -A [Account]
#PBS -l nodes=1:ppn=1
#PBS -l pmem=2gb
#PBS -l walltime=00:10:00
#PBS -q inferno
#PBS -j oe
#PBS -o my_add.out
cd $PBS_O_WORKDIR
module load r/4.2.1-tidy
Rscript my_add.R
Finally, to submit the job, run:
qsub my_add.pbs
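Since the PBS script differs between R modules only in its module load line, it can be generated rather than edited by hand. The following is a minimal sketch, not part of the cluster's tooling: write_r_pbs is a hypothetical helper name, and the resource values are simply copied from the example above.

```shell
#!/bin/sh
# Hypothetical helper (not provided by the cluster): writes a PBS script that
# runs an R script with a chosen R module.
# Usage: write_r_pbs <r-module> <r-script> <pbs-file>
write_r_pbs() {
    r_module=$1
    r_script=$2
    pbs_file=$3
    job_name=$(basename "$r_script" .R)
    # Unquoted heredoc: $r_module and friends expand now, while
    # \$PBS_O_WORKDIR is escaped so it expands later, when the job runs.
    cat > "$pbs_file" <<EOF
#PBS -N $job_name
#PBS -A [Account]
#PBS -l nodes=1:ppn=1
#PBS -l pmem=2gb
#PBS -l walltime=00:10:00
#PBS -q inferno
#PBS -j oe
#PBS -o $job_name.out
cd \$PBS_O_WORKDIR
module load $r_module
Rscript $r_script
EOF
}

# Same job as above, but with the Bioconductor module.
write_r_pbs r/4.2.1-bio my_add.R my_add.pbs
# Then submit as before: qsub my_add.pbs
```

As in the hand-written script, replace [Account] with your own account before submitting.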
Installing Your Own R Packages
This section is for users who intend to install their own R packages using the install.packages() command in R.
You can install packages from either RStudio in OOD or R on the command line. Packages installed with RStudio are available from R on the command line and vice versa, since OOD shares the same file system as the rest of the cluster.
By default, R and RStudio place user-installed packages in your home directory at ~/R. Since your home directory has a relatively small storage quota (see "Storage on the Cluster"), very large R packages may exceed it. In this case, we recommend installing packages in your project space, which has a much larger storage quota. The pathname to your project space depends on which cluster you are using:
- For Phoenix: ~/p-<pi-username>-<number> or d-<school-name>-<number>
- For Hive: ~/data
For the remainder of this section, we will use Phoenix's pathnames. The same commands can be used for Hive if you use data instead of p-<pi-username>-<number> in the pathnames.
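Before moving anything, it can help to see how much space your R library currently uses. The sketch below wraps the standard du command in a small function; r_lib_usage is a hypothetical name chosen for illustration, and the default path assumes packages are in ~/R as described above.

```shell
#!/bin/sh
# Print the total size of an R library directory (default: ~/R).
# Prints a notice instead if the directory does not exist yet.
r_lib_usage() {
    lib_dir=${1:-"$HOME/R"}
    if [ -d "$lib_dir" ]; then
        # -s: one summary line; -h: human-readable units.
        # The trailing slash makes du measure the directory's contents
        # even when $lib_dir is itself a symlink.
        du -sh "$lib_dir/"
    else
        echo "No R library found at $lib_dir"
    fi
}

r_lib_usage
```

Compare the reported size against your home directory quota to judge whether moving the library is worthwhile.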
The most convenient way to use project space for your R packages is to create a symlink named ~/R that points to your project space:
1. First, check whether you have an existing ~/R directory or symlink by running:
   file ~/R
2. If file ~/R reports "No such file or directory", then:
   a. Create an R directory in your project space. Be sure to substitute your correct PI name and number.
      mkdir ~/p-rrahaman6-0/R
   b. Create a symlink in your home directory that points to your project space:
      ln -s ~/p-rrahaman6-0/R ~/R
3. If file ~/R reports that ~/R is a directory, then:
   a. Move your R directory to your project space. Be sure to substitute your correct PI name and number.
      mv ~/R ~/p-rrahaman6-0/
   b. Create a symlink in your home directory that points to your project space:
      ln -s ~/p-rrahaman6-0/R ~/R
4. If file ~/R reports that ~/R is a symlink, then:
   a. If the symlink points to your project space, ~/p-<pi-username>-<number>/R, then you are done with first-time setup.
   b. If the symlink does not point to your project space, remove the symlink with rm ~/R and follow Step 2, above.
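The decision steps above can also be sketched as a single shell function. This is an illustrative sketch rather than an official tool: link_r_library is a hypothetical name, the optional second argument exists only so the logic can be tried in a scratch directory, and the refusal to overwrite an existing R directory in project space is an added precaution that the steps above do not mention.

```shell
#!/bin/sh
# Sketch of the ~/R setup steps as one function.
# Usage: link_r_library <project-dir> [home-dir]
#   <project-dir>: your project space, e.g. ~/p-<pi-username>-<number>
#   [home-dir]:    defaults to $HOME
link_r_library() {
    project_dir=$1
    home_dir=${2:-$HOME}
    link=$home_dir/R
    target=$project_dir/R

    # Test for a symlink first: [ -d ] follows symlinks.
    if [ -L "$link" ]; then
        # Step 4: ~/R is already a symlink.
        if [ "$(readlink "$link")" = "$target" ]; then
            echo "already linked"
            return 0
        fi
        rm "$link"    # removes only the link, not what it pointed to
    elif [ -d "$link" ]; then
        # Step 3: ~/R is a real directory; move it into project space.
        if [ -e "$target" ]; then
            echo "refusing to overwrite existing $target" >&2
            return 1
        fi
        mv "$link" "$project_dir/"
    elif [ -e "$link" ]; then
        echo "$link exists but is not a directory or symlink" >&2
        return 1
    fi

    # Step 2: make sure the target exists, then create the symlink.
    mkdir -p "$target"
    ln -s "$target" "$link"
    echo "linked $link -> $target"
}

# Example call (substitute your own PI name and number):
#   link_r_library ~/p-rrahaman6-0
```

The "already linked" check compares the link target as a plain string, so it assumes the project path is written the same way it was when the link was created.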
Note
Another way to explicitly load and store packages in your project space is to use the lib option in install.packages() and the lib.loc option in library() (https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/library). For example: install.packages("jsonlite", lib="~/p-rrahaman6-0/my-r-libs").
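To show the lib and lib.loc options working together, the sketch below writes a short R script from the shell. The file name use_project_lib.R and the CRAN mirror URL are assumptions made for illustration; the library path is the example path from the note, so substitute your own project space.

```shell
#!/bin/sh
# Write a small R script that installs into, and loads from, an explicit
# library directory. Quoted heredoc ('EOF'): nothing is expanded by the shell.
cat > use_project_lib.R <<'EOF'
# Example path from the note above; substitute your own project space.
lib_dir <- path.expand("~/p-rrahaman6-0/my-r-libs")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# repos is set explicitly (an assumed mirror) so that a non-interactive run
# does not depend on a preconfigured CRAN mirror.
install.packages("jsonlite", lib = lib_dir, repos = "https://cloud.r-project.org")

# lib.loc tells library() to look in the same directory.
library(jsonlite, lib.loc = lib_dir)
EOF

# On the cluster, after loading an R module:
#   Rscript use_project_lib.R
```

Any later script that needs the package must pass the same lib.loc value, which is the main trade-off compared with the ~/R symlink approach.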