Updated 2022-09-16

Run R and RStudio on the Cluster

Overview

This guide covers how to use R on the cluster in two scenarios:

  1. Through RStudio in Open OnDemand (OOD)
  2. In a batch job from the terminal

It also covers how to install your own R packages on the cluster.

The R Modules

In both Open OnDemand and the terminal, we recommend using one of the following R modules.
These include many useful pre-installed packages and provide a stable environment for installing new packages:

  • r/4.2.1-tidy includes R 4.2.1 with tidyverse and devtools. It is built from the Rocker r/tidyverse Docker image.
  • r/4.2.1-cuda includes R 4.2.1 with CUDA 11.1, tidyverse and devtools. It is built from the Rocker r/ml Docker image.
  • r/4.2.1-bio includes R 4.2.1 with BiocManager and many Bioconductor packages. It is built from the Bioconductor Docker image.

With any of the modules, you may install additional packages for yourself. See "Installing Your Own R Packages" below.

Note

For reproducibility, you can also run these containers on your own computer using Docker. See the Rocker or Bioconductor documentation for details.

Using RStudio in Open OnDemand

Open OnDemand (OOD) allows you to run interactive graphical programs (such as RStudio, Jupyter, and MATLAB) on the cluster directly from your browser. An OOD RStudio session will let you access the same files and hardware resources (including GPUs) as a terminal-based cluster job. RStudio sessions persist when your browser closes (within the time limit requested for the job). See the OnDemand RStudio guide for information about starting an RStudio session.

Using R in a Batch Job

For a detailed introduction to batch jobs, see "Job Submission Overview". In this section, we demonstrate a batch session with a simple R script.

To use R in a batch job, you should:

  1. Write an R script with your desired R commands.

  2. Write a PBS script that does the following:

    a. Loads your desired R module with module load. See module descriptions above.

    b. Runs the R script with either the Rscript or R CMD BATCH commands.

  3. Submit the PBS script with qsub.

As an example, we will create this script, named my_add.R:

my_add <- function(x,y) {
  return(x + y)
}

print(my_add(1,1))

Then we will create a PBS script named my_add.pbs. In the module load line, you can substitute another R module for r/4.2.1-tidy. For details about the job resources (such as walltime and memory), see "How To Choose Job Resources".

#PBS -N R_test
#PBS -A [Account]
#PBS -l nodes=1:ppn=1
#PBS -l pmem=2gb
#PBS -l walltime=00:10:00
#PBS -q inferno
#PBS -j oe
#PBS -o my_add.out

cd $PBS_O_WORKDIR
module load r/4.2.1-tidy
Rscript my_add.R

Finally, to submit the job, run:

qsub my_add.pbs
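
After submitting, you will usually want to check on the job and read its output. The following is a minimal sketch, not an official cluster tool: check_r_job is a hypothetical helper name, and we assume the standard PBS qstat client is on your PATH and that the output file is the my_add.out named by "#PBS -o" above.

```shell
# Hypothetical helper: report job status and print the job's output file.
# Usage: check_r_job [output-file]   (defaults to my_add.out from "#PBS -o")
check_r_job() {
  local out_file="${1:-my_add.out}"

  # qstat is only present where the PBS client tools are installed.
  if command -v qstat >/dev/null 2>&1; then
    qstat -u "${USER:-$(id -un)}"
  fi

  # "#PBS -j oe" merges stderr into stdout, so one file holds everything.
  if [ -f "$out_file" ]; then
    cat "$out_file"
  else
    echo "$out_file not found; the job may still be queued or running"
    return 1
  fi
}
```

Run check_r_job from the directory where you submitted the job. Once the job has finished, the output file should include the line [1] 2 printed by my_add.R, possibly alongside other lines added by the scheduler.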

Installing Your Own R Packages

This section is for users who intend to install their own R packages using the install.packages() command in R.

You can install packages from either RStudio in OOD or R on the command line. Packages installed with RStudio are available from R on the command line and vice versa, since OOD shares the same file system as the rest of the cluster.

By default, R and RStudio place user-installed packages in your home directory at ~/R. Since your home directory has a relatively small storage quota (see "Storage on the Cluster"), very large R packages may cause you to exceed it. In this case, we recommend installing packages in your project space, which has a much larger storage quota. The pathname to your project space depends on which cluster you are using:

  • For Phoenix: ~/p-<pi-username>-<number> or d-<school-name>-<number>
  • For Hive: ~/data

For the remainder of this section, we will use Phoenix's pathnames. The same commands can be used for Hive if you use data instead of p-<pi-username>-<number> in the pathnames.
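
Before moving anything, it can help to see how much space your R packages currently occupy. The sketch below uses only the standard du command; r_lib_usage is a hypothetical helper name, and the default of ~/R is taken from the paragraph above.

```shell
# Hypothetical helper: print the disk usage of an R library directory.
# Usage: r_lib_usage [library-dir]   (defaults to ~/R)
r_lib_usage() {
  local lib_dir="${1:-$HOME/R}"
  if [ -d "$lib_dir" ]; then
    # The trailing "/." makes du measure the target when lib_dir is a symlink.
    du -sh "$lib_dir/."
  else
    echo "$lib_dir does not exist; no user packages are installed there yet"
  fi
}

r_lib_usage
```

Compare the reported size with your home directory quota to decide whether moving the library to project space is worthwhile.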

The most convenient way to use project space for your R packages is to create a symlink named ~/R that points to your project space:

  1. First, check to see if you have an existing ~/R directory or symlink by running:

    file ~/R

  2. If file ~/R reports "No such file or directory", then:

    a. Create an R directory in your project space. Be sure to substitute your correct PI name and number.

    mkdir ~/p-rrahaman6-0/R

    b. Create a symlink named ~/R in your home directory that points to the R directory in your project space:

    ln -s ~/p-rrahaman6-0/R ~/R

  3. If file ~/R reports that ~/R is a directory, then:

    a. Move your R directory to your project space. Be sure to substitute your correct PI name and number.

    mv ~/R ~/p-rrahaman6-0/

    b. Create a symlink named ~/R in your home directory that points to the R directory in your project space:

    ln -s ~/p-rrahaman6-0/R ~/R

  4. If file ~/R reports that ~/R is a symlink, then:

    a. If the symlink points to your project space, ~/p-<pi-username>-<number>/R, then you are done with first-time setup.

    b. If the symlink does not point to your project space, remove the symlink with rm ~/R and follow Step 2, above.
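
The four cases above can also be sketched as a single shell function. This is an illustration under stated assumptions rather than a supported tool: link_r_library is a hypothetical name, the project space directory must already exist, and the check in Step 4 compares the symlink target as a literal string, so a link written with a different but equivalent path would be replaced.

```shell
# Hypothetical helper: make <home>/R a symlink to <project-space>/R.
# Usage: link_r_library <project-space-dir> [home-dir]   (home-dir defaults to $HOME)
link_r_library() {
  local project="$1"
  local home="${2:-$HOME}"
  local target="$project/R"
  local link="$home/R"

  if [ ! -d "$project" ]; then
    echo "error: project space $project does not exist" >&2
    return 1
  fi

  if [ -L "$link" ]; then
    # Step 4: ~/R is already a symlink.
    if [ "$(readlink "$link")" = "$target" ]; then
      echo "$link already points to $target"
      return 0
    fi
    rm "$link"              # Step 4b: remove the stale link, then fall through to Step 2
  elif [ -d "$link" ]; then
    # Step 3: ~/R is a real directory, so move it into project space.
    if [ -e "$target" ]; then
      echo "error: $target already exists; merge the directories by hand" >&2
      return 1
    fi
    mv "$link" "$project/"
  elif [ -e "$link" ]; then
    echo "error: $link exists but is neither a directory nor a symlink" >&2
    return 1
  fi

  # Step 2: create the R directory if needed, then the symlink.
  mkdir -p "$target"
  ln -s "$target" "$link"
  echo "$link now points to $target"
}
```

For example, link_r_library ~/p-rrahaman6-0 covers the Phoenix case used in this section, after substituting your own PI name and number.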

Note

Another way to explicitly load/store packages in your project space is to use the lib option in install.packages() and the lib.loc option in library() (https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/library). For example: install.packages("jsonlite", lib="~/p-rrahaman6-0/my-r-libs").
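
From the terminal, the lib and lib.loc options can be combined in one call to Rscript. The sketch below is an assumption-laden illustration: install_r_pkg_to is a hypothetical helper name, the CRAN mirror URL is our choice, and an R module must already be loaded so that Rscript is available.

```shell
# Hypothetical helper: install one CRAN package into a chosen library directory,
# then load it from that directory to confirm the install worked.
# Usage: install_r_pkg_to <package> <library-dir>
install_r_pkg_to() {
  local pkg="$1"
  local lib="$2"

  # Create the library directory first so install.packages() can write to it.
  mkdir -p "$lib"

  # The package name and path are passed as trailing arguments, so the R code
  # itself never needs shell variables spliced into it.
  Rscript \
    -e 'args <- commandArgs(trailingOnly = TRUE)' \
    -e 'install.packages(args[1], lib = args[2], repos = "https://cloud.r-project.org")' \
    -e 'library(args[1], lib.loc = args[2], character.only = TRUE)' \
    "$pkg" "$lib"
}
```

For example, install_r_pkg_to jsonlite ~/p-rrahaman6-0/my-r-libs mirrors the install.packages() call shown above. Scripts that use the package later need the same lib.loc value in their library() calls.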