Updated 2022-05-31

Using Conda and Anaconda

Overview

This document covers two related tools: conda and Anaconda.

Conda is a program that lets you manage isolated software environments. Conda is most commonly used to install and manage Python packages, but it is versatile and can be used for packages from other languages, including CUDA and C. It allows you to:

  • Create personal environments where you can install your desired packages
  • Install packages from online repositories
  • Update previously-installed packages when needed
  • Switch between different environments

Anaconda is a collection of third-party Python packages that are particularly useful for numerical computing and data analytics. An Anaconda installation uses conda to manage and install packages. Anaconda therefore depends on conda, but conda itself does not require an Anaconda installation.

One-Time Setup (Phoenix, Hive, and Firebird Only)

Warning

  • For ICE clusters, skip this one-time setup, since you have a home directory but no project space.
  • Do not load the anaconda module before doing the one-time setup. If the anaconda module is loaded (module list shows anaconda), then run: module unload anaconda.

By default, conda creates new environments in your home directory at ~/.conda. However, conda environments can be quite large, and your home directory has a relatively small storage quota (see the storage guides for Phoenix, Hive, and Firebird for more info).

Hence, we strongly recommend that you create a symlink named ~/.conda that points to your project space, which has a much larger storage quota. The pathname for your project space depends on which cluster you are using.

  • For Phoenix: ~/p-<pi-username>-<number> or ~/d-<school-name>-<number>
  • For Hive: ~/data

For the remainder of this section, we will use Phoenix's pathnames. The same commands can be used for Hive if you use data instead of p-<pi-username>-<number> in the pathnames.

  1. First, check to see if you have an existing ~/.conda directory or symlink by running:

    file ~/.conda

  2. If file ~/.conda reports "No such file or directory", then:

    a. Create a .conda directory in your project space. Be sure to substitute your correct PI name and number.

    mkdir ~/p-rrahaman6-0/.conda

    b. Create a symlink in your home directory that points to the .conda directory in your project space:

    ln -s ~/p-rrahaman6-0/.conda ~/.conda

  3. If file ~/.conda reports that ~/.conda is a directory, then:

    a. Move your .conda directory to your project space. Be sure to substitute your correct PI name and number.

    mv ~/.conda ~/p-rrahaman6-0/

    b. Create a symlink in your home directory that points to the .conda directory in your project space:

    ln -s ~/p-rrahaman6-0/.conda ~/.conda

  4. If file ~/.conda reports that ~/.conda is a symlink, then:

    a. If the symlink points to your project space, ~/p-<pi-username>-<number>/.conda, then you are done with first-time setup.

    b. If the symlink does not point to your project space, remove the symlink with rm ~/.conda and follow Step 2, above.
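
The four cases above can also be sketched as a single shell function. This is only an illustration, not a PACE-provided tool: link_conda_dir is a hypothetical helper name, and it takes the home directory and the project-space directory as arguments so that it can be tried on scratch directories first.

```shell
# link_conda_dir is a hypothetical helper, not a PACE-provided command.
# Usage: link_conda_dir <home-dir> <project-dir>
link_conda_dir() {
    local link="$1/.conda"
    local target="$2/.conda"

    if [ -L "$link" ]; then
        # Step 4: ~/.conda is already a symlink.
        if [ "$(readlink "$link")" = "$target" ]; then
            echo "already linked: $link -> $target"
            return 0
        fi
        # Wrong target: remove the link, then continue as in Step 2.
        rm "$link"
    fi

    if [ -d "$link" ]; then
        # Step 3: ~/.conda is a real directory, so move it to project space.
        if [ -e "$target" ]; then
            echo "error: $target already exists" >&2
            return 1
        fi
        mv "$link" "$target"
    elif [ -e "$link" ]; then
        echo "error: $link exists but is not a directory or symlink" >&2
        return 1
    else
        # Step 2: nothing exists yet, so create the directory in project space.
        mkdir -p "$target"
    fi

    ln -s "$target" "$link"
    echo "linked: $link -> $target"
}
```

On Phoenix, the call would take the form link_conda_dir "$HOME" "$HOME/p-<pi-username>-<number>", with your own PI name and number substituted. The function refuses to overwrite an existing .conda directory in project space rather than guessing which copy to keep.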

Working With Environments on PACE

Warning

This workflow does not use conda init, since it may cause undesired behavior on the PACE clusters. See "Problems with conda init" for more info.

Loading the Anaconda Module

PACE has several anaconda installations available. You can view them with the command:

module avail anaconda

The module versions follow the naming convention anaconda<python-version>/<year>.<month>. Unless you have a very specific reason for using Python 2 (such as compatibility with legacy software), you should use a Python 3 module. Load your desired module (in this example, anaconda3/2021.05) with:

module load anaconda3/2021.05

As with other modules, you can add the module load anaconda3/2021.05 command to your shell startup scripts (such as ~/.bashrc) if you will be using the module routinely.
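
If you would rather pick the newest Python 3 module than hard-code a version, a small filter can select it from a list of module names. This is a sketch under assumptions: latest_anaconda3 is a hypothetical helper name, and it assumes module names look like anaconda3/2021.05, so that sorting the names as text also sorts them by date.

```shell
# latest_anaconda3 is a hypothetical helper: it reads module names on stdin
# (one per line) and prints the newest anaconda3/<year>.<month> entry.
latest_anaconda3() {
    # Keep only the anaconda3/YYYY.MM part of matching lines, then take the
    # last one in sorted order (zero-padded dates sort correctly as text).
    grep -oE '^anaconda3/[0-9]{4}\.[0-9]{2}' | sort | tail -n 1
}
```

One possible use is module -t avail anaconda 2>&1 | latest_anaconda3. That pipeline assumes your module system supports the terse -t flag and prints the listing on standard error, so check its output on your cluster before relying on it.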

Listing Environments

All previously-installed environments can be listed by name. You will have a default environment named "base", but it is often better to create a new environment for each specific workflow.

conda env list

Creating an Environment

To create a new, empty conda environment with a given name, use conda create. You can use any name for the --name argument.

conda create --name my_new_env

After you create the new environment, it will appear in the output of conda env list.
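
In scripts, it can be handy to create an environment only if it does not already exist. The sketch below checks the output of conda env list for a given name. env_in_list is a hypothetical helper name, and it assumes the usual listing layout in which the environment name is the first field on each line.

```shell
# env_in_list is a hypothetical helper: it reads `conda env list` output on
# stdin and succeeds only if an environment with the given name is listed.
env_in_list() {
    # Comment lines start with "#", so their first field never matches a name.
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

For example, conda env list | env_in_list my_new_env || conda create --name my_new_env creates the environment only when it is missing.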

Activating and Switching Your Environments

To start using a previously-created environment, you must activate it with conda activate. After that, the command prompt will show the name of the active environment. For example:

rrahaman6@login-phoenix-1:~$ conda activate my_new_env
(my_new_env) rrahaman6@login-phoenix-1:~$

If an environment is currently active, you can also switch to another environment with conda activate.
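
In a script there is no prompt to look at, but conda activate also sets the CONDA_DEFAULT_ENV environment variable to the name of the active environment. A minimal sketch of a check built on that variable follows. require_env is a hypothetical helper name.

```shell
# require_env is a hypothetical helper: it fails unless the named conda
# environment is the one currently active.
require_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
        echo "error: expected conda environment '$1', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}
```

Calling require_env my_new_env || exit 1 right after conda activate my_new_env lets a job script stop early instead of running with the wrong Python.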

Installing New Packages

After activating your environment, you can install new packages with conda install. For example, this will install the Natural Language Toolkit package along with all of its dependencies:

conda install nltk

Conda searches for packages in "channels", which are specific online repositories of packages. By default, the Anaconda installation is configured to search in the "anaconda" channel. Often, you will need to specify an alternate channel with the -c flag. For example, this installs the OpenMM package from the "conda-forge" channel:

conda install -c conda-forge openmm 

Your package's documentation will specify if you need to use an alternate conda channel.

If necessary, you may also use the pip command to install packages from the Python Package Index in the active conda environment. However, interoperability between conda and pip is not perfect, and we recommend using conda whenever possible. For more information, see here. For troubleshooting, see "Problems with pip".

Deactivating Environments

If you want to deactivate the current environment, run:

conda deactivate

Deleting Environments

To remove an environment and all of its installed packages, use conda env remove:

conda env remove --name my_new_env

Example: Installing and Running in Batch Mode

For demonstration purposes, we will install PyTorch in a new environment and run a PyTorch program on Phoenix in batch mode (Note: PACE already has a GPU-enabled installation of PyTorch. See here for more info).

After completing the "One-Time Setup", we will create a new environment and install PyTorch from the "pytorch" channel. This can be done on the login or compute nodes.

module load anaconda3/2021.05
conda create --name my_pytorch
conda activate my_pytorch
conda install pytorch cpuonly -c pytorch

Now suppose we want to run a PyTorch program saved as a script named pytorch_tensors.py. We can submit it as a batch job using the following PBS script. Be sure to use your correct project account for the -A option.

#PBS -N pytorch_tensors         # job name
#PBS -A pace-admins             # account to which job is charged, ex: GT-gburdell3
#PBS -l nodes=1:ppn=1           # number of nodes and cores per node required
#PBS -l pmem=2gb                # memory per core
#PBS -l walltime=00:10:00       # duration of the job (ex: 10 min)
#PBS -j oe                      # combine output and error messages into 1 file
#PBS -o pytorch_tensors.out     # output file name

cd $PBS_O_WORKDIR

module load anaconda3/2021.05
conda activate my_pytorch

python pytorch_tensors.py

Troubleshooting

Problems with conda init

The conda init command adds commands to your shell startup scripts (such as ~/.bashrc) that automatically activate conda at startup. On the PACE clusters, this can lead to undesired conflicts between software versions, so it is not recommended.

If you know or suspect you've previously run conda init, you can remove the added commands from your startup script by running this:

conda init --reverse

Then you must restart your shell by logging out and logging back in again.
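
If you are not sure whether conda init has modified a startup script, you can look for the marker comment that conda writes at the top of the block it manages. The sketch below is one way to do that. has_conda_init is a hypothetical helper name.

```shell
# has_conda_init is a hypothetical helper: it succeeds if the given file
# contains the opening marker of a block written by `conda init`.
has_conda_init() {
    grep -q '^# >>> conda initialize >>>' "$1" 2>/dev/null
}
```

For example, has_conda_init ~/.bashrc && echo "conda init block found" reports whether the block is present, and running it again after conda init --reverse is a quick way to confirm that the block was removed.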

Problems with pip

When you install pip packages in a conda environment, the install may clash with packages in other local directories, such as ~/.local. One way to resolve this is to move the ~/.local folder out of the way before running pip in the conda environment, for example with mv ~/.local ~/.local.bak. If that doesn't work, edit site.py in the conda environment's Python and set ENABLE_USER_SITE = False. For example, if your environment is named tiny, you can locate the file with the following:

module load anaconda3/2021.05
conda env list
conda activate tiny
python -c 'import site; print(site.__file__)'
  • More information about ENABLE_USER_SITE can be found here.
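
The first workaround above leaves ~/.local renamed, which is easy to forget. A sketch of a wrapper that moves the folder aside, runs one command, and then moves it back is shown below. run_without_local is a hypothetical helper name, and the home directory is passed as an argument so the function can be tried on a scratch directory first.

```shell
# run_without_local is a hypothetical helper, not a PACE-provided command.
# Usage: run_without_local <home-dir> <command> [args...]
run_without_local() {
    local home_dir="$1"
    shift
    local local_dir="$home_dir/.local"
    local backup="$home_dir/.local.bak"
    local moved=0
    local status=0

    if [ -e "$backup" ]; then
        echo "error: $backup already exists" >&2
        return 1
    fi
    if [ -d "$local_dir" ]; then
        mv "$local_dir" "$backup"
        moved=1
    fi

    # Run the wrapped command and remember its exit status.
    "$@" || status=$?

    if [ "$moved" -eq 1 ]; then
        if [ -e "$local_dir" ]; then
            # The command created a new .local; leave the backup in place.
            echo "warning: $local_dir was recreated; original kept at $backup" >&2
        else
            mv "$backup" "$local_dir"
        fi
    fi
    return "$status"
}
```

With the conda environment active, a call such as run_without_local "$HOME" pip install <package> would run pip with ~/.local temporarily out of the way.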

Using tcsh or csh shells

At this time, csh/tcsh users who wish to use Anaconda 2 or Anaconda 3 need to load the anaconda3/2021.05 module or a newer version. PACE staff also advise tcsh users to consider writing their job scripts in Bash:

  1. Insert #!/bin/bash on the first line of the job script
  2. After all #PBS directives, insert:

    if [ -f /etc/bashrc ]; then
        . /etc/bashrc
    fi