Updated 2023-05-08
Using Conda and Anaconda
Overview
This document covers two related tools: conda and anaconda.
Conda is a program that lets you manage isolated software environments. It is most commonly used to install and manage Python packages, but it is versatile and can also manage packages written in other languages, such as C and CUDA. It allows you to:
- Create personal environments where you can install your desired packages
- Install packages from online repositories
- Update previously-installed packages when needed
- Switch between different environments
Anaconda is a collection of third-party Python packages that are particularly useful for numerical computing and data analytics. An Anaconda installation uses conda to install and manage its packages. Hence, Anaconda depends on conda, but conda itself does not require an Anaconda installation.
One-Time Setup (Phoenix, Hive, and Firebird Only)
Warning
- Do not load the anaconda module before doing the one-time setup. If the anaconda module is already loaded (that is, module list shows anaconda), unload it first by running:
module unload anaconda
By default, conda creates new environments in your home directory at ~/.conda.
However, conda environments can be quite large, and your home directory has a relatively small storage quota (see the storage guides for Phoenix, Hive, and Firebird for more info). Hence, we strongly recommend that you create a symlink named ~/.conda that points to your project space, which has a much larger storage quota. The pathname for your project space depends on which cluster you are using.
- For Phoenix: ~/p-<pi-username>-<number> or d-<school-name>-<number>
- For Hive: ~/data
For the remainder of this section, we will use Phoenix's pathnames. The same commands work on Hive if you use data instead of p-<pi-username>-<number> in the pathnames.
1. First, check whether you already have a ~/.conda directory or symlink by running:
file ~/.conda
2. If file ~/.conda reports "No such file or directory", then:
a. Create a .conda directory in your project space. Be sure to substitute your own PI username and number.
mkdir ~/p-rrahaman6-0/.conda
b. Create a symlink in your home directory that points to this new directory:
ln -s ~/p-rrahaman6-0/.conda ~/.conda
3. If file ~/.conda reports that ~/.conda is a directory, then:
a. Move your .conda directory to your project space. Be sure to substitute your own PI username and number.
mv ~/.conda ~/p-rrahaman6-0/
b. Create a symlink in your home directory that points to the moved directory:
ln -s ~/p-rrahaman6-0/.conda ~/.conda
4. If file ~/.conda reports that ~/.conda is a symbolic link, then:
a. If the symlink points to your project space (~/p-<pi-username>-<number>/.conda), you are done with the one-time setup.
b. If the symlink does not point to your project space, remove the symlink with rm ~/.conda and follow Step 2, above.
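The decision steps above can also be scripted. The following is a minimal Bash sketch, not an official PACE tool. The function name setup_conda_symlink is hypothetical, and so is its optional second argument, which stands in for your home directory so that you can try the function in a scratch directory first. Instead of reading the output of file, it uses the shell tests -L (is a symlink) and -d (is a directory).

```shell
# Sketch (hypothetical helper): automates steps 1-4 of the one-time ~/.conda setup.
# Usage: setup_conda_symlink PROJECT_DIR [HOME_DIR]
#   PROJECT_DIR: your project space, e.g. ~/p-rrahaman6-0 (substitute your own)
#   HOME_DIR:    defaults to $HOME; exposed only so the function can be tried safely
setup_conda_symlink() {
    local project_dir="$1"
    local home_dir="${2:-$HOME}"
    local link="$home_dir/.conda"
    local target="$project_dir/.conda"

    if [ ! -d "$project_dir" ]; then
        echo "Error: project space $project_dir not found." >&2
        return 1
    fi

    # Test -L before -d: -d follows symlinks, so a linked ~/.conda would look like a directory.
    if [ -L "$link" ]; then
        # Step 4: a symlink exists; keep it only if it already points to the project space.
        if [ "$(readlink "$link")" = "$target" ]; then
            echo "$link already points to $target; nothing to do."
            return 0
        fi
        echo "Removing $link, which points to $(readlink "$link")."
        rm "$link"
    elif [ -d "$link" ]; then
        # Step 3: a real directory exists; move it into the project space, then link it.
        if [ -e "$target" ]; then
            echo "Error: $target already exists; refusing to overwrite it." >&2
            return 1
        fi
        mv "$link" "$target"
        ln -s "$target" "$link"
        echo "Moved $link to $target and linked it."
        return 0
    elif [ -e "$link" ]; then
        echo "Error: $link exists but is neither a directory nor a symlink." >&2
        return 1
    fi

    # Step 2 (also reached after removing a wrong symlink): create the directory and link it.
    mkdir -p "$target"
    ln -s "$target" "$link"
    echo "Created $target and linked $link to it."
}
```

After pasting or sourcing the function, you could run setup_conda_symlink ~/p-rrahaman6-0, substituting your own project space. Running it a second time is harmless, because it reports that the link is already in place.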
Working With Environments on PACE
Warning
This workflow does not use conda init, since it may cause undesired behavior on the PACE clusters. See "Problems with conda init" for more info.
Loading the Anaconda Module
PACE has several anaconda installations available. You can view them with the command:
module avail anaconda
The module names follow the naming convention anaconda<python-version>/<year>.<month>. Unless you have a very specific reason to use Python 2 (such as compatibility with legacy software), you should use a Python 3 module. Load your desired module (in this example, anaconda3/2021.05) with:
module load anaconda3/2021.05
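If you want a script to find the most recent Python 3 module instead of hard-coding a version, the sketch below filters the module avail listing. It assumes the anaconda3/<year>.<month> naming shown above. It also assumes module avail prints its listing to standard error, which is common for environment-module systems and is why the usage line includes 2>&1. The name newest_anaconda3 is a hypothetical helper.

```shell
# Sketch (hypothetical helper): read "module avail" output on stdin and print the
# newest anaconda3 module name. Assumes names look like anaconda3/<year>.<month>;
# because year and month are zero-padded, a plain text sort puts the newest last.
newest_anaconda3() {
    grep -oE 'anaconda3/[0-9]{4}\.[0-9]{2}' | sort -u | tail -n 1
}
```

Usage:
module load "$(module avail anaconda3 2>&1 | newest_anaconda3)"
For job scripts, pinning an explicit version such as anaconda3/2021.05 keeps runs reproducible when PACE adds new modules.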
As with other modules, you can add the command module load anaconda3/2021.05 to your shell startup scripts (such as ~/.bashrc) if you will be using the module routinely.
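If you append the module load line with a shell command and run that command twice, ~/.bashrc ends up with a duplicate line. The minimal Bash sketch below appends a line only when it is missing. The name add_line_once is a hypothetical helper, not a PACE command.

```shell
# Sketch (hypothetical helper): append LINE to FILE unless an identical line exists.
# Usage: add_line_once FILE LINE
add_line_once() {
    local file="$1" line="$2"
    touch "$file"
    # -x: match the whole line; -F: treat LINE as a fixed string, not a regex.
    if grep -qxF -- "$line" "$file"; then
        return 0
    fi
    # If the file's last line lacks a trailing newline, add one so LINE starts on its own line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}
```

Usage:
add_line_once ~/.bashrc 'module load anaconda3/2021.05'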
Listing Environments
All previously created environments can be listed by name. You will have a default environment named "base", but it is usually better to create a separate environment for each specific workflow.
conda env list
Creating an Environment
To create a new, empty conda environment with a given name, use conda create. You can use any name for the --name argument.
conda create --name my_new_env
Once created, the new environment will appear in the output of conda env list.
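To make a setup script safe to re-run, you can check whether an environment already exists before calling conda create. The sketch below assumes the usual conda env list layout: comment lines begin with #, and every other line starts with an environment name, with * marking the active environment. The name conda_env_names is a hypothetical helper. Environments created with a path instead of a name are listed by path only, so the sketch prints their path.

```shell
# Sketch (hypothetical helper): read "conda env list" output on stdin and print
# one environment name per line, skipping comment and blank lines.
conda_env_names() {
    awk '!/^#/ && NF { print $1 }'
}
```

For example:
if conda env list | conda_env_names | grep -qxF my_new_env; then echo "my_new_env already exists"; else conda create --name my_new_env; fi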
Activating and Switching Your Environments
To start using a previously created environment, you must activate it with conda activate. After that, the command prompt will show the name of the active environment. For example:
rrahaman6@login-phoenix-1:~$ conda activate my_new_env
(my_new_env) rrahaman6@login-phoenix-1:~$
If an environment is currently active, you can also switch to another environment with conda activate.
Installing New Packages
After activating your environment, you can install new packages with conda install. For example, this installs the Natural Language Toolkit package along with all of its dependencies:
conda install nltk
Conda searches for packages in "channels", which are specific online repositories of packages. By default, the Anaconda installation is configured to search in the "anaconda" channel. You will often need to specify an alternate channel with the -c flag. For example, this installs the OpenMM package from the "conda-forge" channel:
conda install -c conda-forge openmm
Your package's documentation will specify whether you need to use an alternate conda channel.
If necessary, you may also use the pip command to install packages from the Python Package Index into the active conda environment. However, conda and pip do not interoperate perfectly, so we recommend using conda whenever possible. For more information, see here. For troubleshooting, see "Problems with pip".
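One common source of pip trouble is running a pip that does not belong to the active environment. The sketch below, with the hypothetical name pip_in_active_env, refuses to run unless two conditions hold. First, a conda environment must be active, which it detects through the CONDA_PREFIX variable that conda activate sets. Second, the python on your PATH must live inside that environment. It then calls python -m pip, so the packages go into that interpreter's environment.

```shell
# Sketch (hypothetical helper): run pip only inside the active conda environment.
# Usage: pip_in_active_env install <package-name>
pip_in_active_env() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "Error: no conda environment is active." >&2
        return 1
    fi
    local py
    py="$(command -v python)" || { echo "Error: python not found on PATH." >&2; return 1; }
    # Require the interpreter to live under the active environment's directory.
    case "$py" in
        "$CONDA_PREFIX"/*) ;;
        *) echo "Error: $py is not inside $CONDA_PREFIX." >&2; return 1 ;;
    esac
    # "python -m pip" uses the pip that belongs to this exact interpreter.
    python -m pip "$@"
}
```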
Deactivating Environments
If you want to deactivate the current environment, run:
conda deactivate
Deleting Environments
To remove an environment and all of its installed packages, use conda env remove:
conda env remove --name my_new_env
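Conda will not remove the environment you are currently in, so you may need to run conda deactivate first. The minimal sketch below does this automatically. It assumes that the active environment's name is in CONDA_DEFAULT_ENV, which conda activate sets. The name remove_conda_env is a hypothetical helper. It deactivates only one level, so it does not handle environments stacked on top of each other.

```shell
# Sketch (hypothetical helper): remove a named environment, deactivating it first
# if it is the active one.
# Usage: remove_conda_env NAME
remove_conda_env() {
    local name="$1"
    if [ -z "$name" ]; then
        echo "Usage: remove_conda_env NAME" >&2
        return 1
    fi
    if [ "${CONDA_DEFAULT_ENV:-}" = "$name" ]; then
        conda deactivate
    fi
    conda env remove --name "$name"
}
```

Usage:
remove_conda_env my_new_env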
Example: Installing and Running in Batch Mode
For demonstration purposes, we will install PyTorch in a new environment and run a PyTorch program on Phoenix in batch mode. (Note: PACE already provides a GPU-enabled installation of PyTorch; see here for more info.)
After completing the "One-Time Setup", we will create a new environment and install PyTorch from the "pytorch" channel. This can be done on the login or compute nodes.
module load anaconda3/2021.05
conda create --name my_pytorch
conda activate my_pytorch
conda install pytorch cpuonly -c pytorch
Now suppose we want to run this PyTorch program as a script named pytorch_tensors.py. We can submit it as a batch job using the following PBS script. Be sure to use your correct project account for the -A option.
#!/bin/bash
#PBS -N pytorch_tensors # job name
#PBS -A pace-admins # account to which job is charged, ex: GT-gburdell3
#PBS -l nodes=1:ppn=1 # number of nodes and cores per node required
#PBS -l pmem=2gb # memory per core
#PBS -l walltime=00:10:00 # duration of the job (ex: 10 min)
#PBS -j oe # combine output and error messages into 1 file
#PBS -o pytorch_tensors.out # output file name
cd "$PBS_O_WORKDIR"
module load anaconda3/2021.05
conda activate my_pytorch
python pytorch_tensors.py
Troubleshooting
Problems with conda init
The conda init command adds commands to your shell startup scripts (such as ~/.bashrc) that automatically activate conda at startup. On the PACE clusters, this can lead to undesired conflicts between software versions, so it is not recommended.
If you know or suspect that you have previously run conda init, you can remove the added commands from your startup scripts by running:
conda init --reverse
Then you must restart your shell by logging out and logging back in again.
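To check whether any of your startup scripts still contain the commands that conda init added, you can search for the marker line that conda init writes at the start of its block (# >>> conda initialize >>>). The sketch below is illustrative. The function name find_conda_init_blocks and the list of files you pass to it are assumptions.

```shell
# Sketch (hypothetical helper): print each given file that contains a conda init block.
# Returns 0 if at least one file has the block, 1 otherwise. Missing files are skipped.
find_conda_init_blocks() {
    local f found=1
    for f in "$@"; do
        if [ -f "$f" ] && grep -qF '>>> conda initialize >>>' "$f"; then
            echo "$f"
            found=0
        fi
    done
    return "$found"
}
```

Usage:
find_conda_init_blocks ~/.bashrc ~/.bash_profile ~/.tcshrc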
Problems with pip
While installing pip packages in a conda environment, you may find that the installation clashes with packages already installed in other local directories, such as your personal ~/.local folder. One way to resolve this is to move the ~/.local folder out of the way while doing pip installs in the conda environment, with a command like mv ~/.local ~/.local.bak, and to move it back when you are done. If that doesn't work, try manually editing site.py in the conda environment's Python and setting ENABLE_USER_SITE = False in that file. For example, if your environment is named tiny, you can find the file with the following commands:
module load anaconda3/2021.05
conda env list
conda activate tiny
python -c 'import site; print(site.__file__)'
- More information about ENABLE_USER_SITE can be found here.
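The sketch below shows how you might confirm whether the active Python still uses the user site directory. It also shows a less invasive alternative to editing site.py, which is not part of the steps above: the standard Python environment variable PYTHONNOUSERSITE (equivalent to python -s) disables the user site directory for a single command. The function names and the PYTHON override variable are hypothetical.

```shell
# Sketch (hypothetical helpers). PYTHON defaults to "python", i.e. the interpreter
# of the active conda environment.

# Print the interpreter's ENABLE_USER_SITE value (typically True or False).
user_site_enabled() {
    "${PYTHON:-python}" -c 'import site; print(site.ENABLE_USER_SITE)'
}

# Run pip with the user site directory (under ~/.local) ignored for this one command.
pip_without_user_site() {
    PYTHONNOUSERSITE=1 "${PYTHON:-python}" -m pip "$@"
}
```

Usage:
user_site_enabled
pip_without_user_site install <package-name>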
Using tcsh or csh shells
At this time, users who wish to use Anaconda 2 or Anaconda 3 with the csh or tcsh shell need to use the module anaconda3/2021.05 or a newer version. Furthermore, PACE staff advise tcsh users to consider writing their job scripts in Bash:
- Insert #!/bin/bash on the first line of the job script.
- After all #PBS directives, insert:
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi