Updated 2020-10-16

Using Anaconda and Creating User Conda Environments

Important

Anaconda distributions began using a year.month version scheme late last year. All PACE resources now follow the same convention for the anaconda2 and anaconda3 modules. To avoid ambiguity, the anaconda module files named latest will be removed. Users currently loading an anaconda module ending in latest should change their commands to reference a specific version of Anaconda (e.g., anaconda3/2019.10).
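To find scripts that still load a module ending in latest, a search along the following lines may help. This is a minimal sketch: the helper name find_latest_refs and the example directory are illustrative assumptions, not part of any PACE tooling.

```shell
# Hypothetical helper: list lines in files under a directory that still
# reference an anaconda2 or anaconda3 module ending in "latest".
find_latest_refs() {
    # $1: directory to search (for example, where you keep job scripts)
    grep -rnE 'anaconda[23]/latest' "$1"
}

# Example (not run here): find_latest_refs "$HOME/jobs"
```

Each match is printed as file:line:text, so you can open the file and replace latest with a specific version.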

Overview

  • Conda is a powerful tool that makes it easy to:
    • install whatever Python packages you want
    • create a personal environment to use these packages
  • To use any conda commands, you must have an anaconda module loaded.

Step 1: Use the Right Storage

Warning

If you are using the Phoenix cluster, follow the same directions below to create the symlink, but link to your project storage instead of /data. Project storage follows the naming scheme p-<pi-username>-<number>; the Phoenix Storage Guide explains how /data corresponds to project storage.

Conda environments can easily exceed the 5 GB limit of the $HOME directory. To work around this, make sure Conda stores its environments in the $HOME/data directory, which has much more storage available.

  • We will use a symlink so that the .conda directory in your home directory points to a .conda directory in ~/data, keeping Conda from exceeding the storage limit.

Warning

  • Do not load the anaconda module before doing these steps
  • Check for and remove existing symlinks: if a symlink involving .conda already exists, remove it before continuing. This is a common issue, so double-check for existing symlinks if you run into problems.

Make sure there is no .conda entry in your home directory; use ls -a to check. If the anaconda module is loaded or .conda already exists in your home directory, the symlink won't work properly: the environment will still be stored in your home directory, and you might exceed the 5 GB limit. Again, if a symlink already exists, remove it before continuing. To make the symlink, log in to the cluster and follow these steps:

  1. cd ~/
  2. mkdir ~/data/.conda
  3. ln -s ~/data/.conda .conda
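The checks and steps above can be combined into one guarded helper. This is a minimal sketch under stated assumptions: the function name link_conda_dir is hypothetical, and it uses mkdir -p so that an existing .conda directory in the storage location is reused rather than treated as an error.

```shell
# Hypothetical helper: make <home>/.conda a symlink to <storage>/.conda.
# It refuses to continue if <home>/.conda already exists in any form.
link_conda_dir() {
    home_dir=$1      # your home directory
    storage_dir=$2   # ~/data, or your project storage on Phoenix

    # -e is false for a dangling symlink, so test -L as well.
    if [ -e "$home_dir/.conda" ] || [ -L "$home_dir/.conda" ]; then
        echo "$home_dir/.conda already exists; remove or move it first" >&2
        return 1
    fi

    mkdir -p "$storage_dir/.conda" &&
        ln -s "$storage_dir/.conda" "$home_dir/.conda"
}

# Example (not run here): link_conda_dir "$HOME" "$HOME/data"
```

Afterwards, ls -ld ~/.conda should show the link and the directory it points to.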

Step 2: Load the Anaconda Module

Warning

If you are using a PACE system, you cannot use anaconda3/latest. Instead, run module avail anaconda3 to determine the latest version (currently 2019.10).

  • Run module avail anaconda to see all the available versions
  • Load a version with module load anaconda3/2019.10, replacing 2019.10 with whichever available version you need
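If you would rather not read the module avail listing by eye, the newest version can be picked out with a small filter. This is a sketch only: the helper name newest_anaconda3 is hypothetical, and it assumes module names of the form anaconda3/<year>.<month> with zero-padded months, so that a plain text sort puts them in order.

```shell
# Hypothetical helper: read `module avail` output on stdin and print the
# newest anaconda3/<year>.<month> entry found in it.
newest_anaconda3() {
    grep -oE 'anaconda3/[0-9]{4}\.[0-9]{2}' | sort -u | tail -n 1
}

# Example (not run here). `module avail` usually prints to stderr,
# hence the 2>&1:
#   module load "$(module avail anaconda3 2>&1 | newest_anaconda3)"
```

Pinning the version explicitly in job scripts is still the safer choice, since the newest module can change between runs.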

Step 3: List Available Conda Environments

  • Use conda env list
  • tiny-20181119 is the latest environment. It is useful to clone when you make your own environment since it has many useful packages already installed.

Step 4: Create Environment

  • Use conda create --name <your-name> to create a completely empty conda environment.
  • To clone the latest conda environment (recommended), do:
    • conda env create pace-georgia-tech/tiny --name <anythingyouwant>

Step 5: Activate Environment

To activate your environment, run:

source activate <your-env-name>

Newer Anaconda installations also permit the conda form of the command: conda activate <your-env-name>

Step 6: Install Packages

  • Make sure your conda environment is activated; the environment name will show up in parentheses next to your username in the terminal prompt. Install Python packages as you normally would, using conda as the preferred package manager
  • Use conda install or pip install, following installation steps for whatever packages you choose
  • Install nltk example:
    • when your conda env is activated, just run:
    • conda install nltk

How to run a locally-installed Anaconda virtual env when some dependency is needed

  • You may need to load other modules on which your libraries are dependent. For example, you may need the intel module to make use of a Python2 library. In that case, run a series of commands:
module load anaconda2/2019.10
source activate my-conda-environment
module load intel/15.0
command-depending-on-intel

Deactivating & Removing Environments

Deactivating an environment is useful if you need to activate another environment.

To deactivate the current environment, run:

source deactivate

Newer Anaconda installations use conda deactivate instead.

When you use your environment in a job, make sure you activate it in the PBS script. Conda environments can use up a lot of your storage quota, so if you are done with an environment and want to delete it, run:

conda env remove -n <your-env-name>

Pip Interoperability

  • Users who want improved interoperability when installing pip packages inside conda should read the guidance linked here.

Users using tcsh or csh shells

Note

At this time, users who wish to use Anaconda 2 or Anaconda 3 with csh/tcsh shells need to use the module anaconda3/2019.03 or a newer version. Older versions will not work reliably.

Furthermore, PACE staff advise tcsh users to consider writing their job scripts in Bash:

  1. Insert #!/bin/bash on the first line of the job script
  2. After all #PBS directives, insert:
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
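Combining the two steps above with the earlier advice to activate the environment inside the job script, a job script might be laid out as follows. This is a minimal sketch: the helper name write_conda_job, the job name, and the resource requests are illustrative assumptions, so adjust them for your queue and workload.

```shell
# Hypothetical helper: write a minimal Bash job script that sources
# /etc/bashrc and activates a conda environment before running a command.
write_conda_job() {
    env_name=$1   # name of your conda environment
    out_file=$2   # where to write the job script

    # The resource requests below are placeholders, not PACE defaults.
    cat > "$out_file" <<EOF
#!/bin/bash
#PBS -N conda-job
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00

if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

cd \$PBS_O_WORKDIR
module load anaconda3/2019.10
source activate $env_name
python --version
EOF
}

# Example (not run here): write_conda_job my-env job.pbs && qsub job.pbs
```

The generated script uses source activate, as in Step 5, and python --version stands in for your real workload.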

Examples for Building Custom Conda environments

In this section, we provide custom conda examples that walk through building a conda environment from start to finish.

In the example below, we demonstrate an installation of the package MACS2 on login-s.pace.gatech.edu using Anaconda 2 (2019.03), with pip interoperability support.

An example with MACS2 (RHEL6 PACE Systems on 05/2019)


# load anaconda2 2019.03 for conda python2
module load anaconda2/2019.03

# sanity check the python and python version
type python2
python2 --version

# create and activate the conda environment
# noting the storage practices we mentioned above!!!
conda create --name mymacs2
conda activate mymacs2

# ensure that we install pip locked into using python2.7
# this is specifically a requirement for the version of MACS2 we want
conda install pip python=2.7
conda config --set pip_interop_enabled True
pip install -U numpy
pip install -U macs2

# sanity check packages
conda list 

macs2 --version # macs2 user should know what to do here

# when you are finished or want to work with another conda environment
conda deactivate 

What if you want to make sure you have the exact MACS2 that PACE built? After all, pip and conda package versions might drift over time. In that case, continuing the example above, you can generate an environment.yml file with this command (while inside the mymacs2 env):

conda env export > environment.yml

For your information, the file below was altered a bit: we cleaned up a couple of keys, such as name and prefix, since we don't know what you want to name things, and you may have a better idea of where to store your environments. Here is ours, in case it changes over time:
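If you want to drop those two keys from your own export, a one-line filter is enough. This is a sketch, with strip_env_keys as a hypothetical helper name; it assumes name and prefix appear as top-level keys at the start of a line, which is how conda env export writes them.

```shell
# Hypothetical helper: drop the top-level name: and prefix: keys from an
# exported environment file read on stdin.
strip_env_keys() {
    grep -vE '^(name|prefix):'
}

# Example (not run here):
#   conda env export | strip_env_keys > environment.yml
```

With the name key removed, pass -n to conda env create so the new environment still gets a name.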

environment.yml:

name: mymacs2_pace
channels:
  - conda-forge
  - defaults
dependencies:
  - ca-certificates=2019.3.9=hecc5488_0
  - certifi=2019.3.9=py27_0
  - libffi=3.2.1=he1b5a44_1006
  - libgcc-ng=8.2.0=hdf63c60_1
  - libstdcxx-ng=8.2.0=hdf63c60_1
  - ncurses=6.1=hf484d3e_1002
  - openssl=1.1.1b=h14c3975_1
  - pip=19.1=py27_0
  - python=2.7.15=h721da81_1008
  - readline=7.0=hf8c457e_1001
  - setuptools=41.0.1=py27_0
  - sqlite=3.26.0=h67949de_1001
  - tk=8.6.9=h84994c4_1001
  - wheel=0.33.1=py27_0
  - zlib=1.2.11=h14c3975_1004
  - numpy=1.16.3
  - pip:
    - macs2==2.1.2.1

Then how would we build a conda environment from this file? Easy!

module load anaconda2/2019.03
# if 'mymacs2_pace' does not exist
conda env create -f environment.yml -n mymacs2_pace
conda activate mymacs2_pace
macs2 --version
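The comment in the commands above assumes the environment does not exist yet. One way to check first is to look for the name in the output of conda env list. This is a minimal sketch, with env_exists as a hypothetical helper name; it assumes the environment name is the first column of each listing line.

```shell
# Hypothetical helper: succeed if the named environment appears in the
# first column of `conda env list` output read on stdin.
env_exists() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Example (not run here):
#   if ! conda env list | env_exists mymacs2_pace; then
#       conda env create -f environment.yml -n mymacs2_pace
#   fi
```

The match is on the whole name, so mymacs2 does not count as a match for mymacs2_pace.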

In many cases, we emphasize creating environments with environment.yml because it makes the versions in use explicit. Both you, the user, and I, the PACEr, can now agree that we used the same package components, which improves our chances of reproducibility for all.

Troubleshooting

Problems with Installation of pip Packages

  • While trying to install pip packages in a conda environment, you may come across an issue such as the install clashing with items in other local directories
  • One way to resolve the issue is to move the ~/.local folder out of the way when doing pip installs under the conda environment with a command like mv ~/.local ~/.local.bak.
  • If that doesn't work, then try to manually edit site.py in the conda Python by setting ENABLE_USER_SITE = False in the file.
  • If you are using the tiny conda environment, you can find the file with the following:
module load anaconda3/2019.10
conda env list
conda activate tiny
python -c 'import site; print(site.__file__)'
  • More information about ENABLE_USER_SITE can be found here.
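For the first workaround above, it helps to move ~/.local aside in a way that is easy to undo. The helpers below are a minimal sketch; the names stash_user_site and restore_user_site are hypothetical, and both refuse to overwrite an existing directory.

```shell
# Hypothetical helpers: move <home>/.local aside before pip installs in a
# conda environment, and put it back afterwards.
stash_user_site() {
    home_dir=$1
    if [ -e "$home_dir/.local.bak" ]; then
        echo "$home_dir/.local.bak already exists; not overwriting" >&2
        return 1
    fi
    # Nothing to do when there is no .local directory.
    [ -d "$home_dir/.local" ] || return 0
    mv "$home_dir/.local" "$home_dir/.local.bak"
}

restore_user_site() {
    home_dir=$1
    [ -d "$home_dir/.local.bak" ] || return 0
    # Do not clobber a .local that was re-created in the meantime.
    if [ -e "$home_dir/.local" ]; then
        echo "$home_dir/.local exists; merge it by hand" >&2
        return 1
    fi
    mv "$home_dir/.local.bak" "$home_dir/.local"
}

# Example (not run here):
#   stash_user_site "$HOME" && pip install <package>; restore_user_site "$HOME"
```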