Docker Demos — Conda environments & containers

Introduction

Welcome back! In one of our previous tutorials, we learned how to use Microsoft Visual Studio Code to build our first containers. In that tutorial, we constructed a container based on an Alpine image, version 3.14. Because that image is so minimal, however, we did not have access to tools such as Python or Miniconda. Without Conda, we are unable to create a Conda environment! So let’s revisit that blog post and see how we can build a customized environment within a Docker container.

The Setup

The main idea behind the setup we’ll be building is this:

  • Build a single Docker container; the container’s image should have Conda installed.
  • Our host project will have a smee directory which will house an installation directory (smee/installation); its contents are copied into an /installation directory in the container.
  • Within the installation directory, we have two new files: environment.yml and requirements.txt.
  • The container will be built using Docker.

The steps are pretty straightforward, but some of our files could use some additional explanation. In the interest of transparency, our folder hierarchy will look something like this. We have a folder (“docker_intro”) housing a Docker file (dockerfile), a Compose file (docker-compose.yml), and a subdirectory titled smee. The smee folder contains a folder titled installation; in turn, smee/installation houses two new files: one to define our environment (environment.yml) and one to define other packages we’d like installed (requirements.txt). Now we need to describe the contents of each file!
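If you’d like to recreate that folder hierarchy quickly, here is a minimal Python sketch that builds the empty skeleton. The file and folder names come from the description above; the helper name create_layout and the use of a temporary directory are just choices for this illustration, so point it at a real folder if you want to keep the result.

```python
from pathlib import Path
import tempfile

# Relative paths of the files described above (created as empty placeholders).
PROJECT_FILES = [
    "dockerfile",
    "docker-compose.yml",
    "smee/installation/environment.yml",
    "smee/installation/requirements.txt",
]


def create_layout(root):
    """Create the empty project skeleton under `root`.

    Returns the relative paths of every file that now exists, sorted.
    """
    root = Path(root)
    for relative in PROJECT_FILES:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # A throwaway directory is used here so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_layout(Path(tmp) / "docker_intro"):
            print(path)
```

The files are created empty; their contents are what the following sections describe.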

Docker file

Since we now want a Conda environment, we’ll use the latest miniconda image from the continuumio organization on Docker Hub. We won’t be using a multi-stage build here, so we’ll just import the image as is, without an AS clause.

We also need to copy files from our host directory into the container. So, we set the working directory to /installation in our container and copy everything from our host’s smee/installation folder into it. Next, we create the Conda environment from our environment.yml file. Finally, in a single RUN step, we initialize Conda for bash, update Conda, activate the new environment, and use pip to install the packages listed in requirements.txt.

# Use the latest miniconda image
FROM continuumio/miniconda:latest

# Define & enter the installation directory within our container
WORKDIR /installation

# Copy contents of the local installation directory into the container directory
COPY ./smee/installation/ .

# Create a Conda environment based on our YAML file
RUN conda env create -f environment.yml

# Update conda, activate the new environment, and install our required packages
# (-y answers conda's confirmation prompt, since the build is non-interactive)
RUN conda init bash \
    && . ~/.bashrc \
    && conda update -y conda \
    && conda activate myenv \
    && pip install -r /installation/requirements.txt

Environment file: environment.yml

This is the first YAML (a recursive acronym for “YAML Ain’t Markup Language”) file we’ve encountered so far, but it’s still pretty readable. Readability is, in fact, one of the benefits of YAML files! In the first section, we define our environment name (myenv), and we define channels. In this context, channels are the locations where Conda looks for the packages and dependencies we define later in our YAML file.

The dependencies list follows a basic format: each line is one package, written as - <package name>=<version number>=<build string>. Using this format, we can list any number of packages, each pinned to a very particular release and build. Finally, the file ends with a prefix line. This records where the environment lived on the machine it was exported from; when we create the environment by name, Conda places it under its own envs directory instead (as the conda list output further down shows, ours ends up in /opt/conda/envs/myenv). Here, we only show a handful of packages to illustrate the point; these packages include things like HDF5, NumPy, OpenCV, and the MATLAB-style Python plotting library, matplotlib.

name: myenv

channels:
  - defaults

dependencies:
  - hdf5=1.10.2=hba1933b_1
  - imageio=2.9.0=pyhd3eb1b0_0
  - matplotlib=3.3.4=py37h06a4308_0
  - matplotlib-base=3.3.4=py37h62a2d02_0
  - numpy=1.20.1=py37h93e21f0_0
  - numpy-base=1.20.1=py37h7d8b39e_0
  - opencv=3.4.2=py37h6fd60c2_1

prefix: /proj/myenv/users/yla0111/anaconda3/envs/myenv
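To make the name=version=build format of each dependency line a bit more concrete, here is a small Python sketch that splits such a line into its three parts. The names CondaSpec and parse_spec are made up for this illustration (they are not part of Conda), and the sketch only handles the simple form used in our file, not every match specification Conda accepts.

```python
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    """One dependency entry: package name, version, and build string."""
    name: str
    version: Optional[str]
    build: Optional[str]


def parse_spec(line):
    """Split a line such as '- hdf5=1.10.2=hba1933b_1' into its parts.

    Version and build are optional; missing parts come back as None.
    """
    text = line.strip()
    if text.startswith("-"):
        # Drop the YAML list marker in front of the entry.
        text = text[1:].strip()
    parts = text.split("=")
    if not 1 <= len(parts) <= 3 or not all(parts):
        raise ValueError(f"unexpected dependency format: {line!r}")
    parts += [None] * (3 - len(parts))
    return CondaSpec(*parts)


if __name__ == "__main__":
    print(parse_spec("  - opencv=3.4.2=py37h6fd60c2_1"))
    print(parse_spec("  - matplotlib-base=3.3.4"))
```

Leaving off the build string (or the version as well) gives Conda more freedom when it resolves the environment, at the cost of a less reproducible build.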

Required packages file: requirements.txt

Similar to the YAML file, the requirements.txt file simply lists additional packages we would like to be installed within our Conda environment. It really is just as simple as listing more packages we need to have, but this time we do not need to specify package versions! These packages are installed by pip rather than Conda (see the last step of our Docker file). When pip fetches a package that isn’t installed yet, it grabs the latest version compatible with the environment’s Python automatically, so every time the container is built, we get the most up-to-date packages available. A package that is already present, such as the numpy version pinned in environment.yml, is left as is.

numpy
scipy
xarray
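Always grabbing the latest version is convenient, but it also means two builds on different days can end up with different packages. If you ever want to see which entries are free to change, here is a rough Python sketch that reports the requirement lines without an exact == pin. The function name unpinned_requirements is an assumption for this example, and the check is deliberately simple; it is not a full parser for every requirement specifier pip understands.

```python
def unpinned_requirements(lines):
    """Return the requirement entries that have no exact '==' pin.

    Blank lines and '#' comments are ignored.
    """
    result = []
    for raw in lines:
        # Strip trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line and "==" not in line:
            result.append(line)
    return result


if __name__ == "__main__":
    # The same three entries as in our requirements.txt.
    print(unpinned_requirements(["numpy", "scipy", "xarray"]))
```

For our file, every entry is reported, since none of them specifies a version.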

Putting it all together: Docker

So now we’ve defined the files that will build our environment and install additional packages, along with the Docker file that puts all of this together for us. Similar to our previous tutorial on building Docker containers in VS Code, we’ll tell Docker to build our image.

vscode ➜ /com.docker $ docker build -t "environ:v0" .
[+] Building 737.2s (10/10) FINISHED                                                                                                              
 => [internal] load build definition from Dockerfile                                                                                         0.0s
 => => transferring dockerfile: 726B                                                                                                         0.0s
 => [internal] load .dockerignore                                                                                                            0.0s
 => => transferring context: 2B                                                                                                              0.0s
 => [internal] load metadata for docker.io/continuumio/miniconda:latest                                                                      1.9s
 => [1/5] FROM docker.io/continuumio/miniconda:latest@sha256:fee1354ae2435522b9a8a79c5f1c406facc07ec5c44d730d8053600b37c924f0              154.5s

< other jargon >

 => [2/5] WORKDIR /installation                                                                                                              0.6s

 => [3/5] COPY ./smee/installation/ .                                                                                                        0.0s

 => [4/5] RUN conda env create -f environment.yml                                                                                          480.0s

 => [5/5] RUN conda init bash                                                 && . ~/.bashrc                                                87.7s
 => exporting to image                                                                                                                      12.4s
 => => exporting layers                                                                                                                     12.4s
 => => writing image sha256:e0d9f82b4c418ced4ae3dc34d971fd618286d532ebefaeae5eb75abd615f1dcc                                                 0.0s
 => => naming to docker.io/library/environ:v0                                                                                                0.0s

Fantastic! Everything seems to have run smoothly. Let’s go ahead and enter the container and have a look around using docker run -it environ:v0. When we do this, we get a prompt with an interesting prefix: our prompt (root@5d8ba7fdfb87) is prefaced with (base). This tells us that we have entered the container, but we are in the base environment, and not our Conda environment, myenv. Let’s have a look around anyway, shall we? Let’s go ahead and list all the packages in the base environment.

As we begin to look for the packages we specified in our requirements.txt file (e.g. xarray), we notice that they are not in this list! This makes sense, though, since we asked for those packages to be installed within the Conda environment myenv, and not the base environment.

# Run container
vscode ➜ /com.docker $ docker run -it environ:v0
(base) root@5d8ba7fdfb87:/installation# ls
environment.yml  requirements.txt

# Show all packages in the base environment
(base) root@279718979ef2:/installation# conda list
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
asn1crypto                1.4.0                      py_0  
ca-certificates           2022.4.26            h06a4308_0  
certifi                   2020.6.20          pyhd3eb1b0_3  
cffi                      1.12.3           py27h2e261b9_0  
chardet                   3.0.4                 py27_1003  
colorama                  0.4.4              pyhd3eb1b0_0  
conda                     4.7.12                   py27_0  
conda-package-handling    1.6.0            py27h7b6447c_0  
cryptography              2.7              py27h1ba5d50_0  
enum34                    1.1.6                    py27_1  
futures                   3.3.0                    py27_0  
idna                      2.8                      py27_0  
ipaddress                 1.0.23                     py_0  
libedit                   3.1.20210910         h7f8727e_0  
libffi                    3.4.2                h295c915_4  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
ncurses                   6.3                  h7f8727e_2  
openssl                   1.1.1o               h7f8727e_0  
pycosat                   0.6.3            py27h7b6447c_0  
pycparser                 2.19                     py27_0  
pyopenssl                 19.0.0                   py27_0  
pysocks                   1.7.1                    py27_0  
python                    2.7.16               h9bab390_7  
readline                  7.0                  h7b6447c_5  
requests                  2.22.0                   py27_0  
ruamel_yaml               0.15.46          py27h14c3975_0  
setuptools                41.4.0                   py27_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.30.0               h7b6447c_0  
tk                        8.6.12               h1ccaba5_0  
tqdm                      4.63.0             pyhd3eb1b0_0  
urllib3                   1.24.2                   py27_0  
yaml                      0.1.7                had09818_2  
zlib                      1.2.12               h7f8727e_2  
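Scanning a long listing by eye gets tedious, so here is a hedged Python sketch of the same check: it takes the text printed by conda list and reports which of the packages we asked for are absent. The helper names and the shortened BASE_SAMPLE text (a few lines copied from the listing above) are only for illustration; the sketch assumes the package name is the first column and matches names exactly.

```python
def listed_packages(conda_list_output):
    """Return the set of package names found in `conda list` text output."""
    names = set()
    for line in conda_list_output.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines.
        if not line or line.startswith("#"):
            continue
        names.add(line.split()[0])
    return names


def missing_packages(conda_list_output, wanted):
    """Return the entries of `wanted` that do not appear in the listing."""
    present = listed_packages(conda_list_output)
    return [name for name in wanted if name not in present]


# A shortened excerpt of the base environment listing shown above.
BASE_SAMPLE = """\
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
conda                     4.7.12                   py27_0
python                    2.7.16               h9bab390_7
six                       1.16.0             pyhd3eb1b0_1
"""

if __name__ == "__main__":
    print(missing_packages(BASE_SAMPLE, ["numpy", "scipy", "xarray"]))
```

Run against the full base listing, all three packages from requirements.txt come back as missing, which matches what we saw by eye.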

Now, let’s go ahead and activate our myenv environment and check out the same package list. First, we notice that when we use conda activate myenv, the environment (base) is now replaced with (myenv) — so we have definitely installed our environment correctly, and we’ve been placed inside it!

Furthermore, if we look at the packages installed in our Conda environment (conda list), we have a much larger number of packages; among these are numpy, scipy, and xarray, the three packages we listed in requirements.txt! (scipy and xarray show pypi in the Channel column because pip installed them; numpy was already provided by environment.yml.) Lastly, we confirm that the very specific version of, say, OpenCV that we specified within our environment.yml file (version 3.4.2) has also successfully been installed within our Conda environment within our container.

# Change to myenv Conda environment
(base) root@5d8ba7fdfb87:/# conda activate myenv
(myenv) root@5d8ba7fdfb87:/# ls

(myenv) root@279718979ef2:/installation# conda list
# packages in environment at /opt/conda/envs/myenv:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
blas                      1.0                         mkl  
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2022.4.26            h06a4308_0  
cairo                     1.16.0               h19f5f5c_2  
certifi                   2022.5.18.1      py37h06a4308_0  
cycler                    0.11.0             pyhd3eb1b0_0  
dbus                      1.13.18              hb2f20db_0  
expat                     2.4.4                h295c915_0  
ffmpeg                    4.0                  hcdf2ecd_0  
fontconfig                2.13.1               h6c09931_0  
freeglut                  3.0.0                hf484d3e_5  
freetype                  2.11.0               h70c0345_0  
giflib                    5.2.1                h7b6447c_0  
glib                      2.69.1               h4ff587b_1  
graphite2                 1.3.14               h295c915_1  
gst-plugins-base          1.14.0               h8213a91_2  
gstreamer                 1.14.0               h28cd5cc_2  
harfbuzz                  1.8.8                hffaf4a1_0  
hdf5                      1.10.2               hba1933b_1  
icu                       58.2                 he6710b0_3  
imageio                   2.9.0              pyhd3eb1b0_0  
importlib-metadata        4.11.4                   pypi_0    pypi
intel-openmp              2021.4.0          h06a4308_3561  
jasper                    2.0.14               hd8c5072_2  
jpeg                      9e                   h7f8727e_0  
kiwisolver                1.4.2            py37h295c915_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 11.2.0               h1234567_1  
libgfortran-ng            7.3.0                hdf63c60_0  
libglu                    9.0.0                hf484d3e_1  
libgomp                   11.2.0               h1234567_1  
libopencv                 3.4.2                hb342d67_1  
libopus                   1.3.1                h7b6447c_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              11.2.0               h1234567_1  
libtiff                   4.2.0                h2818925_1  
libuuid                   1.0.3                h7f8727e_2  
libvpx                    1.7.0                h439df22_0  
libwebp                   1.2.2                h55f646e_0  
libwebp-base              1.2.2                h7f8727e_0  
libxcb                    1.15                 h7f8727e_0  
libxml2                   2.9.14               h74e7548_0  
lz4-c                     1.9.3                h295c915_1  
matplotlib                3.3.4            py37h06a4308_0  
matplotlib-base           3.3.4            py37h62a2d02_0  
mkl                       2021.4.0           h06a4308_640  
mkl-service               2.4.0            py37h7f8727e_0  
mkl_fft                   1.3.1            py37hd3c417c_0  
mkl_random                1.2.2            py37h51133e4_0  
ncurses                   6.3                  h7f8727e_2  
numpy                     1.20.1           py37h93e21f0_0  
numpy-base                1.20.1           py37h7d8b39e_0  
opencv                    3.4.2            py37h6fd60c2_1  
openssl                   1.1.1o               h7f8727e_0  
pandas                    1.3.5                    pypi_0    pypi
pcre                      8.45                 h295c915_0  
pillow                    9.0.1            py37h22f2fdc_0  
pip                       21.2.2           py37h06a4308_0  
pixman                    0.40.0               h7f8727e_1  
py-opencv                 3.4.2            py37hb342d67_1  
pyparsing                 3.0.4              pyhd3eb1b0_0  
pyqt                      5.9.2            py37h05f1152_2  
python                    3.7.13               h12debd9_0  
python-dateutil           2.8.2              pyhd3eb1b0_0  
pytz                      2022.1                   pypi_0    pypi
qt                        5.9.7                h5867ecd_1  
readline                  8.1.2                h7f8727e_1  
scipy                     1.7.3                    pypi_0    pypi
setuptools                61.2.0           py37h06a4308_0  
sip                       4.19.8           py37hf484d3e_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.38.3               hc218d9a_0  
tk                        8.6.12               h1ccaba5_0  
tornado                   6.1              py37h27cfd23_0  
typing_extensions         4.1.1              pyh06a4308_0  
wheel                     0.37.1             pyhd3eb1b0_0  
xarray                    0.20.2                   pypi_0    pypi
xz                        5.2.5                h7f8727e_1  
zipp                      3.8.0                    pypi_0    pypi
zlib                      1.2.12               h7f8727e_2  
zstd                      1.5.2                ha4553b6_0  
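Seeing a package in the listing is reassuring, but the real test is whether Python can find it. Here is a minimal sketch, meant to be run with the Python inside myenv, that checks whether a list of modules can be located. The function name unavailable_modules is made up for this example; note that module names can differ from package names (OpenCV is imported as cv2), and that the sketch only locates each module without actually importing it.

```python
import importlib.util


def unavailable_modules(module_names):
    """Return the top-level module names that Python cannot locate."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Import names for the packages we requested (cv2 is OpenCV's module).
    missing = unavailable_modules(["numpy", "scipy", "xarray", "cv2"])
    print("missing:", missing if missing else "none")
```

Since the base environment in this image ships Python 2.7, this particular script is intended for the Python 3.7 interpreter in myenv.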

Conclusion

In this tutorial, we’ve changed things up a bit. We’ve swapped out our Python image for a Miniconda image from Docker Hub, and we’ve installed a custom-named Conda environment within a Docker container. To validate the environment’s existence, we entered the Docker container and compared the packages installed within the base environment and our custom-built environment, myenv. In doing so, we validated that packages from both our requirements.txt file and our environment.yml file existed only in our specified environment!

In our next tutorial, we’re going to take a step back to discuss how we can save, load, and move these images around. Given the functionality we’re starting to gain, it would be a shame to not share the images with others! Until then, thanks again for learning with me — we’re all in this together! If you’re enjoying the content, please feel free to Like, comment, and subscribe — see you next time!


(Header image: 3d studio by benzoix)
