Introduction
Welcome back! In one of our previous tutorials, we learned how to use Microsoft Visual Studio Code to build our first containers. In that tutorial, we constructed a container based on an Alpine image, version 3.14. Because that image is so small, however, it did not ship with Python or with a package manager such as Conda (which Miniconda provides). Without Conda, we are unable to create a Conda environment! So let’s revisit that blog post and see how we can build a customized environment within a Docker container.
The Setup
The main idea behind the setup we’ll be building is this:
- Build a single Docker container; the container’s image should have Conda installed.
- Our project folder on the host has a smee directory which houses an installation directory (smee/installation).
- Within the installation directory, we have two new files: environment.yml and requirements.txt. Both are copied into /installation inside the container.
- The container will be built using Docker.
The steps are pretty straightforward, but some of our files could use some additional explanation. In the interest of transparency, our folder hierarchy will look something like this. We have a folder (“docker_intro”) housing a Docker file (dockerfile), a Compose file (docker-compose.yml), and a subdirectory titled smee. The smee folder contains a folder titled installation; in turn, smee/installation houses two new files: one to define our environment (environment.yml) and one to define other packages we’d like installed (requirements.txt). Now we need to describe the contents of each file!
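As a rough sketch, the folder hierarchy described above could be scaffolded with a few lines of Python. The folder and file names come from this tutorial; the scaffold helper and the LAYOUT list are illustrative names only, and the files are created empty, ready to be filled in with the contents described in the following sections.

```python
from pathlib import Path
import tempfile

# File names taken from the tutorial; every file is created empty here.
LAYOUT = [
    "dockerfile",
    "docker-compose.yml",
    "smee/installation/environment.yml",
    "smee/installation/requirements.txt",
]


def scaffold(root):
    """Create the tutorial's folder hierarchy under root and list the files made."""
    root = Path(root)
    for relative in LAYOUT:
        target = root / relative
        # Create intermediate folders (including root itself) as needed.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    # Use a throwaway directory so nothing on disk is overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        for name in scaffold(Path(tmp) / "docker_intro"):
            print(name)
```

To create the layout for real, call scaffold("docker_intro") from the directory where the project should live.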
Docker file
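Since we now want a Conda environment, we’ll use the latest Miniconda image from the continuumio organization on Docker Hub. Note that continuumio/miniconda is the older, Python 2 based image (the base environment we inspect later reports Python 2.7.16); continuumio/miniconda3 is its Python 3 counterpart. We won’t be using a multi-stage build here, so we’ll just import the image as is, without an AS clause.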
We also need to copy files from our host directory into the container. So, we set the working directory to /installation in our container and copy everything from the host’s smee/installation folder into it. Next, we create the Conda environment from environment.yml. Finally, in a single RUN instruction, we initialize Conda for bash, update Conda, activate myenv, and use pip to install the packages listed in requirements.txt.
# Use the latest miniconda image
FROM continuumio/miniconda:latest
# Define & enter the installation directory within our container
WORKDIR /installation
# Copy contents of the local installation directory into the container directory
COPY ./smee/installation/ .
# Create a Conda environment based on our YAML file
RUN conda env create -f environment.yml
# Update conda, activate the new environment, and install our required packages
RUN conda init bash \
    && . ~/.bashrc \
    && conda update conda \
    && conda activate myenv \
    && pip install -r /installation/requirements.txt
Environment file: environment.yml
This is the first YAML file we’ve encountered so far, but it’s still pretty readable (YAML is a recursive acronym for “YAML Ain’t Markup Language”). Coincidentally, readability is also one of the benefits of YAML files! In the first section, we define our environment name (myenv), and we define channels. In this context, channels are all the locations where we can expect to find the packages and dependencies we define later in our YAML file.
The dependencies list follows a basic format: each line is one package, written as - <package name>=<version number>=<build string>. Using this format, we can list any number of packages, each pinned to a very particular release and build. Finally, the file contains a prefix. This records where the environment lived on the machine that exported the file; because we create the environment by name inside the container, Conda places it under its own envs directory (/opt/conda/envs/myenv, as the conda list output later confirms). Here, we only show a handful of packages to illustrate the point; these packages include things like HDF5, NumPy, OpenCV, and matplotlib, a Python plotting library with a MATLAB-style interface.
name: myenv
channels:
  - defaults
dependencies:
  - hdf5=1.10.2=hba1933b_1
  - imageio=2.9.0=pyhd3eb1b0_0
  - matplotlib=3.3.4=py37h06a4308_0
  - matplotlib-base=3.3.4=py37h62a2d02_0
  - numpy=1.20.1=py37h93e21f0_0
  - numpy-base=1.20.1=py37h7d8b39e_0
  - opencv=3.4.2=py37h6fd60c2_1
prefix: /proj/myenv/users/yla0111/anaconda3/envs/myenv
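To make the name=version=build format concrete, here is a minimal Python sketch that splits dependency entries like the ones above into their three parts. The Pin type and the parse_pin helper are hypothetical names used only for illustration, and the sketch assumes the exact single-equals form shown in this file; it does not try to handle Conda’s other version operators.

```python
from typing import NamedTuple, Optional


class Pin(NamedTuple):
    """One dependency entry from environment.yml."""
    name: str
    version: Optional[str]
    build: Optional[str]


def parse_pin(spec):
    """Split a 'name=version=build' entry into its parts."""
    # Drop surrounding whitespace and the leading YAML list marker, if any.
    parts = spec.strip().lstrip("- ").split("=")
    if len(parts) > 3 or not parts[0]:
        raise ValueError("unexpected dependency entry: %r" % spec)
    # Pad with None so 'numpy' and 'numpy=1.20.1' are accepted too.
    parts += [None] * (3 - len(parts))
    return Pin(*parts)


if __name__ == "__main__":
    pin = parse_pin("- opencv=3.4.2=py37h6fd60c2_1")
    print(pin.name, pin.version, pin.build)
```

Reading an entry this way shows why the file is so specific: the version selects the release, while the build string selects one particular compiled package for that release.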
Required packages file: requirements.txt
Similar to the YAML file, the requirements.txt file simply lists additional packages we would like to be installed within our Conda environment. It really is just as simple as listing more packages we need to have, but this time we do not need to specify package versions! Because our Docker file installs this list with pip, any package that is not already present is fetched at the latest release compatible with the environment (numpy is already pinned in environment.yml, so that version stays in place). This is a nice way to pick up current packages every time the container is built, although it also means that two builds made at different times may end up with different versions.
numpy
scipy
xarray
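Since the versions of these packages are not pinned, it can be handy to check what a given build actually installed. The following is a minimal sketch, meant to be run with the Python interpreter inside myenv; read_requirements and installed_versions are illustrative helper names, and the sketch assumes the file lists bare package names as it does here. On Python 3.7 it relies on the importlib-metadata backport, which shows up in the myenv package list later in this post.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport, needed on Python 3.7


def read_requirements(text):
    """Return the package names listed in requirements.txt-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            names.append(line)
    return names


def installed_versions(names):
    """Map each name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    requirements = "numpy\nscipy\nxarray\n"
    found = installed_versions(read_requirements(requirements))
    for name, version in found.items():
        print(name, version or "not installed")
```

Running the same script after two different builds is a quick way to spot whether an unpinned package has moved to a newer release.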
Putting it all together: Docker
So now we’ve defined the files that describe our environment and its additional packages, along with the Docker file that will put all this together for us. Similar to our previous tutorial on building Docker containers in VS Code, we’ll tell Docker to build our container.
vscode ➜ /com.docker $ docker build -t "environ:v0" .
[+] Building 737.2s (10/10) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 726B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/continuumio/miniconda:latest 1.9s
=> [1/5] FROM docker.io/continuumio/miniconda:latest@sha256:fee1354ae2435522b9a8a79c5f1c406facc07ec5c44d730d8053600b37c924f0 154.5s
< other jargon >
=> [2/5] WORKDIR /installation 0.6s
=> [3/5] COPY ./smee/installation/ . 0.0s
=> [4/5] RUN conda env create -f environment.yml 480.0s
=> [5/5] RUN conda init bash && . ~/.bashrc 87.7s
=> exporting to image 12.4s
=> => exporting layers 12.4s
=> => writing image sha256:e0d9f82b4c418ced4ae3dc34d971fd618286d532ebefaeae5eb75abd615f1dcc 0.0s
=> => naming to docker.io/library/environ:v0 0.0s
Fantastic! Everything seems to have run smoothly. Let’s go ahead and enter the container and have a look around using docker run -it environ:v0. When we do this, we get an output with an interesting prefix: our prompt (root@5d8ba7fdfb87) is prefaced with (base). This tells us that we have entered the container, but we are in the base environment, and not our Conda environment, myenv. Let’s have a look around anyway, shall we? Let’s go ahead and list all the packages in the base environment.
As we begin to look for the packages we specified in our requirements.txt file (e.g., xarray), we notice that they are not in this list! This makes sense, though, since we asked for those packages to be installed within the Conda environment myenv, and not the base environment.
# Run container
vscode ➜ /com.docker $ docker run -it environ:v0
(base) root@5d8ba7fdfb87:/installation# ls
environment.yml requirements.txt
# Show all packages in the base environment
(base) root@279718979ef2:/installation# conda list
# packages in environment at /opt/conda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
asn1crypto 1.4.0 py_0
ca-certificates 2022.4.26 h06a4308_0
certifi 2020.6.20 pyhd3eb1b0_3
cffi 1.12.3 py27h2e261b9_0
chardet 3.0.4 py27_1003
colorama 0.4.4 pyhd3eb1b0_0
conda 4.7.12 py27_0
conda-package-handling 1.6.0 py27h7b6447c_0
cryptography 2.7 py27h1ba5d50_0
enum34 1.1.6 py27_1
futures 3.3.0 py27_0
idna 2.8 py27_0
ipaddress 1.0.23 py_0
libedit 3.1.20210910 h7f8727e_0
libffi 3.4.2 h295c915_4
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
ncurses 6.3 h7f8727e_2
openssl 1.1.1o h7f8727e_0
pycosat 0.6.3 py27h7b6447c_0
pycparser 2.19 py27_0
pyopenssl 19.0.0 py27_0
pysocks 1.7.1 py27_0
python 2.7.16 h9bab390_7
readline 7.0 h7b6447c_5
requests 2.22.0 py27_0
ruamel_yaml 0.15.46 py27h14c3975_0
setuptools 41.4.0 py27_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.30.0 h7b6447c_0
tk 8.6.12 h1ccaba5_0
tqdm 4.63.0 pyhd3eb1b0_0
urllib3 1.24.2 py27_0
yaml 0.1.7 had09818_2
zlib 1.2.12 h7f8727e_2
Now, let’s go ahead and activate our myenv environment and check out the same package list. First, we notice that when we use conda activate myenv, the (base) prefix in our prompt is replaced with (myenv), so we have definitely installed our environment correctly, and we’ve been placed inside it!
Furthermore, if we look at the packages installed in this environment (conda list), we have a much larger number of packages; among these packages are numpy, scipy, and xarray, the three modules we specifically requested in requirements.txt! Lastly, we confirm that the very specific version of, say, OpenCV that we specified within our environment.yml file (version 3.4.2) has also been successfully installed within our Conda environment within our container.
# Change to myenv Conda environment
(base) root@5d8ba7fdfb87:/# conda activate myenv
(myenv) root@5d8ba7fdfb87:/# ls
(myenv) root@279718979ef2:/installation# conda list
# packages in environment at /opt/conda/envs/myenv:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
blas 1.0 mkl
bzip2 1.0.8 h7b6447c_0
ca-certificates 2022.4.26 h06a4308_0
cairo 1.16.0 h19f5f5c_2
certifi 2022.5.18.1 py37h06a4308_0
cycler 0.11.0 pyhd3eb1b0_0
dbus 1.13.18 hb2f20db_0
expat 2.4.4 h295c915_0
ffmpeg 4.0 hcdf2ecd_0
fontconfig 2.13.1 h6c09931_0
freeglut 3.0.0 hf484d3e_5
freetype 2.11.0 h70c0345_0
giflib 5.2.1 h7b6447c_0
glib 2.69.1 h4ff587b_1
graphite2 1.3.14 h295c915_1
gst-plugins-base 1.14.0 h8213a91_2
gstreamer 1.14.0 h28cd5cc_2
harfbuzz 1.8.8 hffaf4a1_0
hdf5 1.10.2 hba1933b_1
icu 58.2 he6710b0_3
imageio 2.9.0 pyhd3eb1b0_0
importlib-metadata 4.11.4 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561
jasper 2.0.14 hd8c5072_2
jpeg 9e h7f8727e_0
kiwisolver 1.4.2 py37h295c915_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
libffi 3.3 he6710b0_2
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 7.3.0 hdf63c60_0
libglu 9.0.0 hf484d3e_1
libgomp 11.2.0 h1234567_1
libopencv 3.4.2 hb342d67_1
libopus 1.3.1 h7b6447c_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 11.2.0 h1234567_1
libtiff 4.2.0 h2818925_1
libuuid 1.0.3 h7f8727e_2
libvpx 1.7.0 h439df22_0
libwebp 1.2.2 h55f646e_0
libwebp-base 1.2.2 h7f8727e_0
libxcb 1.15 h7f8727e_0
libxml2 2.9.14 h74e7548_0
lz4-c 1.9.3 h295c915_1
matplotlib 3.3.4 py37h06a4308_0
matplotlib-base 3.3.4 py37h62a2d02_0
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py37h7f8727e_0
mkl_fft 1.3.1 py37hd3c417c_0
mkl_random 1.2.2 py37h51133e4_0
ncurses 6.3 h7f8727e_2
numpy 1.20.1 py37h93e21f0_0
numpy-base 1.20.1 py37h7d8b39e_0
opencv 3.4.2 py37h6fd60c2_1
openssl 1.1.1o h7f8727e_0
pandas 1.3.5 pypi_0 pypi
pcre 8.45 h295c915_0
pillow 9.0.1 py37h22f2fdc_0
pip 21.2.2 py37h06a4308_0
pixman 0.40.0 h7f8727e_1
py-opencv 3.4.2 py37hb342d67_1
pyparsing 3.0.4 pyhd3eb1b0_0
pyqt 5.9.2 py37h05f1152_2
python 3.7.13 h12debd9_0
python-dateutil 2.8.2 pyhd3eb1b0_0
pytz 2022.1 pypi_0 pypi
qt 5.9.7 h5867ecd_1
readline 8.1.2 h7f8727e_1
scipy 1.7.3 pypi_0 pypi
setuptools 61.2.0 py37h06a4308_0
sip 4.19.8 py37hf484d3e_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.38.3 hc218d9a_0
tk 8.6.12 h1ccaba5_0
tornado 6.1 py37h27cfd23_0
typing_extensions 4.1.1 pyh06a4308_0
wheel 0.37.1 pyhd3eb1b0_0
xarray 0.20.2 pypi_0 pypi
xz 5.2.5 h7f8727e_1
zipp 3.8.0 pypi_0 pypi
zlib 1.2.12 h7f8727e_2
zstd 1.5.2 ha4553b6_0
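Comparing two long package lists by eye gets tedious, so here is a small Python sketch of how the comparison could be scripted. It assumes the output of conda list has been captured as text; the BASE and MYENV strings below are short excerpts from the listings above rather than the full output, and parse_conda_list and only_in are illustrative helper names.

```python
def parse_conda_list(text):
    """Turn `conda list` output into a {package: version} dictionary."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def only_in(env, other):
    """Return the packages present in env but absent from other, sorted by name."""
    return sorted(set(env) - set(other))


# Excerpts copied from the two listings above.
BASE = """\
# packages in environment at /opt/conda:
ca-certificates 2022.4.26 h06a4308_0
python 2.7.16 h9bab390_7
six 1.16.0 pyhd3eb1b0_1
"""

MYENV = """\
# packages in environment at /opt/conda/envs/myenv:
ca-certificates 2022.4.26 h06a4308_0
python 3.7.13 h12debd9_0
scipy 1.7.3 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_1
xarray 0.20.2 pypi_0 pypi
"""

if __name__ == "__main__":
    print(only_in(parse_conda_list(MYENV), parse_conda_list(BASE)))
```

With these excerpts, the script reports scipy and xarray as the packages that exist only in myenv, which matches what we found by reading the lists.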
Conclusion
In this tutorial, we’ve changed things up a bit. We’ve swapped out our Alpine image for a Miniconda image from Docker Hub, and we’ve installed a custom-named Conda environment within a Docker container. To validate the environment’s existence, we entered the Docker container and compared the packages installed within the base environment and our custom-built environment, myenv. In doing so, we validated that the packages from both our requirements.txt file and our environment.yml file existed only in our specified environment!
In our next tutorial, we’re going to take a step back to discuss how we can save, load, and move these images around. Given the functionality we’re starting to gain, it would be a shame to not share the images with others! Until then, thanks again for learning with me — we’re all in this together! If you’re enjoying the content, please feel free to Like, comment, and subscribe — see you next time!
(Header image: 3d studio by benzoix)