- Tue 04 April 2017
- HowTo
- #docker, #reproducible, #research, #jupyter
Recently, I started to use and get to know Docker. One of my central motivations is to use this technology to make research and work reproducible. The first minimal working example I came up with contains a notebook that loads data from a CSV file which is part of the image. You can see the details here. While preparing this image, I came across many useful tricks; I collected some of them in this post.
Connecting two containers over a network
First, create a new network:
docker network create new-network
Next, start the two containers as follows:
docker run -i -t --name cont1 --net=new-network --net-alias=cont1 drorata/base-image /bin/bash
docker run -i -t --name cont2 --net=new-network --net-alias=cont2 drorata/base-image /bin/bash
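Now the two containers can resolve each other by their aliases. To verify it, assuming ping is available in the image, run from within cont1:
ping -c 3 cont2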
Stop / Remove all running containers
docker stop $(docker ps -a -q)
The subcommand docker ps -a -q generates the list of all container IDs, and in turn this list is passed to the stop command.
Similarly,
docker rm $(docker ps -a -q)
will remove all the stopped containers.
You can pass the -f option to rm to (brutally) stop and remove all containers in one go.
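In one line, this reads:
docker rm -f $(docker ps -a -q)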
Leave a container and keep it alive
If you start a new container in interactive mode and enter its shell, as in docker run -i -t ubuntu, and then exit the shell, Docker will stop the container; you can verify this with docker ps -a. The reason is that the process you asked for has terminated, and with it the container stops. To avoid this, detach with CTRL+P CTRL+Q instead of exiting.
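After detaching this way the container keeps running, and you can attach to it again later; here my-container is a placeholder for the name or ID shown by docker ps:
docker attach my-container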
See this short and excellent answer, and don't forget to read this one as well.
Using Anaconda from Docker
First, you can run a simple container with a full-fledged Anaconda installation. It is as simple as
docker run -i -t --name conda-base continuumio/anaconda3 /bin/bash
Once it is running, you can use Python in the container as much as you want.
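For a quick sanity check that the scientific stack is in place, you can try, for instance:
python -c "import pandas; print(pandas.__version__)"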
As Jupyter is an important part of the work, let's discuss how to use it. From the container's terminal you can start Jupyter, but that won't be enough: the localhost of the container is not the same as that of the host OS. We have to enable port forwarding:
docker run -i -t -p 8888:8888 --name conda-base continuumio/anaconda3 /bin/bash
Next, in the new container's shell run
jupyter notebook --ip='*' --port=8888 --no-browser
Go to the address where the notebook is served and enjoy. One thing is still missing, though: the notebooks have no persistent place to be saved. They can of course be saved inside the container, but they won't persist once you remove it. You should mount a local directory (on the host) as a data volume:
docker run -i -t -p 8888:8888 --name conda-base -v ~/tmp:/opt/notebooks continuumio/anaconda3 /bin/bash
and inside the container, execute:
jupyter notebook --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser
Now, whatever notebook you save from within the container will also be available in ~/tmp on the host.
If, for some reason, the container stopped, you can reuse it instead of creating a new one. Note that docker exec only works on a running container, so a stopped container has to be started first.
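Assuming the container is named conda-base as above, that is:
docker start conda-base
docker exec -it conda-base /bin/bash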
Lastly, putting everything together, you can instantiate a new container as follows:
docker run -i -t -p 8888:8888 -v ~/tmp:/opt/notebooks --name conda-base continuumio/anaconda3 /bin/bash -c "/opt/conda/bin/jupyter notebook --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser"
Location of images
On a Mac, the images are stored in a single file:
$HOME/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2
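In turn, if you want to see how much disk space your images and containers consume, you can check the size of this single file:
du -h $HOME/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2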
Running PySpark within Jupyter
The pyspark-notebook image seems to be a straightforward way to get started with Spark. Simply run:
docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook
Naturally, you can also mount a local directory for persisting the generated notebooks.
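For example, something along these lines should work (assuming the image serves notebooks from /home/jovyan/work, as the Jupyter docker-stacks images do):
docker run -it --rm -p 8888:8888 -v ~/tmp:/home/jovyan/work jupyter/pyspark-notebook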