Docker in research development

Development in research is very different from development in an industrial context. Many popular DevOps tools can nonetheless be used effectively in research development as well. Docker, an open-source containerization platform, is one such tool. This blog post shows how Docker can help you develop the software needed for your research, and gives some ideas on how to integrate it into your day-to-day workflow.

Docker helps you to

  • Develop in a clean and stable environment.
  • Quickly deploy any related services such as a database server.
  • Share your results.

What is Docker?

Docker is an open-source tool to build, share and run applications. To do so, it packages an application together with everything it needs to run into a self-contained environment, the so-called image. This image can then be shared with other people and executed on their machines. A running instance of an image is called a container.

From the point of view of the application within a container, it has its own filesystem, and operations within the container do not change the host system’s filesystem. For example, when a PostgreSQL server runs in a Docker container, it has not been installed on the host. Conversely, from PostgreSQL’s point of view, it is the only application running on the system. It is, however, also possible to exchange files between the host and the containers, or to define a network over which the containers can communicate with the host and each other.
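As a rough illustration of exchanging files and opening a port, the sketch below assembles a `docker run` command with a bind mount (`-v`) and a published port (`-p`). The helper name `dev_container_command`, the `/work` directory and the port numbers are assumptions made for this example. The helper only builds and prints the command; it does not start Docker.

```python
import os
import shlex


def dev_container_command(image, host_dir, container_dir="/work", ports=None):
    """Build a `docker run` command that shares a host directory and publishes ports."""
    cmd = ["docker", "run", "--rm", "-it"]
    # Bind mount: files written under container_dir appear in host_dir and vice versa.
    # Docker expects an absolute host path for a bind mount.
    cmd += ["-v", f"{os.path.abspath(host_dir)}:{container_dir}"]
    # Start the container's main process inside the shared directory.
    cmd += ["-w", container_dir]
    # Publish container ports on the host as HOST_PORT:CONTAINER_PORT.
    for host_port, container_port in (ports or {}).items():
        cmd += ["-p", f"{host_port}:{container_port}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Hypothetical example: share the current directory with a Python container.
    print(shlex.join(dev_container_command("python:3.12", ".", ports={8000: 8000})))
```

Everything outside the mounted directory stays private to the container, and `--rm` removes the container once it exits. Containers that need to talk to each other could additionally be attached to a user-defined network created with `docker network create` and selected with the `--network` option.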

Docker is available on Mac and Windows through Docker Desktop and on Linux as Docker Engine. You can find the installation instructions for the different operating systems in the official documentation. Docker Hub hosts official images for many popular operating systems, programming languages and services. Docker images can also be found on other platforms such as GitHub and GitLab. Many projects also include a Dockerfile that you can use to build the image yourself.

Work in a clean development environment

A researcher may work on multiple projects with similar but slightly different requirements. For example, two projects might require different versions of Python. While solutions exist to address this locally, they often require fiddling with local or even global configurations. The risk of one environment “leaking” into another is still present, and such leaks are hard to pinpoint when they occur. Completely reversing these configurations is often not trivial either.

In such a case, it can be worthwhile to develop directly within a Docker container.

It sounds a bit tedious at first to set up a Docker environment for each project, but it can be streamlined with the help of base images. For example, you can create a parent image with your day-to-day development tools, such as Git and Vim, and your preferred configuration. You can then use this image as a base to create more specific images for the actual development. To create your parent image, you can choose from a wide range of official images for different programming languages, such as Python.
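The parent/derived image idea could look roughly like the following sketch, which renders the text of two Dockerfiles. The image tag `my-research-base:latest`, the tool list and the `numpy` package are placeholders chosen for illustration, not recommendations.

```python
def base_dockerfile(parent="python:3.12-slim", tools=("git", "vim")):
    """Dockerfile text for a personal parent image with everyday tools."""
    lines = [
        f"FROM {parent}",
        # The slim Python images are Debian-based, so apt-get is available.
        "RUN apt-get update \\",
        f"    && apt-get install -y --no-install-recommends {' '.join(tools)} \\",
        "    && rm -rf /var/lib/apt/lists/*",
    ]
    return "\n".join(lines) + "\n"


def project_dockerfile(base_tag="my-research-base:latest", packages=("numpy",)):
    """Dockerfile text for a project image derived from the parent image."""
    lines = [
        # Assumes the parent image was built and tagged as base_tag beforehand.
        f"FROM {base_tag}",
        f"RUN pip install --no-cache-dir {' '.join(packages)}",
        "WORKDIR /work",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(base_dockerfile())
    print(project_dockerfile(packages=("numpy", "pandas")))
```

Each text would be saved as a `Dockerfile` in its own directory and built with `docker build -t <tag> <directory>`, parent first. Because the project image starts `FROM` the parent, it has to be rebuilt whenever the parent changes.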

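However, this approach is mainly suited to terminal-based (text user interface) editors, since a container does not provide a graphical desktop by default. Some graphical editors, such as Visual Studio Code, provide extensions that let them work with Docker containers, but most cannot be used inside a container without extra setup.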

Pros:

  • Clean development environment.
  • No need to change host’s system configuration.

Cons:

  • If you change the parent image, you have to rebuild all the derived images as well.
  • Mainly limited to text user interfaces.

Quickly and easily deploy related services

As a researcher, you can easily find yourself needing different versions of the same service (e.g. PostgreSQL). This can be quite a challenge on a single machine. For example, when running multiple instances of the same service, they need to run on different (and therefore non-standard) ports. Your package manager might not include the required version, so you need to install it manually. Also, some operating systems integrate new services into the start-up routine so that they are started on boot even if you don’t need them at the moment. To handle all of this, you have to configure both the services and the host system carefully.

You can relieve yourself of a good part of this headache if you instead deploy each version of a service within its own Docker container. This effectively isolates them from the host and from one another, so each service can keep its default configuration. If you publish their ports to the host, only the host-side port has to differ. Docker Hub provides official images for many popular services such as PostgreSQL, Redis and Nginx, often with links to their documentation. To remove the service without leaving any trace on the host, simply delete the container, any volumes it created, and its related image.
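As a sketch of this workflow, the helpers below assemble the commands to start two PostgreSQL versions side by side and to remove one of them again. The container names (`pg15`, `pg16`), the host ports and the password are made up for the example. The official `postgres` image expects a `POSTGRES_PASSWORD` to be set and listens on port 5432 inside the container.

```python
import shlex


def postgres_run_command(version, host_port, password):
    """Command to start one PostgreSQL version in its own background container."""
    return [
        "docker", "run", "-d",
        "--name", f"pg{version}",
        "-e", f"POSTGRES_PASSWORD={password}",
        # Inside the container the server keeps its standard port 5432;
        # only the host-side port differs between versions.
        "-p", f"{host_port}:5432",
        f"postgres:{version}",
    ]


def postgres_cleanup_commands(version):
    """Commands to remove the container, its anonymous volumes, and the image."""
    return [
        ["docker", "rm", "-f", "-v", f"pg{version}"],
        ["docker", "rmi", f"postgres:{version}"],
    ]


if __name__ == "__main__":
    # Hypothetical versions and host ports, for illustration only.
    for version, port in {"15": 5415, "16": 5416}.items():
        print(shlex.join(postgres_run_command(version, port, "example-only")))
    for cmd in postgres_cleanup_commands("15"):
        print(shlex.join(cmd))
```

With these commands, a client on the host would reach version 15 on port 5415 and version 16 on port 5416, while neither server needs a non-default configuration of its own.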

Pros:

  • Quickly deploy a given version.
  • Have multiple versions running at the same time without any additional configuration.
  • No trace on the host.
  • Set resource limits per container.

Cons:

  • On Linux, by default, a container has no resource limits.
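Since limits are not applied automatically, they have to be requested when the container is started. A minimal sketch, assuming an existing `docker run` command given as a list of arguments, could add the `--memory` and `--cpus` options like this (the helper name `with_limits` and the values are illustrative):

```python
def with_limits(run_cmd, memory=None, cpus=None):
    """Return a copy of a `docker run` command with resource limits inserted."""
    if run_cmd[:2] != ["docker", "run"]:
        raise ValueError("expected a command starting with 'docker run'")
    limits = []
    if memory is not None:
        # Upper bound on memory, e.g. "512m" or "2g".
        limits += ["--memory", memory]
    if cpus is not None:
        # Upper bound on CPU time, expressed as a number of CPUs, e.g. 1.5.
        limits += ["--cpus", str(cpus)]
    # Options must come before the image name, so insert them right after "run".
    return run_cmd[:2] + limits + run_cmd[2:]


if __name__ == "__main__":
    base = ["docker", "run", "-d", "--name", "pg16", "postgres:16"]
    print(" ".join(with_limits(base, memory="512m", cpus=1.5)))
```

Suitable values depend on the service and on the host, so the numbers above are placeholders rather than recommendations.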

Share your results

It can be challenging for researchers unfamiliar with your software to recreate the environment needed to reproduce your results. They might also shy away from installing software packages they feel might interfere too much with their own local setup. Once they have successfully installed the software, they will have to execute it to reproduce your results. Since their environment is potentially very different from your development environment, unexpected errors can occur at this stage. Debugging these errors can be long and tedious, and it can be difficult to pinpoint their source. In the worst case, the researcher who tries to reproduce your results gives up entirely.

Docker can help here in several ways. Most importantly, downloading and executing the image effectively replaces the entire installation, so other researchers keep their local environment clean. Furthermore, you catch any installation problems early, when you build the image. Another advantage is that Docker can not only install the software but also execute the commands necessary to produce its output, which automates the whole process.

In case of any bugs, the developer can easily recreate the environment for debugging purposes.
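To make this more concrete, here is a sketch of what such an image definition and the accompanying commands might look like for a Python project. The file names `requirements.txt` and `run_analysis.py`, the `/app/results` directory and the image tag `example/my-paper:1.0` are hypothetical and would need to match your own project.

```python
import os
import shlex


def analysis_dockerfile(script="run_analysis.py"):
    """Dockerfile text that installs the dependencies and runs the analysis on start."""
    lines = [
        "FROM python:3.12-slim",
        "WORKDIR /app",
        # Install dependencies at build time, so installation problems surface early.
        "COPY requirements.txt .",
        "RUN pip install --no-cache-dir -r requirements.txt",
        "COPY . .",
        # Executing the image runs the analysis; no manual steps are needed.
        f'CMD ["python", "{script}"]',
    ]
    return "\n".join(lines) + "\n"


def author_commands(tag):
    """Commands the author runs to build and publish the image."""
    return [["docker", "build", "-t", tag, "."], ["docker", "push", tag]]


def reader_command(tag, results_dir="results"):
    """Command another researcher runs; the image is pulled if it is not yet local."""
    # Assumes the analysis script writes its output to /app/results.
    mount = f"{os.path.abspath(results_dir)}:/app/results"
    return ["docker", "run", "--rm", "-v", mount, tag]


if __name__ == "__main__":
    print(analysis_dockerfile())
    for cmd in author_commands("example/my-paper:1.0"):
        print(shlex.join(cmd))
    print(shlex.join(reader_command("example/my-paper:1.0")))
```

The bind mount is what lets the generated results end up in a directory on the reader's machine, while everything else stays inside the container.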

Pros:

  • Docker is the only installation requirement.
  • Image execution replaces installation.
  • Automation.
  • Easier debugging in a reproducible environment.

Cons:

  • A researcher who tries to reproduce the results needs a basic understanding of Docker.

Going further

Once you’re comfortable with using Docker in your everyday work, you may want to explore Docker Compose. It allows you to define and run multi-container applications.
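As a small taste, the sketch below holds the text of a Compose file with two services: a PostgreSQL database and an analysis container built from the Dockerfile in the current directory. The service names, the password and the `DATABASE_HOST` variable (which the analysis code would have to read itself) are assumptions made for this example.

```python
import shlex

# Text of a hypothetical compose.yaml; service names and values are illustrative.
COMPOSE_FILE = """\
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example-only
  analysis:
    build: .
    depends_on:
      - db
    environment:
      # Within the Compose network, a service is reachable by its service name.
      DATABASE_HOST: db
"""


def compose_up_command(detached=True):
    """Command that starts all services defined in the Compose file."""
    cmd = ["docker", "compose", "up"]
    if detached:
        # Run the containers in the background.
        cmd.append("-d")
    return cmd


if __name__ == "__main__":
    print(COMPOSE_FILE)
    print(shlex.join(compose_up_command()))
```

Saved as `compose.yaml` next to the project's Dockerfile, this file lets a single command start both containers together, and `docker compose down` stops and removes them again.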

Conclusion

This blog post showed how Docker can help you develop your software in a clean and stable environment, quickly deploy any additional services you might need, and help others reproduce the results of your research. It is therefore a useful addition to any researcher’s toolbox.