Dockerfile Best Practices
As we make explicit recommendations on how to best write Dockerfiles for Open edX services, we will capture them here. Here is the current (very incomplete) list:
Use a still-supported Ubuntu LTS base image. Good candidates as of this writing are “ubuntu:focal” and “ubuntu:jammy”.
Explain each installed package with a comment. Each Ubuntu package explicitly installed in the Dockerfile should have a comment before the installation command explaining why that dependency is needed. This can help reveal future optimizations that eliminate some of those dependencies, and it minimizes “cargo cult” copying of packages into new Dockerfiles that don’t actually need them.
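As a minimal sketch of the two recommendations above, a Dockerfile might start like the following. The package list is hypothetical and chosen only for illustration; a real service should install only what it actually needs and explain each entry in its own words.

```dockerfile
# Still-supported Ubuntu LTS base image.
FROM ubuntu:jammy

# Hypothetical package list, for illustration only:
# python3, python3-pip: interpreter and installer for the service itself.
# libmysqlclient-dev: headers needed to build the mysqlclient Python package.
# gettext: compiles translation catalogs (.po to .mo) at build time.
RUN apt-get update && \
    apt-get install --yes --no-install-recommends \
        python3 \
        python3-pip \
        libmysqlclient-dev \
        gettext && \
    rm -rf /var/lib/apt/lists/*
```

Keeping the explanations directly above the installation command makes it easy to spot a package whose stated reason no longer applies.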
Additionally, here are a few useful reference points to consider:
https://github.com/edx/edx-cookiecutters/blob/master/cookiecutter-django-ida/%7B%7Bcookiecutter.repo_name%7D%7D/Dockerfile (Dockerfile template for new Open edX IDAs)
tutor/templates/build/openedx/Dockerfile on the release branch of overhangio/tutor (Tutor’s Dockerfile for edx-platform)
Note that these don’t all agree on some key points yet. Some decisions yet to be finalized include:
Should the services run in the system Python installation, a venv, a virtualenv, or a pyenv environment? Each option has its pros and cons, and we currently use all of the above in different Dockerfiles.
Many services use a few Python packages that require a native compilation toolchain to build C extensions. Can we use a builder pattern to produce wheels for these so that the compilation toolchain doesn’t have to be included in the base images? We might want to add the toolchain to dev images anyway to simplify upgrades during development, but only for services that actually have dependencies that use C extensions and don’t have suitable Linux binary wheels on PyPI. (Adding these wheels could be a good contribution to the upstream packages.)
There are linters that can give advice on improving Dockerfiles; hadolint (hadolint/hadolint on GitHub, a Dockerfile linter that also validates inline bash) seems to be by far the most actively maintained of these. We should try it and determine whether we agree with its advice enough to recommend it more broadly.
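To make the builder-pattern question above more concrete, here is one possible sketch using a multi-stage build. It is not an agreed-upon approach. The requirements.txt path and the /wheels directory are assumptions made for illustration, and a real service may also need runtime shared libraries in the final stage.

```dockerfile
# Stage 1: build wheels using the native toolchain.
FROM ubuntu:jammy AS builder

# build-essential: C compiler and make, needed to compile C extensions.
# python3-dev: Python headers that C extensions compile against.
# python3-pip: provides "pip wheel".
RUN apt-get update && \
    apt-get install --yes --no-install-recommends \
        build-essential \
        python3-dev \
        python3-pip && \
    rm -rf /var/lib/apt/lists/*

# requirements.txt is a hypothetical path; adjust to the service's layout.
COPY requirements.txt /tmp/requirements.txt
RUN python3 -m pip wheel --wheel-dir /wheels --requirement /tmp/requirements.txt

# Stage 2: runtime image without the compiler toolchain.
FROM ubuntu:jammy

# python3, python3-pip: interpreter and installer for the service itself.
RUN apt-get update && \
    apt-get install --yes --no-install-recommends \
        python3 \
        python3-pip && \
    rm -rf /var/lib/apt/lists/*

COPY --from=builder /wheels /wheels
COPY requirements.txt /tmp/requirements.txt
# --no-index with --find-links makes pip install only from the prebuilt wheels.
RUN python3 -m pip install --no-index --find-links /wheels --requirement /tmp/requirements.txt
```

Only the second stage ends up in the final image, so the compiler and headers from the builder stage are left behind. A dev image could still add the toolchain on top if that simplifies upgrades during development.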