8 Week 3 Reading: Containers
8.1 Containers
We already learned about software modules (Section 3.5) on the gizmo cluster. There is an alternative way to use software: pulling and running a software container.
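As a minimal sketch of what "pulling and running" a container looks like from the command line, the script below uses Apptainer if it is installed (as on an HPC cluster), falls back to Docker otherwise, and just reports when neither is available. The image (ubuntu:22.04 from Docker Hub) and the output file name ubuntu_22.04.sif are illustrative assumptions; substitute the image for the tool you actually need.

```shell
#!/usr/bin/env bash
# Sketch: pull a public image and run one command inside it.
# IMAGE is an illustrative assumption; substitute the tool you need.
IMAGE="ubuntu:22.04"

pick_engine() {
  # Prefer Apptainer (typical on HPC), fall back to Docker, else report none
  if command -v apptainer >/dev/null 2>&1; then
    echo apptainer
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  else
    echo none
  fi
}

ENGINE="$(pick_engine)"
case "$ENGINE" in
  apptainer)
    # Convert the Docker Hub image into a local .sif file, then run a command in it
    apptainer pull ubuntu_22.04.sif "docker://${IMAGE}"
    apptainer exec ubuntu_22.04.sif cat /etc/os-release
    ;;
  docker)
    docker pull "$IMAGE"
    # --rm removes the stopped container once the command finishes
    docker run --rm "$IMAGE" cat /etc/os-release
    ;;
  *)
    echo "No container engine found on PATH" >&2
    ;;
esac
```

Running cat /etc/os-release inside the container reports the container's own operating system, not the host's, which is a quick way to see that you are "inside" something else.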
8.1.1 What is a Container?
A container is a self-contained unit of software. It contains everything needed to run the software on a variety of machines. If you have container software installed on your machine, it doesn't matter whether it is macOS, Linux, or Windows: the container will behave consistently across these operating systems, as long as the image was built for your machine's processor architecture.
The container has the following contents:
- Software: the software we want to run in a container. For bioinformatics work, this is usually a tool such as the aligner bwa, or utilities such as samtools.
- Software dependencies: the various software packages needed to run the software. For example, if we wanted to run tidyverse in a container, we would need R installed in the container as well.
- Filesystem: containers have their own isolated filesystem that can be connected to the "outside world", meaning everything outside of the container. We'll learn more about customizing these connections with bind paths (Section 9.3.3).
In short, the container has everything needed to run the software. It is not a full operating system, but a smaller mini-version that cuts out a lot of cruft.
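To make the "software plus dependencies" idea concrete, here is a hedged sketch that runs R with the tidyverse from a container, whether or not R is installed on the host. It assumes the community image rocker/tidyverse from Docker Hub; the 4.3.2 tag is an assumption, so check Docker Hub for the tags currently available.

```shell
#!/usr/bin/env bash
# Sketch: run R plus the tidyverse from a container, without needing R on the host.
# rocker/tidyverse is a community image; the 4.3.2 tag is an assumption.
IMAGE="rocker/tidyverse:4.3.2"

# Report whether R is installed on the host itself (it does not need to be)
if command -v R >/dev/null 2>&1; then
  host_r="yes"
else
  host_r="no"
fi
echo "R installed on host: ${host_r}"

# R and the tidyverse packages both live inside the image,
# so the container brings its own dependencies with it
if command -v docker >/dev/null 2>&1; then
  docker run --rm "$IMAGE" Rscript -e 'library(tidyverse); packageVersion("dplyr")'
elif command -v apptainer >/dev/null 2>&1; then
  # Apptainer can run a docker:// URI directly, converting it on the fly
  apptainer exec "docker://${IMAGE}" Rscript -e 'library(tidyverse); packageVersion("dplyr")'
else
  echo "No container engine found; skipping the container run" >&2
fi
```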
Containers are ephemeral. They rely on the file system of their host to store files that need to persist. The connections between a container and its host's file system are called Volumes (the Docker term) or Bind Paths (the Apptainer term).
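The same idea is spelled differently by the two tools: Docker mounts a volume with -v host_dir:container_dir, and Apptainer mounts a bind path with --bind host_dir:container_dir. The sketch below only prints the equivalent commands (a dry run), so it works without either engine installed; the helper name mount_cmd, the /data mount point, and the image names are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the command that exposes a host directory at /data
# inside a container. mount_cmd, /data, and the image names are hypothetical.

mount_cmd() {
  local engine="$1" host_dir="$2" image="$3"
  case "$engine" in
    docker)
      # Docker: a "volume" given with -v host:container
      echo "docker run --rm -v ${host_dir}:/data ${image} ls /data" ;;
    apptainer)
      # Apptainer: a "bind path" given with --bind host:container
      echo "apptainer exec --bind ${host_dir}:/data ${image} ls /data" ;;
    *)
      echo "unknown engine: ${engine}" >&2
      return 1 ;;
  esac
}

mount_cmd docker "$PWD" ubuntu:22.04
mount_cmd apptainer "$PWD" ubuntu_22.04.sif
```

Files the container writes under /data land in the host directory and survive after the container exits; files written elsewhere inside the container do not persist in the same way.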
8.1.2 Docker vs. Apptainer
There are two basic ways to run Docker containers:
- Using the Docker software
- Using the Apptainer software (for HPC systems)
In general, Docker is used on systems where you have a high level of access. This is because Docker relies on a special user group called docker whose members have essentially root-level privileges. This is not something to be taken lightly.
This is not the case on HPC systems, which are shared; granting this level of access to many people is not practical. Instead, we use Apptainer (formerly called Singularity), which requires a much lower level of user privileges to run containers. For more info, see Section 9.3.
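Since the docker group is what grants those elevated privileges, it can be worth checking whether your account belongs to it. A minimal sketch, assuming a standard id command; the helper name in_group is a hypothetical name used for illustration:

```shell
#!/usr/bin/env bash
# Sketch: check whether the current user belongs to a given group.
# in_group is a hypothetical helper name.

in_group() {
  # id -nG prints the current user's group names, separated by spaces;
  # put one per line and look for an exact, literal match
  id -nG | tr ' ' '\n' | grep -qxF "$1"
}

if in_group docker; then
  echo "You are in the docker group: treat this as root-equivalent access."
else
  echo "Not in the docker group; on a shared HPC system, use apptainer instead."
fi
```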
Before we get started: security is always a concern when running containers. Because the docker group has elevated status on a system, we need to be careful that the containers we run aren't introducing system vulnerabilities. Note that on HPC systems the main mechanism for running containers is Apptainer, which is designed to be more secure.
These concerns matter most when running containers that are web servers or part of a web stack, but they are also worth thinking about when running jobs on HPC.
Here are some guidelines to think about when you are working with a container.
- Use vendor-specific Docker images when possible, that is, images published by the developers of the software itself.
- Use container scanners to spot potential vulnerabilities. Docker Hub has a vulnerability scanner that scans your Docker images for potential vulnerabilities. For example, the WILDS Docker library uses a vulnerability scanner, and its containers are regularly patched to fix vulnerabilities.
- Avoid kitchen-sink images. One problem arises when an image is built on top of many other images: this makes it really difficult to plug vulnerabilities. When in doubt, use images from trusted people and organizations. At the very least, look at the Dockerfile to check that no suspicious software is being installed.
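Two of these checks can be done from the Docker command line. The sketch below lists the command that created each layer of an image (docker history, a rough stand-in for reading the Dockerfile, which requires the image to be pulled locally) and, if the Docker Scout CLI plugin is installed, scans for known vulnerabilities with docker scout cves. The default image and the helper name review_image are hypothetical choices for illustration; Docker Scout may also require you to be logged in to Docker Hub.

```shell
#!/usr/bin/env bash
# Sketch: quick review of an image before trusting it.
# The default image and review_image are hypothetical; pass your own image as $1.
IMAGE="${1:-ubuntu:22.04}"

review_image() {
  local image="$1"
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found; skipping review of ${image}" >&2
    return 1
  fi
  # Show the full command behind each layer (the image must already be pulled)
  docker history --no-trunc --format '{{.CreatedBy}}' "$image"
  # Scan for known CVEs, but only if the Docker Scout plugin is available
  if docker scout version >/dev/null 2>&1; then
    docker scout cves "$image"
  else
    echo "Docker Scout plugin not available; skipping CVE scan" >&2
  fi
}

review_image "$IMAGE" || echo "Image review skipped" >&2
```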