OCI (Open Container Initiative) is an industry-wide collaborative effort to define open specifications for the container image format and runtime. How it got from the initial disagreements to where it stands today is an interesting story of collaboration and competition in the open source world.
Nowadays, all the main players in the container ecosystem follow the OCI specifications. For anyone who wants to know how containers actually work, they are a great technical resource that you will not want to miss.
Here is a diagram that illustrates what they cover and how they interact.
An OCI Image is downloaded from somewhere (think Docker Hub) and then unpacked into an OCI Runtime filesystem bundle. From that point, the OCI Runtime Bundle is run by an OCI Runtime. The Runtime Specification defines how to run a “filesystem bundle”.
Image Specification (image-spec)
The Image Specification defines the archive format of OCI container images, which consist of a manifest, an image index, a set of filesystem layers, and a configuration. The goal of this specification is to enable the creation of interoperable tools for building, transporting, and preparing a container image to run.
At the top level, a container image is just a tarball. After being extracted, it has the layout below.
│   └── sha256
│       ├── 4297f0* (image.manifest)
│       └── 7ea049 (image.config)
The layout isn’t that useful without a specification of what these files are and how they relate to (reference) each other.
We can ignore the oci-layout file for simplicity.
index.json is the entry point. It primarily contains a reference to a manifest, which lists all the "resources" used by a single container image.
The manifest primarily contains references to the config and the layers.
Put that into a diagram, roughly this:
The config notably contains 1) the configuration of the image, which can and will be converted into the runtime configuration file of the runtime bundle, 2) the layers, which make up the root file system of the runtime bundle, and 3) some metadata regarding the image history.
Layers are what make up the final rootfs. The first layer is the base; every other layer contains only the changes to its base. Let's take a closer look at the layer specification in the following section.
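The chain of references from index.json to the manifest and on to the config and layers can be sketched in a few lines of Python. This is a minimal illustration, not a real image tool: the toy layout is built on the fly, the helper names (write_blob, read_blob, resolve_image, build_toy_layout) are made up for this example, and the descriptors carry only a digest, whereas real descriptors also record a media type and a size. It assumes the blobs/sha256/<hex> directory convention from the image specification.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_blob(layout: Path, content: bytes) -> str:
    """Store content under blobs/sha256/<hex> and return its digest string."""
    hex_digest = hashlib.sha256(content).hexdigest()
    blob_dir = layout / "blobs" / "sha256"
    blob_dir.mkdir(parents=True, exist_ok=True)
    (blob_dir / hex_digest).write_bytes(content)
    return f"sha256:{hex_digest}"


def read_blob(layout: Path, digest: str) -> bytes:
    """Resolve a digest such as 'sha256:abc...' to its blob file and read it."""
    algorithm, hex_digest = digest.split(":", 1)
    return (layout / "blobs" / algorithm / hex_digest).read_bytes()


def resolve_image(layout: Path) -> dict:
    """Follow index.json -> manifest -> config and layer digests."""
    index = json.loads((layout / "index.json").read_text())
    manifest_digest = index["manifests"][0]["digest"]
    manifest = json.loads(read_blob(layout, manifest_digest))
    config = json.loads(read_blob(layout, manifest["config"]["digest"]))
    layer_digests = [layer["digest"] for layer in manifest["layers"]]
    return {"manifest": manifest_digest, "config": config, "layers": layer_digests}


def build_toy_layout(layout: Path) -> None:
    """Create a tiny, made-up image layout with a single fake layer."""
    layer = write_blob(layout, b"pretend this is a layer tarball")
    config = write_blob(layout, json.dumps({"config": {"Cmd": ["/bin/sh"]}}).encode())
    manifest = write_blob(layout, json.dumps({
        "schemaVersion": 2,
        "config": {"digest": config},
        "layers": [{"digest": layer}],
    }).encode())
    (layout / "index.json").write_text(json.dumps({
        "schemaVersion": 2,
        "manifests": [{"digest": manifest}],
    }))


with tempfile.TemporaryDirectory() as tmp:
    build_toy_layout(Path(tmp))
    resolved = resolve_image(Path(tmp))
    print(resolved["config"]["config"]["Cmd"])
    print(len(resolved["layers"]))
```

Because every reference is a content digest, the file name of a blob doubles as a checksum of its content, which is why the extracted layout shows hash-like names under sha256.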
For layers, the specification essentially defines two things:
- How to represent a layer.
  - For the base layer, tar all the content;
  - For non-base layers, tar the changeset compared with its base.
  - Hence, first detect the changes and form a changeset; then tar the changeset as the representation of this layer.
- How to union all the layers.
  Apply all the changesets on top of the base layer. This will give you the rootfs.
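The two halves of the layer specification, forming a changeset and applying changesets in order, can be sketched with plain dictionaries standing in for filesystems. This is a simplified model under stated assumptions: a filesystem is a dict from path to content, the function names are invented for this example, and no real tar archives are produced. Deletions are recorded with the `.wh.` whiteout file prefix that the image specification uses for removed paths.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."


def make_changeset(base: dict, new: dict) -> dict:
    """Return files added or modified in new, plus whiteouts for deleted files."""
    changeset = {path: data for path, data in new.items() if base.get(path) != data}
    for path in base:
        if path not in new:
            marker = posixpath.join(
                posixpath.dirname(path), WHITEOUT_PREFIX + posixpath.basename(path)
            )
            changeset[marker] = b""  # an empty marker file records the deletion
    return changeset


def apply_changeset(rootfs: dict, changeset: dict) -> dict:
    """Apply one changeset on top of an existing filesystem snapshot."""
    result = dict(rootfs)
    for path, data in changeset.items():
        directory, name = posixpath.split(path)
        if name.startswith(WHITEOUT_PREFIX):
            # A whiteout removes the named file from the layers below.
            result.pop(posixpath.join(directory, name[len(WHITEOUT_PREFIX):]), None)
        else:
            result[path] = data
    return result


def union(layers: list) -> dict:
    """Apply every layer in order, base first, to obtain the final rootfs."""
    rootfs = {}
    for changeset in layers:
        rootfs = apply_changeset(rootfs, changeset)
    return rootfs


base_layer = {"etc/hostname": b"base\n", "bin/sh": b"shell"}
desired = {"etc/hostname": b"web\n", "app/main": b"binary"}  # bin/sh was removed

second_layer = make_changeset(base_layer, desired)
rootfs = union([base_layer, second_layer])
print(sorted(second_layer))
print(rootfs == desired)
```

Treating the base layer as a changeset against an empty filesystem keeps the union step uniform: every layer is applied the same way.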
Runtime Specification (runtime-spec)
Once the Image is unpacked to a runtime bundle on the disk file system, you will have something that you can run. This is when the Runtime Specification kicks in. The Runtime Specification specifies the configuration, execution environment, and lifecycle of a container.
A container’s configuration contains the metadata necessary to create and run a container. This includes the process to run, environment variables, the resource constraints and sandboxing features to use, etc. Some of the configuration is generic across all platforms, including Linux, Windows, Solaris, and virtual machines, while some of it is platform specific, say Linux only.
The runtime specification also defines the Lifecycle of a container, which is a series of events that happen from when a container is created to when it ceases to exist.
At its essence, the lifecycle can be modeled as the following state diagram.
You can throw in a few other actions and states, such as paused, but those are the fundamental ones.
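Since the state diagram itself may not be visible here, the following sketch models the fundamental states and actions as a small transition table. It assumes the created, running, and stopped statuses and the create, start, kill, and delete operations from the runtime specification; the transient "creating" status is omitted, and the class name ContainerLifecycle is invented for this example.

```python
from enum import Enum


class Status(Enum):
    CREATED = "created"
    RUNNING = "running"
    STOPPED = "stopped"


# (current status, operation) -> next status.
# None stands for "the container does not exist".
TRANSITIONS = {
    (None, "create"): Status.CREATED,
    (Status.CREATED, "start"): Status.RUNNING,
    (Status.CREATED, "kill"): Status.STOPPED,
    (Status.RUNNING, "kill"): Status.STOPPED,  # also reached when the process exits
    (Status.STOPPED, "delete"): None,
}


class ContainerLifecycle:
    """Tracks the status of one container and rejects invalid operations."""

    def __init__(self):
        self.status = None

    def apply(self, operation):
        key = (self.status, operation)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {operation} a container in state {self.status}")
        self.status = TRANSITIONS[key]
        return self.status


lifecycle = ContainerLifecycle()
history = [lifecycle.apply(op) for op in ("create", "start", "kill", "delete")]
print(history)
```

Keeping the transitions in a table makes it easy to add extra states such as paused without touching the rest of the code.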
The state diagram is conventional, but there is one important thing worth mentioning: hooks. It may come as a surprise that the container specifications don't actually define how to set up the network. Instead, they rely on hooks to set up the network properly, for example creating the network before the container starts and deleting it after the container is stopped.
We mentioned before that a container’s configuration contains the settings necessary to create and run a container. We will now look at some of them a little more closely to get a sense of what a container is really about, focusing on the Linux platform.
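To make the role of hooks concrete, here is a toy simulation of where they fire in the lifecycle. It is a sketch only: real hooks are external executables listed in the configuration, whereas the Python callables here (set_up_network, tear_down_network) are hypothetical stand-ins. The hook point names prestart, poststart, and poststop come from the runtime specification; newer versions of the specification deprecate prestart in favor of finer-grained hook points.

```python
def set_up_network(log):
    """Hypothetical hook: would create the container's network."""
    log.append("hook: set up network")


def tear_down_network(log):
    """Hypothetical hook: would remove the container's network."""
    log.append("hook: tear down network")


HOOKS = {
    "prestart": [set_up_network],
    "poststop": [tear_down_network],
}


def run_lifecycle(hooks):
    """Walk through a container's life and fire hooks at each hook point."""
    log = []

    def fire(point):
        for hook in hooks.get(point, []):
            hook(log)

    log.append("runtime: create container")
    fire("prestart")   # before the user process starts
    log.append("runtime: start user process")
    fire("poststart")  # after the user process has started
    log.append("runtime: user process exits")
    log.append("runtime: delete container")
    fire("poststop")   # after the container has been deleted
    return log


log = run_lifecycle(HOOKS)
print("\n".join(log))
```

The runtime itself never needs to know what a hook does, which is how networking stays outside the specification.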
- root
It defines the root file system of the container.
- mounts
It specifies additional filesystems you can mount into the root file system. This is where you can bind mount a local host directory or a distributed one, such as Ceph.
- process
It specifies everything related to the process that you want to run inside the container, including the environment variables and the arguments to the process.
For a Linux process, you can additionally specify security-related settings such as capabilities, rlimits, and the SELinux label.
- hooks
This is where you can hook into the container lifecycle and do things such as setting up and/or cleaning up the network.
- Linux Namespaces
A large part of the Linux platform configuration is dedicated to namespaces. Namespaces are in fact the foundation of container technology; put another way, there would be no containers without namespaces. Linux provides seven types of namespaces, and they are all supported by the OCI runtime specification:

| Namespace | Domain / Description |
| --- | --- |
| PID | Process IDs |
| Mount | Mount points |
| Network | Network devices, stacks, ports, etc. |
| User | User and group IDs |
| IPC | System V IPC, POSIX message queues |
| UTS | Hostname and NIS domain name |

The seventh is the cgroup namespace, which isolates the cgroup root directory.
- annotations
In addition to describing what to run and how to run it, annotations allow you to label containers. The ability to label and select containers based on some properties is a basic requirement for a container orchestration platform.
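Putting the sections above together, a runtime configuration might look roughly like the dictionary below, which serializes to the bundle's config.json. The field names follow the runtime specification, but every value is an illustrative assumption: the hook paths, the bind mount source, the SELinux label, and the annotation key are made up, and a real runtime would likely need more settings (for example ID mappings for the user namespace) before accepting it. The has_annotation helper is hypothetical and only shows label-based selection.

```python
import json

NAMESPACE_TYPES = ["pid", "network", "mount", "ipc", "uts", "user", "cgroup"]

config = {
    "ociVersion": "1.0.2",
    # root: where the container's root file system lives inside the bundle
    "root": {"path": "rootfs", "readonly": True},
    # mounts: additional filesystems mounted into the root file system
    "mounts": [
        {"destination": "/proc", "type": "proc", "source": "proc"},
        {"destination": "/data", "type": "bind", "source": "/srv/data",
         "options": ["rbind", "rw"]},  # /srv/data is a made-up host directory
    ],
    # process: what to run, plus Linux security settings
    "process": {
        "cwd": "/",
        "args": ["sh"],
        "env": ["PATH=/usr/bin:/bin", "TERM=xterm"],
        "user": {"uid": 0, "gid": 0},
        "capabilities": {"bounding": ["CAP_NET_BIND_SERVICE"]},
        "rlimits": [{"type": "RLIMIT_NOFILE", "hard": 1024, "soft": 1024}],
        "selinuxLabel": "system_u:system_r:container_t:s0",  # example label only
    },
    # hooks: executables run at lifecycle points (paths are hypothetical)
    "hooks": {
        "prestart": [{"path": "/usr/local/bin/setup-network"}],
        "poststop": [{"path": "/usr/local/bin/teardown-network"}],
    },
    # linux.namespaces: each listed type asks the runtime for a new namespace
    "linux": {"namespaces": [{"type": t} for t in NAMESPACE_TYPES]},
    # annotations: free-form labels for tools such as orchestrators
    "annotations": {"com.example.tier": "frontend"},
}


def has_annotation(cfg: dict, key: str, value: str) -> bool:
    """Return True if the configuration carries the given annotation."""
    return cfg.get("annotations", {}).get(key) == value


print(json.dumps(config, indent=2))
```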
Image, Container, and Processes
Containers are created from a (container) Image. You can create more than one container from a single Image, and you can also repack a container, usually with changes on top of the base image, to create a new Image.
Once you have a container, you can run processes inside it, with all the nice things a container brings. Most notably, once we containerize an app, it becomes self-contained and won’t mess with the host environment, and thus it should “run everywhere (TM)”.
Here is the relationship between the various concepts, Image, Container, and Process. It is vitally important to get them right.
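One way to keep the three concepts apart is to write them down as a tiny data model. The sketch below is purely illustrative: the class and method names are invented, layers are represented by plain strings, and nothing here talks to a real container engine. It shows that an Image is immutable, that several Containers can share one Image, that Processes run inside a Container, and that committing a Container yields a new Image with one more layer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """An immutable stack of layers, base first."""
    name: str
    layers: tuple


@dataclass
class Container:
    """A running instance of an Image with its own changes and processes."""
    image: Image
    changes: list = field(default_factory=list)    # paths written on top of the image
    processes: list = field(default_factory=list)  # commands started inside

    def write(self, path: str) -> None:
        self.changes.append(path)

    def run(self, command: str) -> None:
        self.processes.append(command)

    def commit(self, name: str) -> Image:
        """Repack the container: its changes become a new top layer."""
        new_layer = "changes:" + ",".join(self.changes)
        return Image(name, self.image.layers + (new_layer,))


base = Image("base", ("layer-0",))
web1, web2 = Container(base), Container(base)  # two containers, one image

web1.write("/app/server")
web1.run("/app/server")
app_image = web1.commit("app")

print(app_image.layers)
print(web2.changes)
```

Changes made in web1 never show up in web2 or in the base image, which mirrors why a containerized app does not interfere with its neighbors.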
Docker and Kubernetes
Docker made containers an industry trend, and many people equate Docker with containers and containers with Docker. Docker definitely deserves the credit here. From a technical point of view, however, Docker is one container implementation, albeit the most widely used. The architecture of the Docker implementation evolves very quickly from version to version. At the time of writing, it looks like the diagram below.
The diagram follows the format of [github]Org/project. Most of the components originated from Docker, but they currently live under different GitHub organizations and projects. At the top is the Docker command-line tool we use daily, which is the commercial offering from Docker, Inc. The Docker tool relies on an open source project called moby, which in turn uses runc, the reference implementation of the OCI runtime specification. runc depends heavily on libcontainer, which was also donated by Docker, Inc.
If we only need one or two containers, Docker is probably all we need. But if we want to run dozens or thousands of containers, we have more problems to solve. To name a few:
- Scheduling: Which host should a container be placed on?
- Update: How to update the container image?
- Scaling: How to add more containers when more processing capacity is needed?
That is the job of a container orchestration system. Kubernetes is one of them, and as of now it is arguably the most promising one. We will not dive deep into Kubernetes here, but will briefly touch on how the container runtime fits into the container orchestration platform.
The following diagram illustrates how Kubernetes interacts with the container runtime.
Kubernetes decouples the runtime implementation using the Container Runtime Interface (CRI). Simply put, CRI defines the interface to create, start, stop, and delete a container. It makes the container runtime pluggable for Kubernetes, so you are not locked into one particular runtime. There are currently several implementations, such as cri-o, which eventually use an OCI runtime such as runc.
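The idea of a pluggable runtime behind a fixed interface can be sketched with an abstract base class. This is a deliberately simplified model and not the real CRI, which is a gRPC API with a richer set of calls; the names ContainerRuntime, InMemoryRuntime, and NodeAgent are hypothetical and exist only for this example.

```python
from abc import ABC, abstractmethod


class ContainerRuntime(ABC):
    """Hypothetical, simplified stand-in for the operations CRI exposes."""

    @abstractmethod
    def create(self, container_id: str, image: str) -> None: ...

    @abstractmethod
    def start(self, container_id: str) -> None: ...

    @abstractmethod
    def stop(self, container_id: str) -> None: ...

    @abstractmethod
    def delete(self, container_id: str) -> None: ...


class InMemoryRuntime(ContainerRuntime):
    """Fake runtime that only records container status in a dict."""

    def __init__(self, name: str):
        self.name = name
        self.containers = {}  # container_id -> status

    def create(self, container_id, image):
        self.containers[container_id] = "created"

    def start(self, container_id):
        self.containers[container_id] = "running"

    def stop(self, container_id):
        self.containers[container_id] = "stopped"

    def delete(self, container_id):
        del self.containers[container_id]


class NodeAgent:
    """Hypothetical orchestrator component that knows only the interface."""

    def __init__(self, runtime: ContainerRuntime):
        self.runtime = runtime

    def launch(self, container_id: str, image: str) -> None:
        self.runtime.create(container_id, image)
        self.runtime.start(container_id)

    def remove(self, container_id: str) -> None:
        self.runtime.stop(container_id)
        self.runtime.delete(container_id)


# The same agent code works with any runtime that implements the interface.
for runtime in (InMemoryRuntime("runtime-a"), InMemoryRuntime("runtime-b")):
    agent = NodeAgent(runtime)
    agent.launch("web-1", "example/web:latest")
    print(runtime.name, runtime.containers)
```

Swapping the runtime requires no change to NodeAgent, which is the property that keeps the orchestrator from being locked into one implementation.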
This is an overview of the OCI container image and runtime specifications. It covers the responsibility of each specification and how they cooperate with each other. We went over the container lifecycle and the primary configurations in the runtime spec. We then introduced the relationship between Docker and runc, and finished the article with a brief introduction to container orchestration and how the container runtime fits into it.
To learn more about containers on Alibaba Cloud, visit https://www.alibabacloud.com/product/container-service
Or check out Alibaba Cloud’s open source rich container engine — PouchContainer.