Design and Implementation of PouchContainer CRI

  1. New container runtimes, such as PouchContainer, are difficult to add to the Kubernetes ecosystem. A container runtime developer must have a deep understanding of the Kubernetes code base (at least of Kubelet) to wire the two together.
  2. The Kubernetes code becomes harder to maintain, for two reasons: (1) hard-coding the APIs of every container runtime into Kubernetes would bloat the core code; (2) even a minor change in a container runtime's interface would force a change in the Kubernetes core and make Kubernetes unstable.

CRI Design Overview

CRI Manager Architecture

Implementation of Pod Models

  1. To create a Pod, Kubelet calls the CRI interface RunPodSandbox. To implement this interface, CRI Manager creates a special container called the infra container. From the perspective of container implementation, the infra container is nothing special: it is an ordinary container created through Container Manager from the pause-amd64:3.0 image. From the perspective of the whole Pod, however, it plays a special role: by contributing its Linux Namespaces for the other containers to share, it connects all containers in the group. It acts as a carrier that holds all the other containers in the Pod and provides the infrastructure for their operation. In general, an infra container represents a Pod.
  2. After the infra container is created, Kubelet creates the other containers in the Pod. Creating a container involves calling two CRI interfaces in succession: CreateContainer and StartContainer. For CreateContainer, CRI Manager simply converts the container configuration from the CRI format to the PouchContainer format and passes it to Container Manager for container creation. The only real concern is how to add the container to the Linux Namespaces of the infra container mentioned above, and this turns out to be simple. The container configuration of Container Manager includes the parameters PidMode, IpcMode, and NetworkMode, which configure the container's Pid Namespace, Ipc Namespace, and Network Namespace respectively. Generally speaking, each Namespace can be configured in one of two modes: "None" mode (create an exclusive Namespace for the container) and "Container" mode (add the container to the Namespace of another container). CRI Manager only needs to set the three parameters to "Container" mode, pointing at the infra container; the specific joining process is transparent to CRI Manager. For StartContainer, CRI Manager simply forwards the request: it obtains the container ID from the request and calls the Start interface of Container Manager to start the container.
  3. Kubelet periodically calls the two CRI interfaces ListPodSandbox and ListContainers to obtain the running status of the containers on the node. ListPodSandbox returns the status of each infra container, while ListContainers returns the status of all containers other than infra containers. The problem is that, to Container Manager, infra containers are no different from other containers, so how does CRI Manager tell them apart? In fact, when creating a container, CRI Manager adds a label to the container configuration indicating the container type. When ListPodSandbox and ListContainers are implemented, the two kinds of containers can then be filtered by the value of this label.
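The namespace-sharing and label-filtering logic described above can be sketched roughly as follows. This is a minimal illustration with simplified types: the struct fields mirror Container Manager's PidMode, IpcMode, and NetworkMode parameters, but the "container:&lt;id&gt;" value format and the label key are illustrative stand-ins, not PouchContainer's actual identifiers.

```go
package main

import "fmt"

// ContainerConfig is a simplified stand-in for the configuration that
// CRI Manager hands to Container Manager.
type ContainerConfig struct {
    PidMode     string
    IpcMode     string
    NetworkMode string
    Labels      map[string]string
}

// containerTypeLabel is a hypothetical label key used to record the
// container type at creation time.
const containerTypeLabel = "cri.container.type"

// sandboxConfig builds the config for an infra (sandbox) container:
// it gets its own Namespaces and is labeled as a sandbox.
func sandboxConfig() ContainerConfig {
    return ContainerConfig{
        Labels: map[string]string{containerTypeLabel: "sandbox"},
    }
}

// appContainerConfig builds the config for an ordinary container in the
// Pod: the "container:<id>" mode joins it to the infra container's
// Pid, Ipc, and Network Namespaces.
func appContainerConfig(infraID string) ContainerConfig {
    mode := fmt.Sprintf("container:%s", infraID)
    return ContainerConfig{
        PidMode:     mode,
        IpcMode:     mode,
        NetworkMode: mode,
        Labels:      map[string]string{containerTypeLabel: "container"},
    }
}

// filterByType is how ListPodSandbox / ListContainers can separate
// infra containers from the rest using the label value.
func filterByType(cs []ContainerConfig, typ string) []ContainerConfig {
    var out []ContainerConfig
    for _, c := range cs {
        if c.Labels[containerTypeLabel] == typ {
            out = append(out, c)
        }
    }
    return out
}

func main() {
    infra := sandboxConfig()
    app := appContainerConfig("infra-123")
    all := []ContainerConfig{infra, app}
    fmt.Println(app.NetworkMode)                  // container:infra-123
    fmt.Println(len(filterByType(all, "sandbox"))) // 1
}
```

With this scheme, ListPodSandbox becomes `filterByType(all, "sandbox")` and ListContainers becomes `filterByType(all, "container")`, without Container Manager needing any notion of a Pod.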

Pod Network Configuration

$ cat >/etc/cni/net.d/10-mynet.conflist <<EOF
{
  "cniVersion": "0.3.0",
  "name": "mynet",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.22.0.0/16",
        "routes": [
          { "dst": "0.0.0.0/0" }
        ]
      }
    }
  ]
}
EOF
  1. When calling Container Manager to create the infra container, set NetworkMode to "None", indicating that an exclusive Network Namespace is created for the infra container without any configuration applied to it.
  2. Based on the PID of the infra container, obtain the path of the corresponding Network Namespace: /proc/{pid}/ns/net.
  3. Call the SetUpPodNetwork method of CNI Manager, whose core parameter is the Network Namespace path obtained in step 2. The method invokes the CNI plugins specified during the initialization of CNI Manager (for example, the bridge plugin configured above) against the Network Namespace given in the parameter: it creates network devices, performs the network configuration, and adds the Network Namespace to the CNI network of the plugins.
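The three steps can be sketched as follows. The netnsPath helper and the commented-out SetUpPodNetwork call are illustrative; the actual CNI Manager signature is not shown in this article.

```go
package main

import "fmt"

// netnsPath derives the Network Namespace path of a running container
// from its PID (step 2 above).
func netnsPath(pid int) string {
    return fmt.Sprintf("/proc/%d/ns/net", pid)
}

func main() {
    // Step 1: the infra container would be created with NetworkMode
    // "None", so it owns an empty, exclusive Network Namespace
    // (container creation itself is not shown here).
    pid := 4321 // PID of the infra container; illustrative value
    path := netnsPath(pid)
    fmt.Println(path) // /proc/4321/ns/net

    // Step 3: this path is the core argument handed to CNI Manager,
    // e.g. (hypothetical call):
    //   cniMgr.SetUpPodNetwork(podConfig, path)
    // which runs the configured CNI plugins (such as the bridge plugin
    // above) against that Namespace.
}
```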

IO Stream Processing

master $ kubectl exec -it shell-demo -- /bin/bash
root@shell-demo:/# ls
bin dev home lib64 mnt proc run srv tmp var
boot etc lib media opt root sbin sys usr
root@shell-demo:/#
  1. The essence of a kubectl exec command is to run an exec command in a container of the Kubernetes cluster and to forward the resulting IO streams to the user. The request is first forwarded, layer by layer, to the Kubelet of the node where the container is located, and Kubelet calls the Exec interface of CRI according to the request. The configuration parameters of the request are as follows:
type ExecRequest struct {
    ContainerId string   // ID of the target container in which the exec command runs.
    Cmd         []string // The exec command to execute.
    Tty         bool     // Whether to run the exec command in a TTY.
    Stdin       bool     // Whether a Stdin stream is contained.
    Stdout      bool     // Whether a Stdout stream is contained.
    Stderr      bool     // Whether a Stderr stream is contained.
}
  2. Surprisingly, the Exec method of CRI Manager does not directly call Container Manager to run the exec command on the target container; instead, it calls the GetExec method of the built-in Stream Server.
  3. The GetExec method of Stream Server saves the exec request to Request Cache, as shown in the figure above, and returns a token. With the token, the exec request can later be retrieved from Request Cache. Finally, the token is written into a URL, which is returned layer by layer to ApiServer as the execution result.
  4. ApiServer uses the returned URL to directly initiate an HTTP request to the node where the target container is located. The request header contains "Upgrade", asking that HTTP be upgraded to a streaming protocol such as WebSocket or SPDY so that multiple IO streams can be processed. This article takes SPDY as the example.
  5. Stream Server processes the request sent by ApiServer: it first retrieves the exec request saved in Request Cache, then replies to the HTTP request, agreeing to upgrade HTTP to SPDY. Based on the exec request, ApiServer creates the specified number of streams corresponding to Stdin, Stdout, and Stderr.
  6. When Stream Server has obtained the specified number of streams, it calls the CreateExec and StartExec methods of Container Manager in succession to run the exec command on the target container and forwards the IO streams to the corresponding streams.
  7. Finally, ApiServer forwards the stream data to the user, enabling IO interaction between the user and the target container.
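Steps 2 and 4 hinge on the token-indexed Request Cache. A minimal sketch of such a cache, assuming a random hex token and single-use semantics (a production implementation would also expire entries after a timeout), might look like this:

```go
package main

import (
    "crypto/rand"
    "encoding/hex"
    "fmt"
    "sync"
)

// ExecRequest mirrors the CRI request shown earlier, trimmed to the
// ID and command for brevity.
type ExecRequest struct {
    ContainerId string
    Cmd         []string
}

// RequestCache is a sketch of Stream Server's token -> request store.
type RequestCache struct {
    mu   sync.Mutex
    reqs map[string]ExecRequest
}

func NewRequestCache() *RequestCache {
    return &RequestCache{reqs: make(map[string]ExecRequest)}
}

// Insert stores the exec request and returns the opaque token that is
// later embedded in the URL handed back to ApiServer.
func (c *RequestCache) Insert(r ExecRequest) (string, error) {
    buf := make([]byte, 8)
    if _, err := rand.Read(buf); err != nil {
        return "", err
    }
    token := hex.EncodeToString(buf)
    c.mu.Lock()
    defer c.mu.Unlock()
    c.reqs[token] = r
    return token, nil
}

// Consume retrieves and removes the request when ApiServer's follow-up
// HTTP (SPDY upgrade) request arrives carrying the token.
func (c *RequestCache) Consume(token string) (ExecRequest, bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    r, ok := c.reqs[token]
    if ok {
        delete(c.reqs, token)
    }
    return r, ok
}

func main() {
    cache := NewRequestCache()
    token, _ := cache.Insert(ExecRequest{ContainerId: "abc", Cmd: []string{"/bin/bash"}})
    fmt.Printf("/exec/%s\n", token) // the kind of URL returned to ApiServer

    req, ok := cache.Consume(token)
    fmt.Println(ok, req.ContainerId) // true abc
    _, ok = cache.Consume(token)     // a token can be consumed only once
    fmt.Println(ok)                  // false
}
```

The single-use Consume keeps a leaked token from being replayed: once ApiServer's follow-up request claims the exec request, the token becomes invalid.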

Conclusion

