Understanding the Kubelet Core Execution Frame

Kubelet is the node agent in a Kubernetes cluster and is responsible for managing the lifecycle of the Pods on the local node. Kubelet first obtains the configurations of the Pods assigned to the local node, and then invokes the underlying container runtime, such as Docker or PouchContainer, to create the Pods according to those configurations. After that, Kubelet monitors the Pods to ensure that every Pod on the node runs in its expected state. This article analyzes this process using the Kubelet source code.

Obtaining Pod Configurations

Kubelet can obtain the Pod configurations for the local node from multiple sources. The most important source is the Apiserver, but Kubelet can also read Pod configurations from a specified file directory or fetch them from a specified HTTP endpoint. Kubelet periodically checks the directory or HTTP endpoint for Pod configuration updates and adjusts the running Pods on the local node accordingly.

During the initialization of Kubelet, a PodConfig object is created, as shown below:

// kubernetes/pkg/kubelet/config/config.go
type PodConfig struct {
	pods *podStorage
	mux  *config.Mux
	// the channel of denormalized changes passed to listeners
	updates chan kubetypes.PodUpdate
	...
}

PodConfig is essentially a multiplexer of Pod configurations. The built-in mux listens on the various configuration sources (apiserver, file, and http) and periodically synchronizes the Pod configuration state of each source. The pods field caches the Pod configurations that each source reported in the previous synchronization. By comparing the new configurations against this cache, mux finds the Pods whose configurations have changed. It then groups these Pods by change type and wraps each group in a PodUpdate structure:

// kubernetes/pkg/kubelet/types/pod_update.go
type PodUpdate struct {
	Pods   []*v1.Pod
	Op     PodOperation
	Source string
}

The Op field specifies the change type. For example, a value of ADD or REMOVE means that the Pods listed in Pods are to be added or removed. Finally, all PodUpdate objects are sent to the updates channel of PodConfig, so listening on the updates channel is enough to receive every Pod configuration update for the local node.
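The diffing step that mux performs can be sketched in Go. This is a simplified model, not the real podStorage code: Pod is a hypothetical struct holding only a UID and a spec string, only the ADD, UPDATE, and REMOVE operations are modelled, and diffPods is a hypothetical helper that compares the cached snapshot of one source with its new snapshot and emits one PodUpdate per change type.

```go
package main

import (
	"fmt"
	"sort"
)

// PodOperation is a simplified stand-in for kubetypes.PodOperation.
type PodOperation string

const (
	ADD    PodOperation = "ADD"
	UPDATE PodOperation = "UPDATE"
	REMOVE PodOperation = "REMOVE"
)

// Pod is a hypothetical, heavily simplified stand-in for v1.Pod.
type Pod struct {
	UID  string
	Spec string
}

// PodUpdate mirrors the shape of kubetypes.PodUpdate.
type PodUpdate struct {
	Pods   []*Pod
	Op     PodOperation
	Source string
}

// diffPods (hypothetical) compares the cached snapshot of a source with a new
// snapshot and emits one PodUpdate per non-empty change type.
func diffPods(source string, old, cur map[string]*Pod) []PodUpdate {
	var adds, updates, removes []*Pod
	for uid, p := range cur {
		prev, ok := old[uid]
		switch {
		case !ok:
			adds = append(adds, p) // new in this snapshot
		case prev.Spec != p.Spec:
			updates = append(updates, p) // present in both, but changed
		}
	}
	for uid, p := range old {
		if _, ok := cur[uid]; !ok {
			removes = append(removes, p) // gone from this snapshot
		}
	}
	groups := []struct {
		op   PodOperation
		pods []*Pod
	}{{ADD, adds}, {UPDATE, updates}, {REMOVE, removes}}
	var out []PodUpdate
	for _, g := range groups {
		if len(g.pods) == 0 {
			continue
		}
		// Sort for deterministic output; map iteration order is random.
		sort.Slice(g.pods, func(i, j int) bool { return g.pods[i].UID < g.pods[j].UID })
		out = append(out, PodUpdate{Pods: g.pods, Op: g.op, Source: source})
	}
	return out
}

func main() {
	last := map[string]*Pod{"a": {UID: "a", Spec: "v1"}, "b": {UID: "b", Spec: "v1"}}
	cur := map[string]*Pod{"a": {UID: "a", Spec: "v2"}, "c": {UID: "c", Spec: "v1"}}
	updates := make(chan PodUpdate, 3) // plays the role of PodConfig.updates
	for _, u := range diffPods("file", last, cur) {
		updates <- u
	}
	close(updates)
	for u := range updates {
		fmt.Println(u.Op, u.Pods[0].UID, u.Source)
	}
}
```

A consumer such as syncLoop only needs to range over the channel; it never has to know which source a change came from unless it inspects Source.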

Pod Synchronization

After Kubelet finishes initializing, it invokes the syncLoop function shown below:

// kubernetes/pkg/kubelet/kubelet.go
// syncLoop is the main loop for processing changes. It watches for changes from
// three channels (file, apiserver, and http) and creates a union of them. For
// any new change seen, will run a sync against desired state and running state. If
// no changes are seen to the configuration, will synchronize the last known desired
// state every sync-frequency seconds. Never returns.
func (kl *Kubelet) syncLoop(updates <-chan kubetypes.PodUpdate, handler SyncHandler) {
	...
	for {
		if !kl.syncLoopIteration(...) {
			break
		}
	}
	...
}

As the comments indicate, syncLoop is the main loop of Kubelet. It listens on the updates channel, obtains the latest Pod configurations, and reconciles the running state with the desired state, so that all Pods on the local node run in their expected states. In fact, syncLoop is only a thin wrapper around syncLoopIteration, which carries out the actual synchronization.

// kubernetes/pkg/kubelet/kubelet.go
func (kl *Kubelet) syncLoopIteration(configCh <-chan kubetypes.PodUpdate ......) bool {
	select {
	case u, open := <-configCh:
		// A closed config channel ends the loop in syncLoop.
		if !open {
			return false
		}
		switch u.Op {
		case kubetypes.ADD:
			handler.HandlePodAdditions(u.Pods)
		case kubetypes.UPDATE:
			handler.HandlePodUpdates(u.Pods)
		...
		}
	case e := <-plegCh:
		...
		handler.HandlePodSyncs([]*v1.Pod{pod})
		...
	case <-syncCh:
		podsToSync := kl.getPodsToSync()
		if len(podsToSync) == 0 {
			break
		}
		handler.HandlePodSyncs(podsToSync)
	case update := <-kl.livenessManager.Updates():
		if update.Result == proberesults.Failure {
			...
			handler.HandlePodSyncs([]*v1.Pod{pod})
		}
	case <-housekeepingCh:
		...
		handler.HandlePodCleanups()
		...
	}
	return true
}

The logic of syncLoopIteration is simple. It listens on multiple channels, and whenever it receives an event from one of them, it invokes the corresponding handler function. The events are processed as follows:

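  1. Obtain Pod configuration changes from configCh and invoke the handler that matches the change type. For example, if new Pods are bound to the local node, HandlePodAdditions is invoked to create them; if some Pod configurations change, HandlePodUpdates is invoked to update those Pods.
  2. Obtain Pod lifecycle events from plegCh. The PLEG (Pod Lifecycle Event Generator) periodically relists the containers on the node and generates an event whenever a container's state changes, for example when a container starts or exits. HandlePodSyncs is then invoked to synchronize the affected Pod.
  3. syncCh is triggered every second. Kubelet collects the Pods that are due for synchronization with getPodsToSync and passes them to HandlePodSyncs.
  4. Obtain probe results from the livenessManager. If a container fails its liveness probe, HandlePodSyncs is invoked so that the Pod is synchronized and the failed container can be restarted.
  5. housekeepingCh is triggered periodically, and HandlePodCleanups is invoked to clean up resources left over from Pods that no longer exist.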
(Figure: execution paths of the Pod event handlers)

As the figure above shows, most of these handlers follow a similar execution path. HandlePodAdditions, HandlePodUpdates, and HandlePodSyncs all invoke dispatchWork after completing their own operations. If dispatchWork finds that the Pod to be synchronized is not in the Terminated state, it invokes the UpdatePod method of podWorkers to update the Pod. It helps to think of Pod creation, update, and synchronization all as a transition from the current (running) status to the desired status. For Pod creation, the current status of the new Pod can be considered empty, so creation is also a status transition. Therefore, in all three cases, the Pod is moved to its target status through the same UpdatePod call.
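This funnel can be sketched as follows, assuming heavily simplified types: Pod carries a hypothetical Terminated flag in place of Kubelet's termination check, and fakeWorkers records calls instead of running them. The real handlers also do more work before calling dispatchWork (for example, HandlePodAdditions registers the Pod with the Pod manager and runs admission checks), which is omitted here.

```go
package main

import "fmt"

// SyncPodType loosely mirrors kubetypes.SyncPodType.
type SyncPodType int

const (
	SyncPodCreate SyncPodType = iota
	SyncPodUpdate
	SyncPodSync
)

// Pod is hypothetical; Terminated stands in for Kubelet's termination check.
type Pod struct {
	Name       string
	Terminated bool
}

// UpdatePodOptions is a reduced version of the real options struct.
type UpdatePodOptions struct {
	Pod        *Pod
	UpdateType SyncPodType
}

// PodWorkers is the single entry point all handlers funnel into.
type PodWorkers interface {
	UpdatePod(options *UpdatePodOptions)
}

// fakeWorkers records dispatched work instead of running it.
type fakeWorkers struct{ got []UpdatePodOptions }

func (f *fakeWorkers) UpdatePod(o *UpdatePodOptions) { f.got = append(f.got, *o) }

// dispatchWork drops terminated Pods and hands everything else to the workers,
// whether the trigger was a creation, an update, or a periodic sync.
func dispatchWork(w PodWorkers, pod *Pod, t SyncPodType) {
	if pod.Terminated {
		return
	}
	w.UpdatePod(&UpdatePodOptions{Pod: pod, UpdateType: t})
}

// The handlers differ only in the SyncPodType they pass along.
func HandlePodAdditions(w PodWorkers, pods []*Pod) {
	for _, p := range pods {
		dispatchWork(w, p, SyncPodCreate)
	}
}

func HandlePodSyncs(w PodWorkers, pods []*Pod) {
	for _, p := range pods {
		dispatchWork(w, p, SyncPodSync)
	}
}

func main() {
	w := &fakeWorkers{}
	HandlePodAdditions(w, []*Pod{{Name: "web"}})
	HandlePodSyncs(w, []*Pod{{Name: "web"}, {Name: "old", Terminated: true}})
	for _, o := range w.got {
		fmt.Println(o.Pod.Name, o.UpdateType)
	}
}
```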

podWorkers is created during Kubelet initialization, as shown below:

// kubernetes/pkg/kubelet/pod_workers.go
type podWorkers struct {
	...
	podUpdates                map[types.UID]chan UpdatePodOptions
	isWorking                 map[types.UID]bool
	lastUndeliveredWorkUpdate map[types.UID]UpdatePodOptions
	workQueue                 queue.WorkQueue
	syncPodFn                 syncPodFnType

	podCache kubecontainer.Cache
	...
}

Kubelet assigns a dedicated pod worker to each Pod, and a pod worker is simply a goroutine. When the first update for a Pod arrives, podWorkers creates a channel of type UpdatePodOptions (a Pod update event) with a buffer size of 1 and starts the worker. The worker listens on the channel for update events and processes each one by invoking the synchronization function stored in the syncPodFn field of podWorkers.

The channel is also registered in the podUpdates map of podWorkers, keyed by Pod UID, so that each update event can be delivered to the pod worker of the corresponding Pod.

What happens if another update event arrives while the current one is still being processed? podWorkers caches that event in lastUndeliveredWorkUpdate and delivers it to the pod worker as soon as the current event has been processed. Because the map holds one entry per Pod, only the most recent pending event is kept.

Every time the pod worker finishes processing an update event, it adds the Pod to the workQueue of podWorkers with a delay. The Pod can be retrieved from the queue only after the delay expires, at which point the next synchronization is performed. As mentioned earlier, syncCh is triggered every second to collect the Pods on the local node that need to be synchronized, and HandlePodSyncs is then invoked to synchronize them. These are the Pods whose delays have expired at that moment, and they are obtained from workQueue. In this way, the entire Pod synchronization process forms a closed loop, as shown below.

(Figure: the closed Pod synchronization loop)

When creating the podWorkers object, Kubelet initializes syncPodFn with its own syncPod method. However, this method only prepares for synchronization: for example, it uploads the latest Pod status to the Apiserver, creates the dedicated directories for the Pod, and obtains the Pod's image pull secrets. Kubelet then invokes the SyncPod method of its containerRuntime to perform the actual synchronization. containerRuntime abstracts the underlying container runtime for Kubelet and defines the interfaces used to operate on containers; SyncPod is one of these interfaces.

Kubelet itself does not perform any container operations. Pod synchronization is essentially a change of container state, and such a change can only be achieved by calling the underlying container runtime, such as PouchContainer.
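The split between preparation in Kubelet and execution in the runtime is, in Go terms, a dependency on an interface. In the sketch below, Runtime is a hypothetical interface reduced to a single SyncPod method with string-based types, and the directory map stands in for the preparation steps; the real containerRuntime interface has many more methods, and kubeGenericRuntimeManager.SyncPod takes considerably richer parameters.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Pod is a hypothetical desired state: the container names it should run.
type Pod struct {
	Name       string
	Containers []string
}

// Runtime is a drastically reduced stand-in for the container runtime interface.
type Runtime interface {
	SyncPod(pod Pod, running []string) error
}

// loggingRuntime is a fake runtime that only records what it was asked to do.
type loggingRuntime struct{ log []string }

func (r *loggingRuntime) SyncPod(pod Pod, running []string) error {
	r.log = append(r.log, fmt.Sprintf("sync %s: want [%s], have [%s]",
		pod.Name, strings.Join(pod.Containers, " "), strings.Join(running, " ")))
	return nil
}

// kubelet only prepares; every container operation goes through Runtime.
type kubelet struct {
	runtime Runtime
	dirs    map[string]bool // stands in for "create the Pod's directories"
}

// syncPod performs the (hypothetical) preparation, then delegates to the runtime.
func (kl *kubelet) syncPod(pod Pod, running []string) error {
	if pod.Name == "" {
		return errors.New("pod has no name")
	}
	kl.dirs[pod.Name] = true
	return kl.runtime.SyncPod(pod, running)
}

func main() {
	rt := &loggingRuntime{}
	kl := &kubelet{runtime: rt, dirs: map[string]bool{}}
	if err := kl.syncPod(Pod{Name: "web", Containers: []string{"nginx"}}, nil); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(rt.log)
}
```

Swapping Docker for PouchContainer then only means supplying a different Runtime implementation; the kubelet type does not change.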

The following looks at the SyncPod method of containerRuntime to show the actual synchronization operations:

// kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go
func (m *kubeGenericRuntimeManager) SyncPod(pod *v1.Pod, _ v1.PodStatus, podStatus *kubecontainer.PodStatus, pullSecrets []v1.Secret, backOff *flowcontrol.Backoff) (result kubecontainer.PodSyncResult)

This function first invokes computePodActions(pod, podStatus) to compare the current Pod status (podStatus) with the target Pod (pod) and work out the required synchronization operations. The result is returned as a podActions object, shown below:

// kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go
type podActions struct {
	KillPod                  bool
	CreateSandbox            bool
	SandboxID                string
	Attempt                  uint32
	ContainersToKill         map[kubecontainer.ContainerID]containerToKillInfo
	NextInitContainerToStart *v1.Container
	ContainersToStart        []int
}

In effect, podActions is a list of operations:

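  1. KillPod and CreateSandbox usually have the same value. They indicate whether to kill the current Pod sandbox (for a newly created Pod, there is nothing to kill) and create a new one.
  2. SandboxID is the ID of the existing sandbox, in which the containers in ContainersToStart are started. Attempt records how many times a sandbox has been created for the Pod.
  3. ContainersToKill maps the IDs of the containers that must be stopped to the information needed to kill them.
  4. NextInitContainerToStart is the next init container to start, if the Pod still has init containers that have not run to completion.
  5. ContainersToStart holds the indexes, in the Pod spec's container list, of the containers that need to be started.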
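The core comparison behind computePodActions can be sketched as follows. This is a simplified model under stated assumptions: each container is identified by its name plus a hypothetical spec hash, init containers and restart policies are ignored, and the struct keeps only some podActions fields with simplified types. A container is (re)started when it is missing or its hash differs from the spec, and any running container that is no longer in the spec is killed.

```go
package main

import "fmt"

// Container is a hypothetical desired container: a name plus a spec hash.
type Container struct {
	Name string
	Hash string
}

// RunningContainer is a hypothetical observed container.
type RunningContainer struct {
	ID, Name, Hash string
}

// podActions mirrors a subset of the real struct, with simplified types.
type podActions struct {
	KillPod           bool
	CreateSandbox     bool
	ContainersToStart []int             // indexes into the desired spec
	ContainersToKill  map[string]string // container ID -> reason
}

// computePodActions (sketch): if the sandbox is not ready, recreate everything;
// otherwise touch only containers that are missing, changed, or unwanted.
func computePodActions(spec []Container, sandboxReady bool, running []RunningContainer) podActions {
	a := podActions{ContainersToKill: map[string]string{}}
	if !sandboxReady {
		a.KillPod, a.CreateSandbox = true, true
		for i := range spec {
			a.ContainersToStart = append(a.ContainersToStart, i)
		}
		return a
	}
	byName := map[string]RunningContainer{}
	for _, rc := range running {
		byName[rc.Name] = rc
	}
	wanted := map[string]bool{}
	for i, c := range spec {
		wanted[c.Name] = true
		rc, ok := byName[c.Name]
		switch {
		case !ok:
			a.ContainersToStart = append(a.ContainersToStart, i)
		case rc.Hash != c.Hash:
			a.ContainersToKill[rc.ID] = "spec changed"
			a.ContainersToStart = append(a.ContainersToStart, i)
		}
	}
	for _, rc := range running {
		if !wanted[rc.Name] {
			a.ContainersToKill[rc.ID] = "not in spec"
		}
	}
	return a
}

func main() {
	spec := []Container{{"app", "h2"}, {"sidecar", "h1"}}
	running := []RunningContainer{{ID: "c1", Name: "app", Hash: "h1"}, {ID: "c2", Name: "old", Hash: "h1"}}
	a := computePodActions(spec, true, running)
	fmt.Println(a.ContainersToStart, a.ContainersToKill)
}
```

Note that an unchanged, running container produces no action at all, which is what keeps repeated synchronizations cheap.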

With this list in hand, the rest of SyncPod is straightforward: it invokes the corresponding interfaces of the underlying container runtime one by one to kill and start containers, which completes the synchronization.
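Applying the list can be sketched as a fixed sequence of runtime calls: kill (the whole Pod or individual containers), create the sandbox if needed, then start containers in it. The containerRuntime interface and its methods below are hypothetical and are not the real CRI signatures. This sketch also stops at the first error for brevity, whereas the real SyncPod records per-step results in its PodSyncResult.

```go
package main

import "fmt"

// podActions is a reduced copy of the struct in the text.
type podActions struct {
	KillPod           bool
	CreateSandbox     bool
	SandboxID         string
	ContainersToKill  []string
	ContainersToStart []int
}

// containerRuntime is a hypothetical low-level interface for this sketch.
type containerRuntime interface {
	KillPod() error
	KillContainer(id string) error
	CreateSandbox() (string, error)
	StartContainer(sandboxID string, index int) error
}

// syncPod applies the actions in order: kill, create sandbox, start containers.
func syncPod(rt containerRuntime, a podActions) error {
	if a.KillPod {
		if err := rt.KillPod(); err != nil {
			return err
		}
	} else {
		for _, id := range a.ContainersToKill {
			if err := rt.KillContainer(id); err != nil {
				return err
			}
		}
	}
	sandboxID := a.SandboxID
	if a.CreateSandbox {
		id, err := rt.CreateSandbox()
		if err != nil {
			return err
		}
		sandboxID = id
	}
	for _, idx := range a.ContainersToStart {
		if err := rt.StartContainer(sandboxID, idx); err != nil {
			return err
		}
	}
	return nil
}

// recorder is a fake runtime that logs each call.
type recorder struct{ calls []string }

func (r *recorder) KillPod() error                 { r.calls = append(r.calls, "killPod"); return nil }
func (r *recorder) KillContainer(id string) error  { r.calls = append(r.calls, "kill "+id); return nil }
func (r *recorder) CreateSandbox() (string, error) { r.calls = append(r.calls, "sandbox"); return "sb-1", nil }
func (r *recorder) StartContainer(sb string, i int) error {
	r.calls = append(r.calls, fmt.Sprintf("start %d in %s", i, sb))
	return nil
}

func main() {
	r := &recorder{}
	if err := syncPod(r, podActions{KillPod: true, CreateSandbox: true, ContainersToStart: []int{0, 1}}); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(r.calls)
}
```

The ordering matters: containers can only be started once a sandbox exists, so a freshly created sandbox ID replaces the old one before any start call.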

In summary, a Pod synchronization is triggered when the target status of the Pod changes or when its synchronization interval expires. Synchronization compares the target status of the containers with their current status, generates a list of containers to start and stop, and invokes the underlying container runtime interfaces to start or stop containers according to that list.

Conclusion

If a container is a process, then Kubelet is a container-oriented process monitor. Its job is to continuously drive the Pods on the local node toward their target status. During this transition, unwanted containers are deleted and new containers are created and configured; existing containers are not modified in place or repeatedly started and stopped. This is the core processing logic of Kubelet.

Note

  1. The source code in this article is from Kubernetes v1.9.4, commit: bee2d1505c4fe820744d26d41ecd3fdd4a3d6546

Reference:

https://www.alibabacloud.com/blog/understanding-the-kubelet-core-execution-frame_593904?spm=a2c41.11896786.0.0
