As the most popular container cluster management platform, Kubernetes must coordinate the overall resource usage of a cluster: it allocates appropriate resources to the containers in each pod, maximizes resource utilization, and ensures that important containers receive enough resources to run stably.
Configure Constraints on Container Resources
The most basic resource metrics for a pod are CPU and memory.
Kubernetes provides requests and limits to pre-allocate resources and limit resource usage, respectively.
Limits restrict the resource usage of a pod as follows:
- If its memory usage exceeds the memory limit, this pod is out of memory (OOM) killed.
- If its CPU usage exceeds the CPU limit, this pod is not killed, but its CPU usage is restricted to the limit.
Testing the Memory Limit
Deploy a stress-testing container in a pod and configure it to allocate 250 MiB of memory, while the memory limit of the pod is only 100 MiB.
- name: memory-demo-2-ctr
args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
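A complete manifest along these lines might look as follows (adapted from the memory stress-test example in the Kubernetes documentation; the namespace, image, and request value are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: example
spec:
  containers:
  - name: memory-demo-2-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "100Mi"   # the stress tool tries to allocate 250M, exceeding this limit
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
```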
After deployment, check the pod status. We can see that it is OOM killed.
kubectl -n example get po
NAME READY STATUS RESTARTS AGE
memory-demo 0/1 OOMKilled 1 11s
Testing the CPU Limit
- name: cpu-demo-ctr
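A manifest for this test might look as follows (adapted from the CPU stress-test example in the Kubernetes documentation; the namespace, image, and request value are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo
  namespace: example
spec:
  containers:
  - name: cpu-demo-ctr
    image: vish/stress
    resources:
      requests:
        cpu: "0.5"
      limits:
        cpu: "1"          # usage is throttled to 1000m
    args:
    - -cpus
    - "2"                 # the container tries to use 2 CPUs
```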
Check the container information. Although the pod is not killed, its CPU usage is restricted to 1,000 millicpu.
kubectl -n example top po cpu-demo
NAME CPU(cores) MEMORY(bytes)
cpu-demo 1000m 0Mi
Kubernetes also manages quality of service (QoS). Based on the resource configuration of their containers, pods are divided into three QoS levels: Guaranteed, Burstable, and BestEffort. When resources are insufficient, scheduling and eviction decisions are made based on the QoS level. The three QoS levels are described as follows:
- Guaranteed: Limits and requests are set for all containers in a pod. Each limit is equal to the corresponding request. If a limit is set but the corresponding request is not set, the request is automatically set to the limit value.
- Burstable: Limits are not set for some containers in a pod, or some limits are not equal to the corresponding requests. During scheduling, this type of pod may cause node resources to be oversold.
- BestEffort: Limits and requests are not set for any containers in a pod.
Code for querying QoS: https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/helper/qos/qos.go
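As an illustration, container resources blocks like the following (values are illustrative) map to each QoS class:

```yaml
# Guaranteed: every container sets limits equal to requests
resources:
  requests: {cpu: "500m", memory: "128Mi"}
  limits:   {cpu: "500m", memory: "128Mi"}
---
# Burstable: requests lower than limits (or only some containers constrained)
resources:
  requests: {cpu: "250m", memory: "64Mi"}
  limits:   {cpu: "500m", memory: "128Mi"}
---
# BestEffort: no requests or limits at all
resources: {}
```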
Impact of Different QoS Levels on Containers
Kubernetes sets the oom_score_adj parameter based on the QoS level. The oom_killer mechanism calculates the OOM score of each pod based on its memory usage and then adjusts that score with the oom_score_adj parameter. Pods whose processes have higher final scores are killed first when OOM occurs.
If the memory resources of a node are insufficient, pods whose QoS level is BestEffort are killed first, and pods whose QoS level is Guaranteed are killed last. For pods whose QoS level is Burstable, the oom_score_adj value ranges from 2 to 999 and is derived from the formula 1000 - 1000 x memory request / node memory capacity. The larger the memory request, the smaller the oom_score_adj value, and the more likely such pods are to be protected during OOM.
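The Burstable calculation can be sketched in Python as follows (mirroring the integer arithmetic in the kubelet's OOM policy code; the capacity value is taken from the node example in this article):

```python
MiB = 1024 * 1024

def burstable_oom_score_adj(memory_request: int, node_capacity: int) -> int:
    """Approximation of the kubelet's rule for Burstable pods:
    1000 - (1000 * memoryRequest) / machineMemoryCapacity, clamped to [2, 999]."""
    score = 1000 - (1000 * memory_request) // node_capacity
    return max(2, min(score, 999))

capacity = 8010196 * 1024  # allocatable memory of the example node, in bytes

print(burstable_oom_score_adj(200 * MiB, capacity))  # request 200Mi -> 975
print(burstable_oom_score_adj(400 * MiB, capacity))  # request 400Mi -> 949
```

Note the clamp: a very large memory request cannot push the score below 2, so Burstable pods never get stronger OOM protection than Guaranteed pods.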
# kubectl describe no cn-beijing.i-2zeavb11mttnqnnicwj9 | grep -A 3 Capacity
- name: memory-demo-qos-1   # memory request: 200Mi
args: ["--vm", "1", "--vm-bytes", "50M", "--vm-hang", "1"]
---
- name: memory-demo-qos-2   # memory request: 400Mi
args: ["--vm", "1", "--vm-bytes", "50M", "--vm-hang", "1"]
---
- name: memory-demo-qos-3
args: ["--vm", "1", "--vm-bytes", "50M", "--vm-hang", "1"]
The allocatable memory of each node is 8,010,196 KiB, which is about 7,822.45 MiB.
According to the formula for the Burstable QoS level (the quotient is rounded down):
request 200Mi: 1000 - floor(1000 x 200/7822.45) = 1000 - 25 = 975
request 400Mi: 1000 - floor(1000 x 400/7822.45) = 1000 - 51 = 949
The oom_score_adj parameter values of these three pods are as follows:
// request 200Mi
kubectl -n example exec memory-demo-qos-1 cat /proc/1/oom_score_adj
975
// request 400Mi
kubectl -n example exec memory-demo-qos-2 cat /proc/1/oom_score_adj
kubectl -n example exec memory-demo-qos-3 cat /proc/1/oom_score_adj
Code for setting OOM rules: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/qos/policy.go
If the memory and CPU resources of a node are insufficient and this node starts to evict its pods, the QoS level also affects the eviction priority as follows:
- The kubelet preferentially evicts pods whose QoS level is BestEffort and pods whose QoS level is Burstable with resource usage larger than preset requests.
- Then, the kubelet evicts pods whose QoS level is Burstable with resource usage smaller than preset requests.
- Finally, the kubelet evicts pods whose QoS level is Guaranteed. The kubelet tries to prevent Guaranteed pods from being evicted because of the resource consumption of other pods.
- If pods have the same QoS level, the kubelet determines the eviction priority based on the pod priority.
Kubernetes provides the ResourceQuota object to set constraints on the number of Kubernetes objects by type and the amount of resources (CPU and memory) in a namespace.
- One or more ResourceQuota objects can be created in a namespace.
- If the ResourceQuota object is configured in a namespace, requests and limits must be set during deployment; otherwise, pod creation is rejected.
- To avoid this problem, the LimitRange object can be used to set the default requests and limits for each pod.
- For more information about extended resources supported in versions later than Kubernetes V1.10, see https://kubernetes.io/docs/tasks/configure-pod-container/extended-resource/
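For example, a ResourceQuota along the following lines (the name and values are illustrative) caps both the number of pods and the aggregate CPU and memory in a namespace:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-quota      # illustrative name
  namespace: example
spec:
  hard:
    pods: "10"             # at most 10 pods in the namespace
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```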
The LimitRange object is used to set the default resource requests and limits as well as minimum and maximum constraints for each pod in a namespace.
- default: # default limit
defaultRequest: # default request
max: # max limit
min: # min request
maxLimitRequestRatio: # max value for limit / request
type: Container # limit type, support: Container / Pod / PersistentVolumeClaim
The LimitRange object supports the following parameters:
- default: indicates default limits.
- defaultRequest: indicates default requests.
- max: indicates maximum limits.
- min: indicates minimum requests.
- maxLimitRequestRatio: indicates the maximum ratio of a limit to a request. Because a node schedules resources based on pod requests, resources can be oversold. The maxLimitRequestRatio parameter indicates the maximum oversold ratio of pod resources.
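Putting these parameters together, a complete LimitRange might look as follows (the name and values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: example-limit-range   # illustrative name
  namespace: example
spec:
  limits:
  - type: Container
    default:                  # applied as the limit when a container sets none
      cpu: 500m
      memory: 256Mi
    defaultRequest:           # applied as the request when a container sets none
      cpu: 250m
      memory: 128Mi
    max:
      cpu: "1"
      memory: 512Mi
    min:
      cpu: 100m
      memory: 64Mi
    maxLimitRequestRatio:
      cpu: 4                  # a limit may be at most 4x the corresponding request
```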
Kubernetes sets requests and limits for container resources. To maximize resource utilization, Kubernetes determines scheduling strategies based on pod requests to oversell node resources.
Kubernetes restricts the resource usage of pods based on preset limits. If the memory usage of a pod exceeds the memory limit, this pod is OOM killed. The CPU usage of a pod cannot exceed the CPU limit.
Based on requests and limits of each pod, Kubernetes determines the QoS level and divides pods into three QoS levels: Guaranteed, Burstable, and BestEffort. If node resources are insufficient and pods are to be evicted or OOM killed, the kubelet preferentially protects pods whose QoS level is Guaranteed, and then pods whose QoS level is Burstable (where pods whose QoS level is Burstable with larger requests but less resource usage are preferentially protected). The kubelet preferentially evicts pods whose QoS level is BestEffort.
Kubernetes provides the ResourceQuota and LimitRange objects to set constraints on pod resources and the number of pods in a namespace. The ResourceQuota object is used to set the number of various objects by type and the total amount of resources (CPU and memory). The LimitRange object is used to set the default requests and limits, minimum and maximum requests and limits, and oversold ratio for each pod or container.
Consistent pod limits and requests (that is, limits equal to requests, yielding the Guaranteed QoS level) should be set for important online applications. If resources become insufficient, Kubernetes then preferentially guarantees the stable operation of such pods.
Pod requests can be appropriately reduced for some non-core applications that occasionally occupy resources. In this case, such pods can be allocated to nodes with fewer resources during scheduling to maximize resource utilization. However, if node resources become insufficient, such pods are still preferentially evicted or OOM killed.
To learn more about Alibaba Cloud Container Service for Kubernetes, visit https://www.alibabacloud.com/product/kubernetes