Knowledge Sharing: Category-based Interpretation of Kubernetes v1.14 Release Notes
This article is jointly written by Zhang Lei, Xin Gui, Lin Shi, Xi Yuan, Zhong Yuan, and Xun Ming
Kubernetes 1.14.0 was officially released on March 25, 2019. Compared with Kubernetes v1.12 and v1.13, this release contains many important changes, and its release notes are the longest to date.
These lengthy release notes contain a large amount of information. How can we filter them efficiently so that our teams can quickly and accurately identify the most important technical updates?
This article reorganizes the Kubernetes v1.14 Release Notes by topic and provides a technical analysis of the important changes in each category. We hope this category-based interpretation helps you better understand the core content of the Kubernetes v1.14 release.
Production-Level Support for Windows Nodes
Kubernetes v1.14 marks an important milestone: production-level support for Windows nodes. The release significantly enhances Windows support:
- Pods: Kubernetes supports readiness and liveness probes; single or multiple containers per pod with process isolation; volume sharing between containers in a pod; ConfigMaps and Secrets; emptyDir volumes; and resource limits. However, some other useful features, such as graceful deletion, termination messages, privileged containers, HugePages, and pod eviction policies, are still unavailable in Kubernetes v1.14.
- Services: Kubernetes supports service environment variables and DNS resolution, as well as NodePort, ClusterIP, LoadBalancer, and headless services. Host networking (hostNetwork) mode for pods is not yet supported.
- Workload controllers: Common workload controllers, such as ReplicaSets, Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs, all support Windows containers.
- Other features: Kubernetes also supports pod and container metrics, Horizontal Pod Autoscaling (HPA), kubectl exec, scheduler preemption, resource quotas, and CNI networking. Together, these make Windows workloads more cloud-native. Because of the way Windows handles compatibility, the host OS version must currently match the OS version of the container base image. Kubernetes v1.14 supports Windows Server 2019. Future Kubernetes versions may use Hyper-V isolation to remove this restriction.
As the Windows container ecosystem grows, more and more major cloud vendors have begun to offer production-level support for Windows nodes in their container services. Alibaba Cloud Kubernetes (ACK) recently added support for Windows containers and now provides unified management for hybrid deployments of Linux and Windows applications.
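To make the version-matching rule above concrete, here is a small Go sketch of the check a scheduler or admission step might perform before placing a process-isolated Windows container on a node. It is not Kubernetes code: the function names are hypothetical, and it assumes, following Microsoft's compatibility guidance as we understand it, that the major, minor, and build numbers must match while the revision number may differ. 10.0.17763 is the build of Windows Server 2019.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBuild parses a Windows version string such as "10.0.17763" or
// "10.0.17763.379" and returns its major, minor and build components.
// Any revision component after the build number is ignored.
func parseBuild(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) < 3 {
		return out, fmt.Errorf("version %q: want at least major.minor.build", v)
	}
	for i := 0; i < 3; i++ {
		n, err := strconv.Atoi(parts[i])
		if err != nil {
			return out, fmt.Errorf("version %q: %v", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// processIsolationCompatible reports whether an image built for imageVersion
// can run with process isolation on a host running hostVersion.
// Assumption (hypothetical helper): major, minor and build must match exactly;
// the revision may differ.
func processIsolationCompatible(hostVersion, imageVersion string) (bool, error) {
	h, err := parseBuild(hostVersion)
	if err != nil {
		return false, err
	}
	img, err := parseBuild(imageVersion)
	if err != nil {
		return false, err
	}
	return h == img, nil // arrays compare element by element
}

func main() {
	// A Windows Server 2019 host (build 17763) and two candidate images.
	ok, _ := processIsolationCompatible("10.0.17763.379", "10.0.17763.253")
	fmt.Println(ok) // true
	ok, _ = processIsolationCompatible("10.0.17763", "10.0.17134")
	fmt.Println(ok) // false
}
```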
For more information, see Support for Windows Nodes is Graduating to Stable (#116).
Local Persistent Volumes Are Now Generally Available (GA)
The Kubernetes community has long wanted a feature that lets Kubernetes use a host's local storage devices (such as a local SSD) directly as persistent volumes (Local PVs). The reasons are obvious: compared with remote (network) storage, Local PVs offer clear advantages such as low latency, ease of use, stability, and low cost. For applications with special requirements for these characteristics (for example, databases and search engines), Local PVs can make a big difference.
Local PVs go GA in Kubernetes v1.14, giving cloud workloads an important new option for persistent storage.
However, it is important to understand that Local PV has some potential risks as well:
- Current open-source community solutions do not support dynamically creating volumes.
- The scheduler needs additional logic to ensure that the selected node has sufficient disk capacity.
- Poor fault tolerance: if the host on which a pod runs goes down, the data on the pod's Local PV may be lost.
You can solve the first problem by using the local volume provisioner provided by Alibaba Cloud, which automatically creates data volumes on instances with local NVMe SSDs. However, poor fault tolerance and robustness remain difficult problems.
For more information, see Durable Local Storage Management is Now GA (#121).
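The second risk listed above means the scheduler must look at a node's local disks, not just its CPU and memory. The Go sketch below shows the idea as a simple node filter. The Node type and feasibleNodes function are hypothetical simplifications for illustration; in this model each Local PV is backed by a single disk, so free capacity is not summed across disks.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, simplified view of a node's unbound local disks.
type Node struct {
	Name      string
	FreeDisks []int64 // capacity in GiB of each local disk not yet bound to a PV
}

// feasibleNodes returns, sorted by name, the nodes that have at least one
// unbound local disk large enough for the request. Capacity is not summed
// across disks because, in this model, one Local PV maps to one disk.
func feasibleNodes(nodes []Node, requestGiB int64) []string {
	var out []string
	for _, n := range nodes {
		for _, c := range n.FreeDisks {
			if c >= requestGiB {
				out = append(out, n.Name)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeDisks: []int64{100, 100}}, // 200 GiB total, but no single disk is big enough
		{Name: "node-b", FreeDisks: []int64{500}},
		{Name: "node-c", FreeDisks: nil},
	}
	fmt.Println(feasibleNodes(nodes, 200)) // [node-b]
}
```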
Pod Priority and Preemption Is Now GA
The motivation for pod priority and preemption is clear: it enables Kubernetes (K8s) to run high-priority tasks by preempting the resources of low-priority tasks.
The priority of a pod determines its importance in a cluster. (1) Because K8s schedules pods from a queue, a high-priority pod is more likely to be scheduled first. This is not guaranteed, however, because many other factors affect the scheduling order. (2) When a cluster runs out of resources, a high-priority pod may be unschedulable because no qualified node is available to run it. In this case, K8s triggers preemption: it evicts running low-priority pods to free their resources and then runs the high-priority pod.
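As a rough illustration of the queue ordering in point (1), the following Go sketch orders pending pods by priority and then by arrival order, using the standard container/heap package. The types are hypothetical and the tie-break rule is an assumption for illustration; the real kube-scheduler queue is more involved.

```go
package main

import (
	"container/heap"
	"fmt"
)

// pendingPod is a hypothetical, simplified pending pod.
type pendingPod struct {
	name     string
	priority int32
	seq      int // arrival order, used to break ties (first in, first out)
}

// podQueue implements heap.Interface: higher priority first, then earlier arrival.
type podQueue []pendingPod

func (q podQueue) Len() int { return len(q) }
func (q podQueue) Less(i, j int) bool {
	if q[i].priority != q[j].priority {
		return q[i].priority > q[j].priority
	}
	return q[i].seq < q[j].seq
}
func (q podQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *podQueue) Push(x interface{}) { *q = append(*q, x.(pendingPod)) }
func (q *podQueue) Pop() interface{} {
	old := *q
	n := len(old)
	p := old[n-1]
	*q = old[:n-1]
	return p
}

// popOrder pushes all pods into the queue and drains it, returning the
// order in which they would be handed to the scheduling cycle.
func popOrder(pods []pendingPod) []string {
	q := &podQueue{}
	heap.Init(q)
	for _, p := range pods {
		heap.Push(q, p)
	}
	var order []string
	for q.Len() > 0 {
		order = append(order, heap.Pop(q).(pendingPod).name)
	}
	return order
}

func main() {
	pods := []pendingPod{
		{"batch-job", 0, 1},
		{"web", 1000, 2},
		{"logger", 0, 3},
		{"payments", 1000, 4},
	}
	fmt.Println(popOrder(pods)) // [web payments batch-job logger]
}
```

This ordering only decides which pod is tried first; a popped high-priority pod can still fail to fit on any node, which is where preemption takes over.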
When K8s Scheduler finds that a pod (Pod-A) has no suitable node to run on (every node in the cluster fails the scheduling predicates), it removes some lower-priority pods to “create room” for Pod-A. Implementing such a “simple” idea in a distributed environment requires solving many detailed problems, for example: How does K8s Scheduler decide which pods to remove, and from which node? How does it ensure that the resources freed for Pod-A are not taken by other pods? How does it prevent Pod-A from starving? How does it handle pod scheduling constraints such as affinity requirements? Does K8s need to support cross-node preemption to satisfy certain constraints (such as anti-affinity constraints across failure domains)? For more information, see Pod Priority and Preemption in Kubernetes (#564).
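The first question above (which pods to remove, and from which node) can be made concrete with a simplified victim-selection sketch in Go. It follows the general idea of evicting only lower-priority pods and then keeping ("reprieving") as many of them as still fit, but it is a hypothetical illustration, not kube-scheduler's implementation: it tracks a single resource (CPU millicores) and ignores PodDisruptionBudgets, affinity, and nominated pods. The node-ranking rule (lowest highest-victim priority, then fewest victims) is likewise a simplifying assumption.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// pod and node are hypothetical, simplified types that track a single
// resource: CPU in millicores.
type pod struct {
	name     string
	priority int32
	cpu      int64
}

type node struct {
	name     string
	capacity int64
	pods     []pod
}

// selectVictims returns the pods that must be evicted from n so that
// preemptor fits, or ok=false if even evicting every lower-priority pod
// would not free enough room. It conceptually removes all lower-priority
// pods, then reprieves them one by one, highest priority first, as long
// as the preemptor still fits.
func selectVictims(n node, preemptor pod) (victims []pod, ok bool) {
	var used int64
	var lower []pod
	for _, p := range n.pods {
		if p.priority < preemptor.priority {
			lower = append(lower, p)
		} else {
			used += p.cpu // pods with equal or higher priority are never preempted
		}
	}
	if used+preemptor.cpu > n.capacity {
		return nil, false
	}
	sort.Slice(lower, func(i, j int) bool { return lower[i].priority > lower[j].priority })
	for _, p := range lower {
		if used+p.cpu+preemptor.cpu <= n.capacity {
			used += p.cpu // reprieved: this pod keeps running
		} else {
			victims = append(victims, p)
		}
	}
	return victims, true
}

// pickNode returns the node whose highest-priority victim has the lowest
// priority, breaking ties by the smallest number of victims.
func pickNode(nodes []node, preemptor pod) (string, []pod, bool) {
	var (
		bestName    string
		bestVictims []pod
		bestMax     int32
		found       bool
	)
	for _, n := range nodes {
		victims, ok := selectVictims(n, preemptor)
		if !ok {
			continue
		}
		maxPrio := int32(math.MinInt32)
		for _, v := range victims {
			if v.priority > maxPrio {
				maxPrio = v.priority
			}
		}
		if !found || maxPrio < bestMax || (maxPrio == bestMax && len(victims) < len(bestVictims)) {
			bestName, bestVictims, bestMax, found = n.name, victims, maxPrio, true
		}
	}
	return bestName, bestVictims, found
}

func main() {
	nodes := []node{
		{"node-1", 4000, []pod{{"db", 1000, 2000}, {"batch-a", 10, 1500}}},
		{"node-2", 4000, []pod{{"batch-b", 10, 1000}, {"batch-c", 10, 1000}, {"cache", 500, 1500}}},
	}
	name, victims, ok := pickNode(nodes, pod{"web", 100, 2000})
	if ok {
		for _, v := range victims {
			fmt.Println(name, "evicts", v.name) // node-1 evicts batch-a
		}
	}
}
```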
Must-Know about Pod Ready++
Before Kubernetes v1.14, Kubernetes determined whether a pod was ready by checking whether all containers in the pod were running normally. The problem is that ready containers (or container main processes) do not necessarily mean that the pod is ready. To be sure that a pod is healthy and ready to serve traffic, we want external indicators, such as the readiness of the pod's service, DNS, and storage, to tell us whether the pod is really ready.
This feature is called Pod Readiness Gates, or Pod Ready++, in Kubernetes v1.14. Pod Ready++ provides a powerful extension point for signaling pod readiness. Note that you need to write an external controller that sets the values of the conditions referenced by a pod's readiness gates.
For more information, see Pod Ready++ (#580).
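The readiness rule can be sketched as follows: a pod counts as ready only when all of its containers are ready and every condition type listed in its readiness gates is present in the pod status with the value True. The Go types below are hypothetical simplifications of the pod spec and status, and the condition type example.com/lb-registered is an invented example of what an external controller might set.

```go
package main

import "fmt"

// condition and podStatus are hypothetical, simplified mirrors of the
// relevant pod fields.
type condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

type podStatus struct {
	ContainersReady bool
	ReadinessGates  []string    // condition types listed in spec.readinessGates
	Conditions      []condition // status.conditions, set by kubelet and external controllers
}

// podReady applies the Pod Ready++ rule: all containers must be ready and
// every condition named in a readiness gate must be present with status
// "True". A missing gate condition counts as not ready.
func podReady(s podStatus) bool {
	if !s.ContainersReady {
		return false
	}
	status := make(map[string]string, len(s.Conditions))
	for _, c := range s.Conditions {
		status[c.Type] = c.Status
	}
	for _, gate := range s.ReadinessGates {
		if status[gate] != "True" {
			return false
		}
	}
	return true
}

func main() {
	s := podStatus{
		ContainersReady: true,
		ReadinessGates:  []string{"example.com/lb-registered"},
	}
	fmt.Println(podReady(s)) // false: the external controller has not set the condition yet

	// A (hypothetical) external controller sets the condition once the pod
	// has been registered with the load balancer.
	s.Conditions = append(s.Conditions, condition{Type: "example.com/lb-registered", Status: "True"})
	fmt.Println(podReady(s)) // true
}
```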
Kubernetes Native Application Management
Starting with Kubernetes v1.14, Kubernetes itself can manage Kubernetes-native applications. The most important tool behind this feature is Kustomize.
Kustomize generates the YAML files required to deploy an application by applying overlays on top of a base YAML file (a template). Unlike Helm, which produces YAML by substituting strings into templates, this approach never modifies the base YAML file directly. Because each overlay creates new YAML files, other users can freely reuse the base files or the files generated at any layer. This lets every user manage large numbers of YAML files with Git-style workflows such as fork, modify, and rebase. The layering idea is similar to Docker image layers. It avoids modifying YAML files or replacing strings, and it also saves you from learning a DSL syntax (such as Lua).
In Kubernetes v1.14, Kustomize is integrated into kubectl as a subcommand (kubectl kustomize). The Kubernetes community is exploring an application management approach that differs from Helm and is more Kubernetes-native. It remains to be seen how well this approach works in practice.
As we become more familiar with Kubernetes, we rely more and more on kubectl, and our requirements become increasingly diverse. Kubernetes v1.14 improves the kubectl user experience and enhances its support for day-to-day management in the following areas:
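To illustrate the overlay idea, the following Go sketch applies a patch on top of a base document without modifying the base, so several overlays (for example, staging and production) can share it. This is a deliberately simplified merge, not Kustomize's implementation: Kustomize uses strategic merge patches, which can, among other things, merge lists such as containers by name, whereas this sketch simply replaces lists.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// overlay returns a copy of base with patch applied on top: nested maps are
// merged recursively, while scalars and lists in the patch replace the base
// value. base itself is never modified, so other overlays can reuse it.
func overlay(base, patch map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, pv := range patch {
		bm, baseIsMap := out[k].(map[string]interface{})
		pm, patchIsMap := pv.(map[string]interface{})
		if baseIsMap && patchIsMap {
			out[k] = overlay(bm, pm)
		} else {
			out[k] = pv
		}
	}
	return out
}

// mustParse decodes a JSON object and panics on malformed input.
func mustParse(s string) map[string]interface{} {
	var m map[string]interface{}
	if err := json.Unmarshal([]byte(s), &m); err != nil {
		panic(err)
	}
	return m
}

func main() {
	base := mustParse(`{"kind":"Deployment","metadata":{"name":"web"},"spec":{"replicas":1}}`)
	prod := mustParse(`{"metadata":{"labels":{"env":"prod"}},"spec":{"replicas":5}}`)
	merged, err := json.Marshal(overlay(base, prod))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(merged))
	// {"kind":"Deployment","metadata":{"labels":{"env":"prod"},"name":"web"},"spec":{"replicas":5}}
}
```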
- Previously, kubectl cp could copy only one file at a time and did not support wildcards for copying multiple files. An Ant Financial engineer submitted a Kubernetes Enhancement Proposal (KEP) to add wildcard support, which is included in Kubernetes v1.14 and makes working with files in containers more convenient.
- For more information, see #72641.
- Previously, users could not easily determine which permissions an administrator had granted them through RBAC. This changes in Kubernetes v1.14. For example, you can run kubectl auth can-i --list --namespace=ns1 to view the resources (such as pods and services) that you can access in the ns1 namespace, together with the operations you are allowed to perform on them, such as get, list, patch, and delete.
- For more information, see #64820.
- Kubernetes API resources are usually spread across multiple namespaces, so deleting all of them can be troublesome. Starting with Kubernetes v1.14, you can run kubectl delete xxx --all-namespaces to delete them all at once (xxx can be pods, services, deployments, or a custom resource defined by a CRD). You can also combine this command with --field-selector to delete only the resources that meet specific criteria.
- For more information, see #73716.
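The wildcard support in kubectl cp boils down to selecting every file whose name matches a pattern. The Go sketch below shows that selection step with the standard library's path.Match; it is an illustration only, not how kubectl implements the feature, and the file list is a hypothetical stand-in for a directory listing from the container.

```go
package main

import (
	"fmt"
	"path"
)

// expandGlob returns the entries of files that match pattern, using
// shell-style wildcards (*, ?, [...]) as understood by path.Match.
// Note that * does not match the path separator "/".
func expandGlob(files []string, pattern string) ([]string, error) {
	var matched []string
	for _, f := range files {
		ok, err := path.Match(pattern, f)
		if err != nil {
			return nil, err // malformed pattern, e.g. an unclosed "["
		}
		if ok {
			matched = append(matched, f)
		}
	}
	return matched, nil
}

func main() {
	// Hypothetical listing of a directory inside the container.
	files := []string{"/var/log/app/app.log", "/var/log/app/app.log.1", "/var/log/app/gc.txt"}
	got, err := expandGlob(files, "/var/log/app/*.log*")
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // [/var/log/app/app.log /var/log/app/app.log.1]
}
```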
As with previous versions, the Kubernetes v1.14 release has drawn considerable attention for its stability and reliability enhancements. Here are some notable fixes and upgrades:
- If an eviction request provides no delete options, pod eviction now honors graceful deletion by default instead of force-deleting the pod's data in etcd. This allows evicted pods to shut down gracefully.
- For more information, see #72730.
- Before restarting or removing a pod's container that is in an unknown state, Kubelet now tries to stop the container first. This avoids the risk of multiple instances of the same container running simultaneously in a pod.
- For more information, see #73802.
- In a large cluster, if a pod saturates the disk input/output operations per second (IOPS) of its node, the node may flap frequently between the Ready and NotReady states. These frequent status changes can trigger large-scale, unpredictable pod evictions and cause online failures. An Ant Financial engineer proposed a fix for this problem in Docker environments. We recommend that you check whether the same problem exists in clusters that use other container runtimes.
- For more information, see #74389.
- When Kubelet is under heavy load, pod lifecycle events may be generated faster than they are consumed. The channel that carries these events then fills up, and after a while Kubelet deadlocks. An Alibaba engineer proposed a fix for this problem.
- For more information, see #72709.
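The channel problem described in the last item is a classic producer/consumer issue. One common mitigation, sketched below in Go, is a non-blocking send: when the buffered channel is full, the producer drops (and counts) the event instead of blocking. We do not claim this is the exact change made in #72709, and the types are hypothetical. Dropping events is only acceptable when the consumer can recover the lost state by other means, such as a periodic full resync.

```go
package main

import "fmt"

// event is a hypothetical pod lifecycle event.
type event struct {
	PodID string
	Type  string
}

// publisher sends events to a bounded channel without ever blocking: if the
// consumer has fallen behind and the buffer is full, the event is dropped
// and counted instead of stalling the producer.
type publisher struct {
	ch      chan event
	dropped int
}

// publish reports whether the event was delivered to the channel.
func (p *publisher) publish(e event) bool {
	select {
	case p.ch <- e:
		return true
	default: // buffer full: drop rather than block
		p.dropped++
		return false
	}
}

func main() {
	p := &publisher{ch: make(chan event, 2)} // nobody is consuming yet
	for i := 0; i < 5; i++ {
		p.publish(event{PodID: fmt.Sprintf("pod-%d", i), Type: "ContainerStarted"})
	}
	fmt.Println(len(p.ch), p.dropped) // 2 3
}
```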
Enhanced and Optimized Performance in Large Scale Scenarios
As the core features of Kubernetes have become stable, the community has focused increasingly on problems that arise in large-scale scenarios. Kubernetes v1.14 introduces many optimizations that end users will notice. For example:
- To discover the resources it needs to process, kubectl traverses all the groups, versions, and kinds of resources exposed by APIServer. Doing this sequentially significantly degrades the user experience in large clusters. In Kubernetes v1.14, kubectl issues these discovery requests in parallel, which speeds up kubectl by more than 10 times when it calls kube-apiserver for discovery information.
- For more information, see #73345.
- One of the most important APIServer updates in Kubernetes v1.14 limits a single JSON patch request to at most 10,000 operations; requests with more operations are rejected. This prevents the entire cluster from crashing when APIServer has to process very large patch requests, some of which may be malicious. This change is also the main fix for the CVE-2019-1002100 vulnerability.
- For more information, see #74000.
- The Aggregated API lets Kubernetes developers build a custom service, register it with K8s, and use it as if it were a native API. To support this, APIServer must merge the API specifications of user-defined APIs with those of native APIs, which consumes a lot of CPU and significantly degrades performance. In Kubernetes v1.14, the community greatly optimized this process, considerably improving APIServer's performance when aggregating API specifications.
- For more information, see #71223.