By Container Service Team
As we all know, Kubernetes is the central project of cloud-native architecture, and its focus is applications. Deploying applications better and developing them more efficiently brings tangible benefits to teams and organizations and lets them make better use of cloud-native technology. This wave of change is not only challenging old, closed internal systems but also giving rise to new tools for developers. At this KubeCon, many new ideas about application management were presented. What can we learn from them to avoid detours, and what technology trends do they point to?
For this article, we invited Deng Hongchao, a technical expert on Alibaba Cloud Container Platform, a former CoreOS engineer, and one of the core authors of the Kubernetes Operator project, to analyze and comment on several important aspects of application management.
Configuration Changes and Grayscale Upgrades
Applications deployed on Kubernetes usually store their configuration in a ConfigMap and mount it into the Pod's file system. When the ConfigMap changes, Kubernetes updates only the mounted files; it does not restart or roll out the Pods. This is acceptable for applications that hot-reload their configuration (such as nginx). However, most application developers prefer to treat a configuration change as a release: they want a phased (grayscale) upgrade of the containers that use the ConfigMap.
Grayscale upgrades simplify user code, improve security and stability, and reflect the idea of immutable infrastructure: once an application is deployed, it is never modified in place. To upgrade, developers deploy a new version, verify it, and then destroy the old version. If verification fails, rolling back to the old version is quick and easy.
Based on this idea, engineers at Pusher developed Wave, a tool that watches the ConfigMaps and Secrets associated with a Deployment and triggers an upgrade of the Deployment when their configuration changes. Wave automatically finds the ConfigMaps and Secrets referenced in a Deployment's PodTemplate, calculates a hash of all their data, and stores that hash as an annotation on the PodTemplate. Whenever the configuration changes, Wave recalculates the hash and updates the annotation; because the PodTemplate has changed, Kubernetes rolls out the Deployment. The open-source community also offers a similar tool, Reloader, which additionally lets users choose which ConfigMaps and Secrets to monitor.
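To make the hash-annotation idea concrete, here is a minimal Python sketch, assuming Deployments and ConfigMaps/Secrets are plain dictionaries shaped like their Kubernetes manifests. It is not Wave's or Reloader's actual code: the annotation key `example.com/config-hash` and all function names are hypothetical, and only `volumes` and `envFrom` references are inspected, which is a simplification.

```python
import hashlib
import json

# Hypothetical annotation key, used for illustration only.
HASH_ANNOTATION = "example.com/config-hash"


def referenced_configs(deployment):
    """Collect (kind, name) pairs for ConfigMaps/Secrets used by the PodTemplate."""
    pod_spec = deployment["spec"]["template"]["spec"]
    refs = set()
    for vol in pod_spec.get("volumes", []):
        if "configMap" in vol:
            refs.add(("ConfigMap", vol["configMap"]["name"]))
        if "secret" in vol:
            refs.add(("Secret", vol["secret"]["secretName"]))
    for container in pod_spec.get("containers", []):
        for source in container.get("envFrom", []):
            if "configMapRef" in source:
                refs.add(("ConfigMap", source["configMapRef"]["name"]))
            if "secretRef" in source:
                refs.add(("Secret", source["secretRef"]["name"]))
    return refs


def config_hash(objects):
    """Hash kind, name, and data of every object in a stable order."""
    digest = hashlib.sha256()
    for obj in sorted(objects, key=lambda o: (o["kind"], o["metadata"]["name"])):
        record = [obj["kind"], obj["metadata"]["name"], obj.get("data", {})]
        # sort_keys makes the serialization independent of dict ordering.
        digest.update(json.dumps(record, sort_keys=True).encode())
    return digest.hexdigest()


def reconcile(deployment, cluster_objects):
    """Refresh the hash annotation; return True when the PodTemplate changed.

    Any change to the PodTemplate (annotations included) makes the Deployment
    controller start a rollout, which turns a config edit into an upgrade.
    """
    refs = referenced_configs(deployment)
    used = [o for o in cluster_objects if (o["kind"], o["metadata"]["name"]) in refs]
    new_hash = config_hash(used)
    template_meta = deployment["spec"]["template"].setdefault("metadata", {})
    annotations = template_meta.setdefault("annotations", {})
    if annotations.get(HASH_ANNOTATION) == new_hash:
        return False
    annotations[HASH_ANNOTATION] = new_hash
    return True
```

In a real controller, `reconcile` would run whenever a watched ConfigMap or Secret changes and would write the modified Deployment back to the API server. Note that editing an unrelated ConfigMap leaves the hash, and therefore the Deployment, untouched.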
Analysis and Comment
Skipping grayscale upgrades leaves you exposed to problems you could have caught early. Whether you are upgrading an application image or changing its configuration, roll the change out as a new phased release and verify it.
Immutable infrastructure is also shaping how cloud applications are built. It not only makes architectures safer and more reliable, but also lets them be combined with popular tools, taking full advantage of the cloud-native ecosystem in ways that traditional application services cannot. For example, by combining Wave with Istio's weighted routing, we can verify a website's new configuration against a small share of traffic before rolling it out fully.
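The weighted-routing half of that combination can be sketched as a small Python helper that builds an Istio VirtualService splitting traffic between two subsets. This is a sketch under assumptions: the host `website` and the subset names `stable` and `canary` are hypothetical, and the subsets must already be defined in a matching DestinationRule. The output is plain JSON, which `kubectl apply -f` also accepts.

```python
import json


def weighted_virtual_service(host, canary_percent,
                             stable_subset="stable", canary_subset="canary"):
    """Build a VirtualService that sends canary_percent of traffic to the canary subset."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": host},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    # Pods running the old configuration keep most of the traffic.
                    {"destination": {"host": host, "subset": stable_subset},
                     "weight": 100 - canary_percent},
                    # Pods running the new configuration get a small slice.
                    {"destination": {"host": host, "subset": canary_subset},
                     "weight": canary_percent},
                ]
            }],
        },
    }


if __name__ == "__main__":
    # Send 10% of requests to the pods running the new configuration.
    print(json.dumps(weighted_virtual_service("website", 10), indent=2))
```

Raising `canary_percent` step by step, and resetting it to 0 if the new configuration misbehaves, gives the small-traffic verification described above.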
Server-Side Apply
Kubernetes is a declarative resource management system. A user defines the desired state locally and runs "kubectl apply" to update the fields of the live cluster state that this configuration specifies. This is easier said than done.
The original kubectl apply is implemented on the client side. When applying, it cannot simply replace the whole resource, because other actors, such as controllers, admission controllers, and webhooks, may also modify it. How, then, can you make sure your changes do not overwrite changes made by others? The answer is a 3-way merge: kubectl stores the last applied configuration in an annotation on the object (kubectl.kubernetes.io/last-applied-configuration). On the next apply, it performs a 3-way diff between the live state, the last-applied state, and the newly specified state, and sends the resulting patch to the API server.
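The 3-way merge can be illustrated with a simplified Python sketch. It is a model, not kubectl's implementation: real kubectl computes a strategic merge patch (which understands list merge keys), whereas this sketch handles only nested dictionaries and uses JSON merge patch semantics, where `None` means "delete this field". The function names are hypothetical.

```python
def three_way_patch(last_applied, live, desired):
    """Compute a patch from the last-applied, live, and desired states."""
    patch = {}
    # Fields we applied last time but no longer declare: delete them.
    for key in last_applied:
        if key not in desired and key in live:
            patch[key] = None
    # Fields we declare now: recurse into maps, otherwise set changed values.
    for key, want in desired.items():
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            prev = last_applied.get(key)
            sub = three_way_patch(prev if isinstance(prev, dict) else {}, have, want)
            if sub:
                patch[key] = sub
        elif have != want:
            patch[key] = want
    # Fields that exist only in the live state (set by controllers, webhooks,
    # and so on) never appear in the patch, so they are left untouched.
    return patch


def apply_merge_patch(target, patch):
    """Apply a JSON-merge-patch-style patch; None removes a field."""
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_merge_patch(result[key], value)
        else:
            result[key] = value
    return result
```

In this model, removing a label from your local file deletes it from the cluster, while a label added by a controller survives because it was never part of your last-applied state.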
However, this approach still has a problem. The original goal of "apply" is to let each party specify which resource fields it manages. The original implementation, however, neither stops different parties from overwriting each other's fields, nor reports conflicts when they occur, nor helps resolve them. For example, when I was working at CoreOS, built-in controllers and users both modified certain special labels on Node objects, which caused conflicts and cluster failures that had to be fixed by hand.
This lurking fear has haunted every Kubernetes user. Today we finally have a solution: server-side apply. The API server now performs the diff and merge, which solves many of the earlier problems. More importantly, instead of the last-applied annotation, server-side apply uses a new declarative field, managedFields, to record which manager owns which resource fields. When a conflict occurs, for example when both kubectl and a controller change the same field, a non-admin request fails with an error that points out the conflict so it can be resolved.
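The ownership idea behind managedFields can be modeled in a few lines of Python. This is a deliberately simplified sketch: real server-side apply tracks field sets per manager inside `metadata.managedFields` and answers a conflict with an HTTP 409 error, whereas the `FieldOwnership` class, the flat dotted field paths, and `ConflictError` here are hypothetical stand-ins. Forcing the change (as `kubectl apply --server-side --force-conflicts` does) transfers ownership to the forcing manager.

```python
class ConflictError(Exception):
    """Raised when a manager tries to change a field owned by someone else."""


class FieldOwnership:
    def __init__(self):
        self.values = {}  # field path -> current value
        self.owners = {}  # field path -> set of managers that own it

    def apply(self, manager, fields, force=False):
        # A conflict is another manager owning the field with a different value.
        conflicts = [
            (path, sorted(self.owners[path] - {manager}))
            for path, value in fields.items()
            if self.owners.get(path, set()) - {manager}
            and self.values.get(path) != value
        ]
        if conflicts and not force:
            raise ConflictError(f"{manager} conflicts with {conflicts}")
        for path, value in fields.items():
            if path in self.owners and self.values.get(path) == value:
                # Same value: the field becomes shared between managers.
                self.owners[path].add(manager)
            else:
                # New or changed (possibly forced) value: sole ownership.
                self.owners[path] = {manager}
            self.values[path] = value
```

Applying the same value as another manager is not a conflict; it simply makes the field shared, which mirrors how server-side apply treats agreeing managers.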
Analysis and Comment
Soon, kubectl apply will no longer be something to worry about. Although server-side apply is still in alpha at the time of writing, it is only a matter of time before it replaces client-side apply. Once it does, different components will be able to change the same resource at the same time more safely and reliably.
More broadly, as the system evolves, and especially as declarative APIs become widespread, less logic will run locally and more will run on the server. Server-side logic has many advantages. Operations such as kubectl dry-run and diff are easier to implement on the server side, and exposing them as HTTP endpoints makes it easier to build "apply" into other tools. Implementing and releasing complex logic on the server side also makes it easier to manage and control, and gives users secure, consistent, high-quality services.
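Because apply is now an API operation, any tool can invoke it with a plain HTTP PATCH. The Python sketch below only builds such a request with the standard library and does not send it; the API server URL, namespace, and field manager name are placeholders, and authentication is omitted. The `application/apply-patch+yaml` content type and the `fieldManager`, `force`, and `dryRun` query parameters are part of the Kubernetes API; the body is JSON because JSON is valid YAML.

```python
import json
import urllib.parse
import urllib.request


def build_apply_request(server, group, version, namespace, plural, manifest,
                        field_manager, force=False, dry_run=False):
    """Build (but do not send) a server-side apply PATCH request."""
    name = manifest["metadata"]["name"]
    # Core resources live under /api, named API groups under /apis/<group>.
    prefix = f"/api/{version}" if not group else f"/apis/{group}/{version}"
    path = f"{prefix}/namespaces/{namespace}/{plural}/{name}"
    params = {"fieldManager": field_manager}
    if force:
        params["force"] = "true"    # take ownership of conflicting fields
    if dry_run:
        params["dryRun"] = "All"    # validate and merge without persisting
    url = f"{server}{path}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        data=json.dumps(manifest).encode(),
        headers={"Content-Type": "application/apply-patch+yaml"},
        method="PATCH",
    )
```

Sending the request would additionally need credentials and TLS settings for the cluster, which is why the sketch stops at building it.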
GitOps
At this conference, a panel also discussed the advantages of GitOps:
- GitOps makes the whole team more “democratic”. Everything is written down and can be reviewed at any time. Every change must go through a pull request, so everyone can see each change, take part in the review, and comment on it. All changes and discussions are recorded in tools like GitHub, so their history is always available. These features make team collaboration smoother and more professional.
- GitOps makes releases safer and more stable. Code can no longer be released at will; the responsible person must review it first. When a rollback is needed, older versions are available in Git, and the audit history shows who released which change and when. This makes the release process more professional and its results more stable and reliable.
Analysis and Comment
GitOps is more than a technical solution. More importantly, it uses the versioning, history, audit, and permission features of tools like GitHub to make team collaboration and releases more professional and standardized.
If widely adopted, GitOps could have a significant industry-wide impact. For example, a newly hired developer at any company that follows it could start releasing code quickly, because the workflow would already be familiar.
It is worthwhile to learn from the “configuration as code” and “Git as the source of truth” ideas of GitOps and apply them in practice.
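The "Git as the source of truth" idea boils down to a loop that compares what a Git commit declares with what is running and plans the difference. Below is a minimal Python sketch of that comparison, assuming objects are keyed by `(kind, name)` and that each commit's manifests have already been loaded into a dictionary; the commit IDs and manifests are hypothetical. In this model, a rollback is simply a sync to an older commit.

```python
def plan_sync(desired, live):
    """List the actions needed to make the live objects match the desired ones."""
    actions = []
    for key in sorted(desired):
        if key not in live:
            actions.append(("create", key))
        elif desired[key] != live[key]:
            actions.append(("update", key))
    # Anything running that Git no longer declares gets removed.
    for key in sorted(live):
        if key not in desired:
            actions.append(("delete", key))
    return actions


# Hypothetical history: each commit maps (kind, name) to the object's spec.
history = {
    "a1b2c3": {("Deployment", "web"): {"image": "web:1.0", "replicas": 2}},
    "d4e5f6": {("Deployment", "web"): {"image": "web:1.1", "replicas": 2},
               ("ConfigMap", "web-config"): {"level": "info"}},
}
```

Running such a loop continuously means that every change, including a rollback, starts as a reviewed commit rather than a manual command against the cluster.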
Automated Canary Rollout
Canary rollout means routing a small portion of traffic to the new version during a release and then analyzing whether the new version behaves normally. If it does, more traffic is switched to the new version until the old version receives no traffic and is destroyed. In tools like Spinnaker, a release must pass manual validation. That step can be replaced by an automated tool, because the check itself is mechanical: for example, comparing the success rate and p99 latency against thresholds.
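The mechanical check can be expressed directly in code. The Python sketch below computes the success rate and the nearest-rank p99 latency from request samples and compares them with thresholds. The thresholds (99% success, 500 ms p99) and the `(status_code, latency_ms)` sample format are hypothetical; a real setup would query a metrics system such as Prometheus instead of receiving raw samples.

```python
def p99(latencies):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies)
    # Integer ceiling of 0.99 * n, avoiding floating-point rounding issues.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def canary_healthy(samples, min_success_rate=0.99, max_p99_ms=500):
    """samples: list of (status_code, latency_ms); 5xx responses count as failures."""
    if not samples:
        return False  # no traffic means nothing has been verified yet
    ok = sum(1 for status, _ in samples if status < 500)
    success_rate = ok / len(samples)
    latency_p99 = p99([latency for _, latency in samples])
    return success_rate >= min_success_rate and latency_p99 <= max_p99_ms
```

Returning `False` for an empty sample set is a deliberate choice: an automated gate should not promote a version that has not served any traffic.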
Building on these ideas, engineers from Amadeus and Datadog shared how they use Kubernetes, Operators, Istio, and Prometheus to automate canary rollouts. The approach abstracts the entire canary rollout into a CRD, so a user only writes a declarative YAML file. Once that resource is created, an Operator automatically performs the complex O&M operations. The main steps are as follows:
- Deploy a new version of the service.
- Modify the Istio VirtualService configuration and switch a portion of the traffic to the new version first.
- Check whether the new version's success rate and p99 response time meet the requirements.
- If the requirements are met, upgrade the entire application to the new version; otherwise roll back to the old version.
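The four steps above map naturally onto a small controller routine. The Python sketch below assumes a hypothetical `CanarySpec` standing in for what the CRD would carry, and takes the deploy, traffic-shifting, and health-check operations as injected callables. It captures only the decision flow, not a real Operator, Istio client, or Prometheus query.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CanarySpec:
    """Hypothetical fields a canary CRD might declare."""
    service: str
    new_image: str
    canary_percent: int = 10


def run_canary(spec: CanarySpec,
               deploy: Callable[[str, str], None],
               set_weight: Callable[[str, int], None],
               healthy: Callable[[str], bool]) -> str:
    """Deploy, shift part of the traffic, check, then promote or roll back."""
    deploy(spec.service, spec.new_image)           # 1. deploy the new version
    set_weight(spec.service, spec.canary_percent)  # 2. send part of the traffic to it
    if healthy(spec.service):                      # 3. check success rate and p99
        set_weight(spec.service, 100)              # 4a. promote: all traffic to the new version
        return "promoted"
    set_weight(spec.service, 0)                    # 4b. roll back: all traffic to the old version
    return "rolled back"
```

Injecting the operations as callables keeps the decision logic testable without a cluster; in an Operator they would wrap updates to the Deployment and the VirtualService.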
Weaveworks also developed its own automated canary rollout tool, Flagger. Unlike the approach above, Flagger switches traffic to the new version progressively, for example 5% at a time, and destroys the old version once all traffic has moved to the new version.
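Progressive shifting can be sketched as a loop that raises the new version's traffic weight in fixed steps, checks health after each step, and aborts on the first failure. The Python below illustrates that idea and is not Flagger's code; the 5% step comes from the text above, while the function name and the `check` callback are hypothetical.

```python
def progressive_rollout(check, step=5):
    """Shift traffic to the new version in `step`% increments.

    check(weight) returns True if the new version is healthy at that weight.
    Returns (outcome, weights_tried).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    tried = []
    weight = 0
    while weight < 100:
        weight = min(weight + step, 100)
        tried.append(weight)
        if not check(weight):
            # Abort: route all traffic back to the old version.
            return "rolled back", tried
    # All traffic is on the new version, so the old one can be destroyed.
    return "promoted", tried
```

Smaller steps expose fewer users to a bad release at each stage, at the cost of a longer rollout.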
Analysis and Comment
Canary rollouts pay off. They raise the success rate of application releases and improve system stability, which makes them an important part of application management.
More generally, complex O&M procedures in the cloud-native era are becoming simpler and more standardized. A CRD abstraction turns a complex procedure into a few concise APIs for users, and an Operator can automate that procedure on Kubernetes. As two leading standard platforms, Kubernetes and Istio provide powerful building blocks that make it easy for users to get started.
Summary
In this article, we discussed several new approaches to application management and commented on each of them:
- We explained why, and how, to release a new version of an application when its configuration changes.
- Client-side kubectl apply causes several problems; for example, it lets different parties overwrite each other's resource fields. Server-side apply solves these problems.
- GitOps not only resolves a technical problem, but also makes team collaboration and release more professional and standardized.
- Standard platforms and tools such as Kubernetes, Operators, Istio, and Prometheus simplify the O&M work of canary rollouts and lower the barrier to entry for developers.
In the past, we could only envy the excellent architectures that others had built but that were out of our reach. Now, open-source projects and technology standards are lowering the technical barrier so that every developer can use these technologies. A subtle change is also under way: self-built infrastructure software inevitably runs into diminishing marginal returns, which is driving more and more enterprises (for example, Twitter) to join the cloud-native community. Embracing the open-source ecosystem and technical standards is both an important opportunity and a challenge for Internet enterprises. To prepare fully for cloud migration, we need to build cloud-native applications and architectures and make full use of the cloud and open-source technology.