Relive the best moments of the Apsara Conference 2019 at https://www.alibabacloud.com/apsara-conference-2019.
In this blog, Wang Xu, technical expert at Ant Financial and one of the founding members of the Kata Containers Architecture Committee, and Liu Jiang, technical expert in the Operating System Division of Alibaba Cloud, share their thoughts on the development of secure container technology. Below is a translation of their presentation at Apsara Conference 2019.
Hi everyone! First of all, we’d like to thank everyone for coming to hear our presentation! Today, we’d like to discuss our thoughts on secure container technology. I am Wang Xu, from the Ant Financial team at Alibaba Group. Six months ago, I left my startup company and joined Ant Financial. Before that, in 2017, my team and I launched the Kata Containers project together with Intel OTC in Austin, Texas. Beside me is Liu Jiang, a technical expert for AliOS at Alibaba Cloud. He is one of the key engineers behind Alibaba Cloud’s Elastic Container Instance (ECI) and an active contributor to the rust-vmm project. This presentation draws on both of our perspectives. We have witnessed the development of container security and secure container technology, and today we would like to talk about where secure containers are heading, based on their history in the community and at Alibaba Cloud.
Our presentation today will consist of four parts:
- The current state of container technology in terms of usage and security
- The development of secure container technology in and outside Alibaba Cloud
- The past, present, and future of Alibaba Cloud Sandbox
- Some major technical efforts being made by Alibaba Cloud and the entire community
In the photo, Wang Xu is on the left and Liu Jiang is on the right. Both are eager to share their thoughts on container technology.
The Current State of Container Technology
As you probably all know, containers, microservices, and cloud-native technologies are the current big trends in IT. According to a 2019 report from Portworx and Aqua Security, most companies surveyed are either using containers or considering using them.
Early this morning, I talked with Chris from the US, who spoke just before us. He said that KubeCon in San Diego, at the end of this year, is expected to draw 12,000 attendees! He also mentioned that cloud-native technology has not only changed the architecture of existing applications but also fostered a wider variety of services, accelerating the adoption of IT systems.
But even though container technology is all the hype these days, challenges remain. According to a report from Tripwire, about half of the companies surveyed, especially those running more than 100 containers, believe that their containers have security vulnerabilities. An even larger number of companies are not sure whether their containers are vulnerable. As we see it, security is not only a question of technology but also a question of confidence, and the importance of confidence should not be underestimated. As shown in pink in the survey, 42% of respondents cannot fully embrace the container ecosystem because of security concerns. Security is therefore clearly an important matter.
To put it another way, concern about security is a sign of progress. After all, it is only when you are ready to use a technology in production that you are willing to take a hard look at it from a security perspective, right? When it comes to containers, security is a rather complex subject with several aspects. Container security is end-to-end: it covers the security and integrity of the container images themselves, the security of the software and hardware infrastructure on which containers run, and the security of the container engines.
The Development of Secure Container Technology
Container security has a long history of development. Take the namespace and cgroup features of the Linux kernel as an example. This family of container technologies extends the kernel's process-management features to isolate groups of processes. It offers a user-friendly interface and low overhead, and it can wrap existing applications in an isolated environment without modifying them. As a pure software solution, it works at various layers on both physical and virtual machines. The problem, however, is that it is part of the Linux kernel, which all containers on a host share, so certain isolation problems cannot be eradicated and may even be aggravated by newly added kernel features. At LinuxCon 2015 in Seattle, Linux creator Linus Torvalds said in an interview, “the only real solution is to admit that a) bugs happen, and b) try to mitigate them by having multiple layers of security.”
Adding an isolation layer means letting application containers run on their own kernels. The simplest way to do this is to deploy containers in virtual machines, as shown in the leftmost section of the preceding figure. This approach does not require you to change the software stack; it simply runs your containers inside your own virtual machines. However, it adds overhead and increases overall maintenance complexity, because there are now two layers to manage.
Another well-established approach to dedicated kernels is the unikernel, shown in the rightmost section of the preceding figure. It lets applications run with their own kernels. Its benefit is a minimal, cut-down library operating system (LibOS), which has lower overhead and a smaller attack surface. However, at least for now, it is not widely used, because applications often have to be modified to work with it. Compatibility is always the biggest obstacle to the adoption of a platform technology. We therefore think the more suitable secure container solutions for unmodified applications are the two in the middle: microVMs and process virtualization. The former uses a lightweight virtual machine and a tailored Linux kernel to reduce O&M overhead while maintaining compatibility. The latter uses a dedicated kernel that implements the Linux ABI and directly virtualizes the process runtime, maximizing compatibility for Linux applications.
Kata Containers is a microVM-based secure container solution. From an application perspective, it is a runC-compatible container runtime that Kubernetes can call through containerd or CRI-O, and it can directly run Docker images or OCI images. But unlike runC, Kata Containers uses hardware virtualization. Your applications no longer run directly on the host kernel but on a dedicated kernel inside a virtual machine. Even if an attacker exploits an unknown vulnerability in that dedicated kernel, the virtual machine boundary is still hard to break. The Kata Containers project was open-sourced in 2017. Then, in April 2018, it became the first open infrastructure project in seven years to come under the OpenStack Foundation umbrella. As a community project, it involves many developers outside Alibaba Cloud and Ant Financial. The roadmap for Kata Containers 2.0 is currently under discussion, and since the project is open source, you are welcome to contribute code and requirements.
Technically, in the Kubernetes ecosystem, Kata Containers can integrate with CRI daemons such as containerd and CRI-O, just like runC. We recommend using the containerd shim API v2, which was first introduced in the containerd community last summer and is also supported by CRI-O. Kata Containers was the first container runtime to officially support this new interface. With it, Kata Containers needs only one additional shim process per pod, regardless of how many containers the pod holds, which relieves pressure on the host scheduler. The shim manages the OCI containers in the pod by talking over VSOCK to the agent inside the microVM. The VMMs supported by the community version of Kata Containers include QEMU and AWS's open-source Firecracker. The former has richer features and better compatibility, while the latter is more compact. In keeping with the Alibaba spirit of “bringing you the best of everything”, you do not have to choose between them: with Kubernetes RuntimeClass, you can specify which VMM each pod uses. For more information, refer to our documentation on GitHub or join the discussion in our Slack channel. Please also report any issues you encounter; besides writing code, this is one of the most valuable contributions you can make to the open-source community.
There are quite a few similar container solutions based on microVM technology; familiar examples are Microsoft's Hyper-V containers and WSL 2, which was recently integrated into Windows. Such solutions are the most common, because they are compatible with standard Docker images. Among them, Kata Containers is the most representative open-source solution. Solutions based on process virtualization have their own strengths, and the most prominent is Google’s open-source gVisor container runtime. Zhengyu He, who was a technical lead at Google when gVisor was open-sourced, now leads our team at Alibaba Cloud.
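To make the RuntimeClass mechanism concrete, here is a minimal sketch that builds the Kubernetes objects as Python dictionaries and prints them as JSON, which kubectl accepts as well as YAML. The handler names `kata-qemu` and `kata-fc` are assumptions: each must match a runtime handler configured in the node's containerd or CRI-O. The sketch also assumes the GA `node.k8s.io/v1` API (Kubernetes 1.20 and later); clusters from 2019 used `node.k8s.io/v1beta1` instead. Pod names and images are purely illustrative.

```python
import json

# ASSUMPTION: these handler names must match runtime handlers that the node's
# CRI daemon (containerd or CRI-O) has been configured with.
QEMU_HANDLER = "kata-qemu"
FIRECRACKER_HANDLER = "kata-fc"


def runtime_class(name: str, handler: str) -> dict:
    """Build a Kubernetes RuntimeClass object (node.k8s.io/v1)."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,  # the CRI handler the kubelet passes to the runtime
    }


def pod(name: str, image: str, runtime_class_name: str) -> dict:
    """Build a Pod that asks the kubelet to run it with the given RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    manifests = [
        runtime_class("kata-qemu", QEMU_HANDLER),
        runtime_class("kata-fc", FIRECRACKER_HANDLER),
        # Feature-rich QEMU for a workload that needs broad device support...
        pod("legacy-app", "nginx:1.17", "kata-qemu"),
        # ...and compact Firecracker for a small, short-lived workload.
        pod("small-fn", "busybox:1.31", "kata-fc"),
    ]
    # A v1 List wraps several objects so they can be applied in one go.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

The output could be saved to a file and applied with `kubectl apply -f`. The only per-pod choice is `runtimeClassName`; everything else in the pod spec stays the same as it would for runC.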
The Past, Present, and Future of Alibaba Cloud Sandbox
In the six years from 2013 to 2019, container technology and its ecosystem took a big step forward, moving rapidly from technical concepts to a cooperative ecosystem and then to the commercial availability we see today. As was said at the opening of this conference, I believe we are now at another turning point for this technology.
Let us recap how containers and sandbox technology developed at Alibaba Cloud.
In 2015, more than a year after Docker was first introduced, the IT industry took the plunge into container technology. Companies such as Alibaba Cloud, Hyper.sh, and Intel all recognized the limitations of runC and began to build secure container runtimes based on virtual machine technology.
In 2016, after a year of research and development, secure container technology reached production. Alibaba Cloud's vLinux technology was used together with MaxCompute, and Hyper.sh began offering container services based on runV. Another important event happened the same year: the Container Runtime Interface (CRI) was introduced to Kubernetes to integrate container runtimes of various kinds. This laid a solid foundation for integrating secure container technology with the Kubernetes ecosystem.
In 2017, Microsoft launched a new container service called Azure Container Instances (ACI), and Alibaba Cloud developed a virtual machine-based container service. In December 2017, Hyper.sh runV and Intel Clear Containers were announced to be merging into the Kata Containers project, to jointly develop secure container runtimes based on hardware virtualization and combine the security of virtual machines with the speed of containers.
In 2018, Alibaba Cloud launched a virtual machine-based container service and began developing microVM-based sandbox technology. In the same year, Google gVisor and AWS Firecracker were open-sourced.
In 2019, Alibaba Cloud Sandbox became commercially available. It supports multiple services, such as Elastic Container Instance (ECI), Container Service for Kubernetes (ACK), Serverless App Engine (SAE), and edge computing. In the same year, Intel launched the Cloud Hypervisor project to create a dedicated hypervisor for cloud-native scenarios.
Since the end of 2017, open-source secure container technologies such as Kata Containers, gVisor, Firecracker, Nabla, and Cloud Hypervisor have emerged, and container technology has entered a stage of rapid development.
I am very pleased that the Cloud Hypervisor team joined Alibaba Digital Economy this year to build a secure and reliable container runtime for our customers.
You may wonder, what are the relationships and differences between these container runtime technologies? I will explain it using renting as an analogy.
Let’s start with Firecracker. Firecracker is not a container runtime, but rather a lightweight VMM.
runC is like a shared apartment: the living space is sparsely furnished, but every part of it is in use. However, the sound insulation, security, and overall living experience may not be great. Kata 1.x or gVisor integrated with Kubernetes through containerd or CRI-O is also like a shared apartment, but each tenant has a separate bedroom while sharing the living room, kitchen, and bathroom. Alibaba Cloud Sandbox, in turn, is like a serviced apartment that offers a full-featured, standardized, room-serviced experience. Finally, VM + containerd is like renting an entire apartment, which offers much more space but may need renovation. Because Alibaba Cloud Sandbox is dedicated to providing customers with a fully tailored container service, in this sense it is more like a newly renovated entire apartment.
Contributions by Alibaba Cloud and the Entire Community
As you probably already know, Alibaba Group runs many businesses within its large ecosystem, including Taobao, Tmall, and AutoNavi for consumers, as well as Alibaba Cloud and DingTalk for enterprise users. As a fundamental technology, Alibaba Cloud Sandbox needs to support multiple application scenarios, which fall into the following three categories:
The first scenario is container services on the public cloud. Here, we must ensure the security of infrastructure and data as well as isolation between tenants, in the kind of serviced apartment we aim to provide.
In the second scenario, both trusted and untrusted code must run on a single server. For example, you may need to execute user-uploaded code, executables without source code, or unaudited open-source code. The most typical use of a sandbox is to build an execution environment with security boundaries and confine untrusted code to it, thereby protecting the other components in the system.
In the third scenario, multiple services are co-located. To improve resource utilization, one major technique is to deploy online and offline services on the same server and use the resources that online services leave idle to run offline workloads. This is a rather challenging task. The traditional method is to reserve enough resources for online tasks to guarantee a good online service experience. The question, then, is how to protect the experience of online services in a mixed deployment. A simple solution is to give online services a higher priority. However, offline tasks still tend to interfere with online tasks through the shared kernel, memory allocation, and I/O scheduling. Placing offline tasks in a dedicated sandbox isolates their resources, faults, and environments from the online tasks.
To meet these business requirements, we at Alibaba have developed Alibaba Cloud Sandbox on top of our mature, stable infrastructure and the microVM approach, to build a secure, reliable, lightweight, and ecosystem-friendly container runtime for enterprises.
Simply put, Alibaba Cloud Sandbox is a secure container runtime built on microVMs. First, it is a microVM based on hardware virtualization. It uses a highly customized and optimized hypervisor and a minimal virtual machine device model, and the VMM almost never accesses guest memory. Second, it is a container runtime that provides image distribution and management, container networking, and container storage, and it is fully compatible with the OCI and CRI specifications.
The security of Alibaba Cloud Sandbox comes from its use of a new, security-focused systems programming language, a minimal and controllable code base, a minimal device model, a highly customized hypervisor, and the security-hardened Alibaba Cloud Linux for Sandbox.
So, what value can Alibaba Cloud Sandbox provide for customers? In addition to security and reliability, it offers fast startup, high elasticity, and low resource overhead. Test data shows that, excluding the time to download container images, Alibaba Cloud Sandbox starts a container instance in less than 500 ms. On a 96-core system, it can start more than 200 instances per second, and the average memory usage per microVM is less than 2.5 MB.
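For contrast with a true sandbox, here is a minimal sketch of the weakest possible "execution environment with security boundaries": running an untrusted Python snippet in a child process with POSIX resource limits. It is only an illustration under stated assumptions (a POSIX host, Python 3, and limit values chosen arbitrarily); it is not how Alibaba Cloud Sandbox works. The child still shares the host kernel, which is exactly the weakness that microVM-based sandboxes are meant to remove.

```python
import resource
import subprocess
import sys


def _limit_child() -> None:
    """Runs in the child process just before exec (POSIX only)."""
    # Cap CPU time at 2 seconds; past that the kernel signals the process
    # (SIGXCPU at the soft limit, SIGKILL at the hard limit).
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))
    # Forbid writing files larger than 1 MiB.
    resource.setrlimit(resource.RLIMIT_FSIZE, (1 << 20, 1 << 20))


def run_untrusted(code: str, timeout: float = 5.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a separate, resource-limited process.

    This is only a process boundary: the child still shares the host kernel,
    so a kernel exploit inside it would not be contained.
    """
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode (no env vars, no user site-packages)
        capture_output=True,
        text=True,
        timeout=timeout,          # wall-clock limit, enforced by the parent
        env={},                   # do not leak the parent's environment
        preexec_fn=_limit_child,  # apply the rlimits inside the child
    )


if __name__ == "__main__":
    result = run_untrusted("print(sum(range(10)))")
    print(result.returncode, result.stdout.strip())
```

Limits like these stop runaway loops and oversized files, but they do nothing against the shared-kernel attack surface described earlier; that gap is what a dedicated guest kernel in a microVM closes.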
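The "give online services a higher priority" idea can be illustrated with cgroup v2 CPU weights. The sketch below models how the `cpu.weight` setting (valid range 1 to 10000, default 100) splits CPU time between busy groups; the group names and weight values are hypothetical. On a real host, these values would be written to each group's `cpu.weight` file under the cgroup v2 hierarchy.

```python
from __future__ import annotations


def cpu_shares(weights: dict[str, int]) -> dict[str, float]:
    """Fraction of CPU each cgroup receives when all of them are busy.

    Models cgroup v2 `cpu.weight`: under contention, CPU time is divided
    in proportion to the weights. Weights are work-conserving, so when the
    online group is idle, the offline group may use the spare CPU.
    """
    for name, w in weights.items():
        if not 1 <= w <= 10000:
            raise ValueError(f"{name}: cpu.weight must be in [1, 10000], got {w}")
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical weights: online gets 100x the weight of offline.
    shares = cpu_shares({"online": 1000, "offline": 10})
    for group, share in shares.items():
        print(f"{group}: {share:.3f}")
```

With these weights, the online group receives about 99% of the CPU under contention, while the offline group can still soak up idle cycles. Weights govern only CPU time, though. Interference through the shared kernel, memory allocation, and I/O scheduling remains, which is why the text proposes moving offline tasks into a dedicated sandbox.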
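A quick worked calculation shows what the quoted figures imply at scale. It uses only the numbers given in the talk (more than 200 starts per second on a 96-core host, excluding image download, and less than 2.5 MB of memory per microVM). Because both figures are bounds, the results are upper limits on time and memory for the quantity measured, not predictions for any particular workload.

```python
# Figures quoted in the talk (96-core host, image download time excluded).
STARTS_PER_SECOND = 200   # "greater than 200" instances started per second
MEM_PER_MICROVM_MB = 2.5  # "less than 2.5 MB" average memory usage per microVM


def max_start_seconds(instances: int) -> float:
    """Upper bound on the time to start `instances` microVMs at >= 200 starts/s."""
    return instances / STARTS_PER_SECOND


def max_total_mb(instances: int) -> float:
    """Upper bound on the summed per-microVM memory figure, in MB."""
    return instances * MEM_PER_MICROVM_MB


if __name__ == "__main__":
    for n in (1000, 10000):
        print(f"{n} instances: <= {max_start_seconds(n):.1f} s, < {max_total_mb(n):.0f} MB")
```

For example, 1,000 instances can be started in at most 5 seconds, and their per-microVM memory figure sums to less than 2,500 MB.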
What is the next challenge for secure containers, and what does the ideal container runtime look like for users? We believe it should:
1) Surpass virtual machines in terms of security.
2) Provide similar performance to native applications.
3) Be as compatible and user-friendly as runC.
Ant Financial and Alibaba Cloud have been active contributors to secure container technology. In the future, we will continue to work closely with the open-source community to shape the roadmap for Kata Containers 2.0, contribute our best practices from the container and cloud services fields, and contribute general-purpose technologies to the Kata Containers and rust-vmm communities, keeping Alibaba Cloud Sandbox technically aligned with the community. We will also work with other industry players to create a secure, reliable, efficient, and ecosystem-friendly container runtime for users.
About the Presenters
Wang Xu is a senior expert in container and cloud-native infrastructure in the Infrastructure Software Division of Ant Financial. He is a founding member of the Architecture Committee of Kata Containers, a top-level project of the OpenStack Foundation. He was previously a co-founder and CTO of Yin Su Shen Tong.
Liu Jiang is a senior expert in the Operating System Division at Alibaba Cloud, responsible for the design and implementation of the underlying system architecture for cloud-native workloads. He has long been committed to the development of operating system technology and is one of the major contributors to the Linux and OpenSolaris kernels in China.