When Kubernetes Encounters Confidential Computing, How Does Alibaba Protect the Data in the Container?
By Jia Zhiguang, a Senior Development Engineer at Alibaba. He specializes in the Kubernetes sandbox and confidential computing fields and participates in the development of the Inclavare Containers community.
This article describes confidential computing, the architecture and supported features of the open-source Inclavare Containers project, its iteration plans, and the development status and planning of Alibaba Cloud ACK-TEE.
Introduction to Confidential Computing
1. Application Containers Security Status
According to the 2019 Annual Container Adoption Survey released by Portworx and Aqua Security, security has become the biggest challenge for users adopting container technology and migrating their business to the cloud. Data security is the most critical issue. According to the data breach report released by Risk Based Security, the number and volume of data leaks in 2019 increased by more than 50% compared with 2018.
2. Arrival of the Confidential Computing Era
Data has three states throughout its lifecycle: at-rest, in-transit, and in-use.
- Data in the at-rest state is generally stored on hard disks, flash memory, or other storage devices. There are many ways to protect at-rest data, such as encrypting files before storage or encrypting the storage devices.
- The in-transit state refers to data being transmitted from one place to another through public or private networks. Users can encrypt files or use secure transmission protocols, such as HTTPS, Secure Sockets Layer (SSL), Transport Layer Security (TLS), and File Transfer Protocol with SSL Security (FTPS), to protect data in transmission.
- However, data in the in-use state was not well protected for a long time, until confidential computing emerged.
The Confidential Computing Consortium (CCC) defines confidential computing as protecting data in a Trusted Execution Environment (TEE) based on hardware.
The core functions of confidential computing are listed below:
- Protect the confidentiality of the in-use data: The in-memory data is encrypted, so the data will not be disclosed even if it is stolen by an attacker.
- Protect the integrity of the in-use data: A measurement (metric value) guarantees the integrity of the data and code. Any change to the data or code during use changes the measurement.
- Protect the security of the in-use data: Compared with common applications, confidential computing applications have a smaller Trusted Computing Base (TCB), which means a smaller attack surface and better security. With Intel Software Guard Extensions (SGX), for example, everything except the CPU and the trusted application itself, including the operating system and hypervisor, is denied access.
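The integrity guarantee above can be pictured as a hash over the code and data that are loaded into the TEE. The following is a minimal sketch, assuming SHA-256 over plain byte strings; the function name `measure` and the sample inputs are hypothetical, and on real hardware the measurement is computed by the platform while the enclave is built, not by application code.

```python
import hashlib


def measure(code: bytes, data: bytes) -> str:
    """Return a SHA-256 digest over the initial code and data (illustrative only)."""
    h = hashlib.sha256()
    # Length-prefix each part so the boundary between code and data is unambiguous.
    for part in (code, data):
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


# Hypothetical inputs: the same data with original and modified code.
expected = measure(b"def score(x): return x * 2", b"threshold=10")
tampered = measure(b"def score(x): return x * 3", b"threshold=10")

# A verifier compares the reported measurement with the expected one.
print("unchanged:", measure(b"def score(x): return x * 2", b"threshold=10") == expected)
print("tampered detected:", tampered != expected)
```

Changing a single byte of the code or data yields a different digest, which is why a verifier only needs to compare one value.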
Confidential computing was also included in Gartner’s Computing Infrastructure Maturity Curve in 2019. Although it is still in the early stage, it shows how confidential computing has gradually entered the public’s field of view and received attention.
In Gartner’s 2020 Cloud Vendor Local Security Solution Comparison, Alibaba Cloud received an H rating in the TEE field after the Alibaba Cloud Container Service for Kubernetes (ACK) released a confidential computing product, ACK-TEE, in early 2020.
3. Confidential Computing Business Scenarios
Confidential computing is designed to protect sensitive code and data. Business scenarios include blockchain, key management, finance, AI, multi-party computing, data leasing, and edge computing.
Take multi-party computing as an example. Different users or manufacturers share data with each other to extract greater economic value from it, but they do not want to disclose their data to each other. Confidential computing ensures that the shared data is processed in a TEE protected by hardware. The data is encrypted in memory to prevent leakage.
4. Differences Between Secure Containers and Confidential Computing
In addition to confidential computing, another security-related concept is secure containers. Alibaba Cloud has invested in both secure containers and confidential computing. Both are related to security, but their positioning and application scenarios are different.
The positioning of secure containers is to isolate malicious applications and prevent them from destroying other applications.
There are three main application scenarios:
- Untrusted Load Isolation
- Multi-Tenant Application Isolation
- Performance and Fault Isolation
Confidential computing is used to prevent data theft and breaches of the application. It protects sensitive code and data.
5. TEE Hardware Platform
Three hardware platforms support the TEE: Intel SGX, ARM TrustZone, and Advanced Micro Devices (AMD) Secure Encrypted Virtualization (SEV). They have different application scenarios and implementation methods:
- ARM TrustZone divides hardware resources into two parts, the secure world and the non-secure world. All the confidential operations are performed in the secure world, while other operations are performed in the non-secure world. You can switch between the secure world and the non-secure world using the monitor mode. Typical application scenarios include mobile payments and digital wallets.
- AMD uses technologies, such as SEV, Secure Memory Encryption (SME), and Encrypted State (ES), to encrypt the guest memory and secure the isolation of virtual machines.
- Intel SGX is a set of instructions provided by Intel to improve the security of application code and data. Users can put sensitive data into the enclave environment. The enclave is a protected TEE.
Alibaba Cloud ACK-TEE and Inclavare Containers are both confidential computing solutions based on Intel SGX.
6. Intel SGX With a Smaller TCB
When sensitive applications are deployed normally, they depend on the operating system, Virtual Machine Monitor (VMM), hardware, and cloud vendors. Such a large TCB exposes a large attack surface. If any component in the TCB is compromised, the application is at risk of data leakage and destruction.
When sensitive applications are deployed in the TEE of Intel SGX, the TCB contains only the CPU and the TEE itself. This reduces the attack surface, and the TEE-based security mechanism makes applications more secure.
7. Process of Developing and Using Intel SGX-Based Trusted Applications
Intel SGX divides applications into trusted and untrusted zones. Users define the trusted and untrusted zones and the functions they expose in the Enclave Definition Language (EDL). The functions that cross the boundary between the two zones are divided into ECALL and OCALL functions. ECALL functions allow code in the untrusted zone to call into the trusted zone, and OCALL functions allow code in the trusted zone to call out to the untrusted zone.
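The boundary can be illustrated with a toy model. This sketch assumes the usual SGX meaning of the terms: an ECALL enters the trusted zone from untrusted code, and an OCALL calls out from the trusted zone to untrusted code. The class `ToyEnclave` and the function names are hypothetical, and plain Python provides no real isolation; the point is only that each direction goes through an explicitly declared entry.

```python
from typing import Callable, Dict


class ToyEnclave:
    """Toy model of the trusted zone; this is not real SGX isolation."""

    def __init__(self, ocalls: Dict[str, Callable[[str], None]]) -> None:
        self._secret = 0        # state that lives only in the trusted zone
        self._ocalls = ocalls   # untrusted functions the enclave may call out to
        # Only functions declared here are reachable from untrusted code,
        # similar in spirit to the trusted section of an EDL file.
        self._ecalls = {"add_to_secret": self._add_to_secret}

    def ecall(self, name: str, *args):
        """Untrusted code enters the trusted zone through a declared ECALL."""
        if name not in self._ecalls:
            raise PermissionError(f"{name} is not a declared ECALL")
        return self._ecalls[name](*args)

    def _add_to_secret(self, value: int) -> bool:
        self._secret += value
        # Trusted code leaves the enclave through a declared OCALL, here to log.
        self._ocalls["log"]("secret updated")
        # Only a derived result crosses the boundary, never the secret itself.
        return self._secret > 10


# Untrusted side: provide the OCALL implementation and call the ECALL.
log_lines = []
enclave = ToyEnclave(ocalls={"log": log_lines.append})
over_limit = enclave.ecall("add_to_secret", 7)
print(over_limit, log_lines)
```

Calling an undeclared name such as `enclave.ecall("read_secret")` raises `PermissionError` in this model, mirroring the idea that untrusted code can reach the trusted zone only through the interfaces listed in the EDL file.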
The process of developing and using Intel SGX-based trusted applications is listed below:
1. Apply for a key: Apply to Intel for an SGX commercial signing key.
2. Install the environment:
- Install the Intel SGX driver
- Install the SGX Software Development Kit (SDK) and Platform Software (PSW)
- Install the Architectural Enclave Service Manager (AESM)
3. Application Development:
- Specify the code and data to be protected in the trusted zone of applications
- Compile the EDL files to clarify ECALL and OCALL functions
- Write the code of the trusted zone and the code of the non-trusted zone
4. Code Compilation and Building:
- Use sgx_edger8r to generate, from the EDL files, the untrusted proxy functions for ECALLs and the trusted proxy functions for OCALLs
- Compile the enclave dynamic link library files
- Sign the enclave dynamic link library file generated in the previous step
- Compile the application and package the image
5. Run a container using Docker
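The code generation and signing steps of the build can be sketched as command lines. This is a rough outline, assuming the `sgx_edger8r` and `sgx_sign` tools from the Intel SGX SDK are on the PATH; the file names (`Enclave.edl`, `enclave.so`, `Enclave_private.pem`, `Enclave.config.xml`) are hypothetical placeholders. The compiler invocations are omitted because they depend on the project, and the commands are only assembled here, not executed.

```python
from typing import List


def build_commands(edl: str = "Enclave.edl",
                   enclave_so: str = "enclave.so",
                   key: str = "Enclave_private.pem",
                   config: str = "Enclave.config.xml") -> List[List[str]]:
    """Return the edger8r and signing command lines in the order they run."""
    signed = enclave_so.replace(".so", ".signed.so")
    return [
        # Generate the untrusted proxies that the application links against.
        ["sgx_edger8r", "--untrusted", edl],
        # Generate the trusted proxies that the enclave links against.
        ["sgx_edger8r", "--trusted", edl],
        # (Compile the enclave dynamic library here; flags vary by project.)
        # Sign the compiled enclave dynamic library.
        ["sgx_sign", "sign", "-key", key, "-enclave", enclave_so,
         "-out", signed, "-config", config],
    ]


for cmd in build_commands():
    print(" ".join(cmd))
```

A build script could pass each list to `subprocess.run` once the SDK is installed; keeping the commands as data makes the required order easy to review.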
Protecting Sensitive Applications and Data With Inclavare Containers
1. Objectives and Value of Inclavare Containers
Inclavare is the Latin origin of the word enclave [ˈinklɑveə]. An enclave is a protected execution environment that uses cryptographic algorithms to provide strong security isolation for the sensitive and confidential data inside it. It prevents untrusted entities from accessing users’ digital assets.
Inclavare Containers is an open-source container runtime technology for confidential computing scenarios. It is led by the Alibaba Cloud Operating System Security Team, the Alibaba Cloud Cloud-Native Container Service Team, and multiple R&D teams in the Alibaba economy, including the Ant Financial Security Computing Team, the Cloud Security Team, and the Language Runtime Team.
The current confidential computing technology in the cloud-native scenarios has many defects and shortcomings:
- The use and development costs are relatively high.
- Containerization and integration with Kubernetes are costly and complex.
- The technical solutions provided by service providers are relatively limited.
These shortcomings hinder the popularization and application of confidential computing technology. Inclavare Containers provides the industry with an open-source container runtime engine and security architecture for confidential computing. Its value is described below:
- It lowers the high threshold of confidential computing and gives users the same experience as ordinary containers.
- Based on the various hardware security technologies provided by processors, it provides user workloads with different enclave forms, providing more choice and flexibility between security and cost.
2. Inclavare Containers Architecture
Before introducing the Inclavare Containers architecture, let’s take a look at the role of each component in the architecture:
- Kubelet is the main node agent running on each node of a Kubernetes cluster. It is responsible for communicating with Apiserver and managing pods on nodes.
- Containerd is an industry-standard container runtime that emphasizes simplicity, robustness, and portability. Containerd manages the complete container lifecycle on a host, including the transmission and storage of container images, container execution and management, storage, and networking.
- Shim-rune: A shim provided for the rune container runtime. It manages the lifecycle of containers and converts common images into TEE images.
- Rune: Rune is a command line tool for generating and running enclaves in containers according to the Open Container Initiative (OCI) specifications. Rune was developed based on the runc code. It can run either a common runc container or an enclave container.
- SGX LibOS: An SGX Library Operating System (LibOS) allows common applications to run on Intel SGX with few or no changes. Currently, Inclavare Containers supports Occlum as a LibOS, and the Graphene-SGX component is being integrated.
- Language Runtime: LibOS supports multiple languages. For example, Occlum provides the Golang and JDK runtime.
- Platform Abstraction Layer (PAL)-API is an interface used for communication between rune and LibOS. For example, pal_init is used to initialize the enclave, and pal_create_process is used to create a process in the enclave.
- Liberpal.so is a Linux dynamic library that implements PAL-API. It is mainly responsible for the communication between rune and LibOS.
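The relationship between rune, the PAL-API, and a LibOS can be sketched as an interface with one implementation per LibOS. This is an illustrative model only: the real API is a set of C functions exported by liberpal.so, and the Python signatures, the return codes, and the `FakePal` class are assumptions made for this sketch. Only the names `pal_init` and `pal_create_process` come from the text.

```python
from typing import List, Protocol


class EnclaveRuntimePal(Protocol):
    """The calls a caller such as rune relies on (simplified, hypothetical signatures)."""

    def pal_init(self, args: str) -> int: ...

    def pal_create_process(self, path: str, argv: List[str]) -> int: ...


class FakePal:
    """Stand-in for a LibOS-specific liberpal.so; records calls instead of using SGX."""

    def __init__(self) -> None:
        self.initialized = False
        self.processes: List[List[str]] = []

    def pal_init(self, args: str) -> int:
        self.initialized = True
        return 0  # 0 means success in this sketch

    def pal_create_process(self, path: str, argv: List[str]) -> int:
        if not self.initialized:
            return -1  # the enclave runtime must be initialized first
        self.processes.append([path, *argv])
        return 0


def launch(pal: EnclaveRuntimePal, path: str, argv: List[str]) -> bool:
    """Caller side: initialize the enclave runtime, then create the workload."""
    if pal.pal_init("") != 0:
        return False
    return pal.pal_create_process(path, argv) == 0


pal = FakePal()
started = launch(pal, "/bin/app", ["--port", "8080"])
print(started, pal.processes)
```

Because the caller depends only on the interface, a different LibOS can be supported by supplying another implementation without changing the caller.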
The workflow of Inclavare Containers is listed below:
1. Kubelet initiates Container Runtime Interface (CRI) requests to Containerd, such as a request to create a Pod.
2. A cri-containerd plug-in is provided for Containerd. After Containerd receives a request, it forwards the request to shim-rune.
3. Shim-rune can create both runc and rune containers. The two creation processes differ:
- Create a runc container: The process is the same as creating a common runc container. For example, the pause container of a Pod is a runc container.
- Create a rune container: Use LibOS to convert a common image to a TEE image. Rune creates an enclave in the container and runs the application in the enclave.
4. Rune loads liberpal.so for communication between rune and LibOS.
5. Rune loads the Intel SGX driver into the container, creates init-runelet as process 1 in the container, and then uses init-runelet to create the enclave. The enclave is a TEE protected by Intel SGX. It includes the LibOS, language runtime, and applications. At this point, a trusted application is running.
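The branching in this workflow can be summarized in a few lines. The sketch below is a simplification under stated assumptions: the `ContainerRequest` type and its `wants_enclave` flag are hypothetical (the text does not say how shim-rune decides which kind of container to create), and the steps are returned as descriptive strings instead of being executed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContainerRequest:
    name: str
    image: str
    wants_enclave: bool  # hypothetical flag; the real selection mechanism is not described


def plan_steps(req: ContainerRequest) -> List[str]:
    """Return the steps taken for a request, in order."""
    if not req.wants_enclave:
        # Same path as any ordinary container, such as the pause container of a Pod.
        return [f"runc: create {req.name} from {req.image}"]
    return [
        f"libos: convert {req.image} to a TEE image",
        "rune: load liberpal.so",
        "rune: start init-runelet as process 1",
        "init-runelet: create the enclave and run the application",
    ]


# Hypothetical image names for illustration.
pause = plan_steps(ContainerRequest("pause", "pause:3.2", wants_enclave=False))
app = plan_steps(ContainerRequest("app", "my-app:latest", wants_enclave=True))
for step in pause + app:
    print(step)
```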
The characteristics of Inclavare Containers are summarized below:
- It integrates the Intel SGX with the container ecosystem and is compatible with the Open Container Initiative (OCI) runtime and OCI image standards. It implements the enclave container architecture.
- It achieves seamless integration with the Kubernetes ecosystem.
- It reduces the compatibility problems caused by the constraints of Intel SGX with the help of Library OS (LibOS) technology.
- It supports high-level language runtimes to improve compatibility.
- It defines the Enclave Runtime PAL API specification to build the enclave runtime ecosystem.
3. Shim-Rune Workflow
Shim-rune consists of two parts: Core and Carrier. Together they serve two purposes:
- Manage the container lifecycle.
- Use LibOS to convert common images into TEE images.
The shim-rune workflow is listed below:
1. LibOS takes the container image as input and generates the enclave dynamic library.
2. The signing material is exported from the enclave dynamic library.
3. The signing material is sent to the signing service as the input of a signing request.
4. The signing service returns the digest file and public key.
5. A signed dynamic library is generated.
6. Rune loads the signed dynamic library, then creates and starts the enclave.
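The signing round trip in this workflow can be sketched end to end. The code below is a toy model under loud assumptions: it uses textbook RSA with a tiny key built from two Mersenne primes, which is insecure and only stands in for the real enclave signing scheme; it treats the returned digest file as a signature over the signing material; and all function names are hypothetical.

```python
import hashlib
from typing import Tuple

# Toy RSA key built from two Mersenne primes. Far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held only by the signing service


def export_signing_material(enclave_lib: bytes) -> int:
    """Step 2: derive the value to be signed from the enclave dynamic library."""
    return int.from_bytes(hashlib.sha256(enclave_lib).digest(), "big") % N


def signing_service(material: int) -> Tuple[int, Tuple[int, int]]:
    """Steps 3 and 4: sign the material and return the signature and public key."""
    return pow(material, D, N), (N, E)


def verify(enclave_lib: bytes, signature: int, public_key: Tuple[int, int]) -> bool:
    """Before loading (step 6), check that the signature matches the library."""
    n, e = public_key
    return pow(signature, e, n) == export_signing_material(enclave_lib)


# Step 1 is represented by placeholder bytes for the generated enclave library.
enclave_lib = b"libos + application bytes"
signature, public_key = signing_service(export_signing_material(enclave_lib))
print("accepted:", verify(enclave_lib, signature, public_key))
print("tampered accepted:", verify(enclave_lib + b"!", signature, public_key))
```

The private exponent never leaves the signing service in this model, which is the property that lets developers obtain signed enclaves without holding a signing key themselves.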
4. Client Signature and Server Signature
Inclavare Containers works in two ways, client signature and server signature. The differences are listed below:
Compared with client signature, server signature has the following advantages:
- It lowers the requirements for developers who use LibOS. Developers don’t need to master Intel SGX technology. They can build ordinary images according to the LibOS requirements.
Note: Each LibOS has certain requirements for common images. For example, Occlum only supports musl libc and does not support glibc. Therefore, a glibc application must be rebuilt as a musl libc application before it can run in Inclavare Containers.
- Users do not need to apply for commercial certificates from Intel.
- It can be run in a Kubernetes cluster.
5. Multi-Team Construction and Cooperation
The Inclavare Containers project is built jointly by multiple teams. The role of each component and the division of work among teams are listed below:
- Occlum is a multi-process library operating system developed by the Ant Financial Security Computing Team. It is based on Intel SGX technology and is memory-safe.
- Graphene-SGX is an open-source LibOS based on the Intel SGX technology that runs unmodified programs.
- Dragonwell: The Long-Term Support (LTS) OpenJDK release version customized by the Alibaba Compiler Team
- Sgx-device-plugin is a Kubernetes device plugin developed by the Alibaba Cloud Container Service Team and Ant Financial Security Computing Team for Intel SGX.
- Alibaba Cloud Linux: The Alibaba BaseOS Team provides a full-stack adaptation of Alibaba Cloud Linux for Inclavare Containers.
6. Open-Source Project: Inclavare Containers
Inclavare Containers is the first open-source container runtime technology stack in the industry to serve cloud-native confidential computing scenarios. It has been rated as a key open-source project by the Alibaba Open-Source Committee and has been added to the official reference implementation list of confidential computing OCI runtimes.
The following features are currently supported:
- Use Kubernetes and Docker to start the enclave container
- Two mainstream library operating systems, Occlum and Graphene
- Java and Golang runtimes
The project publishes a release at the end of each month and provides the community with binary releases for Community Enterprise Operating System (CentOS) and Ubuntu. The Alibaba Cloud Linux release is provided to internal users.
7. Milestones of Inclavare Containers
8. Confidential Computing Technology Industry in 2020
ACK-TEE was established in September 2019. Its features are listed below:
- Users with strong security requirements for digital assets (algorithms, data, and code) get a TEE based on hardware encryption technology.
- It lowers the application threshold of confidential computing technology.
- It simplifies the development, delivery, and management costs of trusted and confidential applications.
- Cooperation Teams: The Alibaba Cloud Container Service Team, Operating System Kernel Team, Cloud Security Team, Ant Financial Security Team, and Runtime Language Team
- Positioning: A cloud-native confidential computing container platform.
- Mission: It aims to free the world from difficult-to-use confidential computing.
- Product Principle: Trusted and safe, easy to develop and deliver, open standards, and cloud native.
2. ACK-TEE 1.0
ACK-TEE 1.0 launched in January 2020.
- Target Users: Native SGX users
- New Kubernetes Hosting Cluster Form: Dedicated cluster for confidential computing, supporting Intel SGX1.
- It reuses the existing capabilities of the managed Kubernetes cluster, including the integration of various cloud services and the Kubernetes cluster O&M capability. It reduces the Kubernetes cluster O&M complexity.
- It supports Enclave Page Cache (EPC) encrypted memory management and scheduling, which reduces the complexity of using SGX devices.
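Why EPC needs scheduling can be shown with a small example: EPC is a scarce per-node resource, so a workload should land on a node that still has enough of it. The sketch below is not how ACK-TEE or the Kubernetes scheduler is implemented; the `pick_node` function, the best-fit policy, and the node names and sizes are all assumptions chosen for illustration.

```python
from typing import Dict, Optional


def pick_node(free_epc_mib: Dict[str, int], request_mib: int) -> Optional[str]:
    """Return the node with the least free EPC that still fits the request (best fit)."""
    candidates = {node: free for node, free in free_epc_mib.items() if free >= request_mib}
    if not candidates:
        return None  # no node can satisfy the request; the workload stays pending
    # Break ties by node name so the result is deterministic.
    return min(candidates, key=lambda node: (candidates[node], node))


# Hypothetical free EPC per node, in MiB.
nodes = {"node-a": 30, "node-b": 90, "node-c": 60}
chosen = pick_node(nodes, 50)
print(chosen)
print(pick_node(nodes, 100))
```

Best fit keeps the larger free regions available for later, larger requests; other policies are equally possible.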
3. ACK-TEE 2.0
ACK-TEE 2.0 was launched in the second half of 2020.
- Feature: It supports native applications running in the TEE.
- Target Users: Users that have not mastered confidential computing technologies but require data security.
- It converts a common image to a TEE image and then runs the image in TEE.
- It provides secure and trusted service components through a controller, such as KMS-Enclave-Plugin.
Q1: Does this product rely on Intel chips? Why do I need to apply to Intel for a key?
A1: Intel chips can ensure that applications are executed in the hardware-based enclave (a type of TEE), which secures the applications, but they cannot by themselves ensure that the creator of the enclave is legitimate. When building the enclave, we sign it with the key obtained from Intel, which shows that the creator is legitimate.
Q2: Is Inclavare Containers essentially an implementation of container runtimes? Can it completely replace the runtime scenarios of Docker containers?
A2: Inclavare Containers is a software stack that contains multiple tools, such as rune, shim-rune, and runelet. Rune is the container runtime and is developed based on the runc code. It can run either a common runc container or a container with an enclave. Functionally, it can replace the Docker container runtime (runc), but its main value is running enclave containers to ensure code and data security.
Q3: How much is application performance affected? Have any tests been done?
A3: Inclavare Containers is responsible for solving data security problems. The bottom layer is based on Intel SGX technology. Currently, the Enclave Page Cache (EPC) of Intel SGX1 has only 128 MB of memory, so performance is much worse than that of ordinary cloud-native container applications.
Q4: So we only need it where protecting in-use data is the core requirement, right?
A4: Yes, protecting in-use code and data is the greatest value of confidential computing.
Q5: Does ACK currently offer this capability and usage samples?
A5: ACK has a managed “confidential computing” cluster, the ACK-TEE 1.0 mentioned above. However, its customers are SGX-native customers who need to transform their applications and build images based on SGX. ACK-TEE 2.0 was launched in late 2020 with support for the Inclavare Containers capability.