Cloud-Native: Best Practices for Container Technology Implementation
By Yili
Introduction: With the rapid development and extensive application of container technologies, cloud-native technologies are becoming the future of IT development. As the first Chinese company that deployed container technologies, Alibaba Cloud has made great achievements in both technologies and products. Yili, a senior technical expert at Alibaba Cloud, shares the best practices of container technology implementation through Alibaba Cloud Container Service. This article is meant to help you better understand the container technologies and cloud-native concepts, properly design cloud architecture, and make full use of the cloud.
A quote from The Economist magazine reads, “Without the container, there would be no globalization.”
What Is Container Service?
Economic globalization is based on the modern transportation system and at its core are containers. The emergence of the shipping container enabled the standardization and automation of logistics, greatly reducing transportation costs and making it possible to integrate global supply chains. Therefore, without containers, there would be no globalization.
The standardization and modularization concepts of containers are also promoting supply chain reform in the construction industry. After the outbreak of the coronavirus (COVID-19) pandemic, Huoshenshan Hospital, a specialized hospital with about 1,000 beds, was built within 10 days in Wuhan, China. It played an important role in the fight against the pandemic, especially in the early days. The whole hospital was assembled from prefabricated container houses. The modular rooms were pre-equipped with air conditioners, disinfection stations, water supplies, and drainage, which greatly accelerated construction.
General Definition of Containers
Software container technology is also reshaping the entire software supply chain. As a lightweight, operating-system-level virtualization technology, containers differ from traditional physical machines and virtual machines. Think about it like this:
A traditional physical machine is like a single-family detached home.
- A family lives in the house comfortably, without being disturbed by others.
- An application is exclusively installed on a physical machine, providing outstanding performance. However, this is costly, with a long delivery period and low resource utilization.
Virtual machines are like townhouses.
- Each house is independent and well isolated. All the houses share water, electricity, and a foundation. The cost decreases, while the floor area ratio and the delivery speed increase.
- With the virtualization technology, applications in virtual machines can be isolated and resource utilization can be effectively improved. Applications in a virtual machine need to be configured and installed after the virtual machine is delivered. Therefore, the delivery speed is not fast enough.
A container is like a container house.
- A container house is modularized with decoration. It can be built quickly and moved whenever needed. A stadium for the 2022 FIFA World Cup in Qatar will be designed this way. This stadium will be assembled from prefabricated container houses and will accommodate 40,000 people. Each container house will be made in China and come pre-equipped with bleachers, washrooms, and bars. Then, the container houses will be assembled in Qatar. Using this method, the construction period will be reduced by three years and the stadium can be disassembled and moved to other places after the event.
- Container resources are isolated using operating system technologies such as cgroups and namespaces. Containers share the operating system kernel, so they are lightweight, add almost no resource overhead, and can be started in seconds. This significantly improves application deployment density and elasticity. Container images package an application together with its dependent system components and configurations in a standardized, self-contained format. When applications are distributed and delivered through container images, they work out of the box (OOTB) and run consistently in different environments.
Values of Containers
In the past few years, container technologies have been widely used in the IT industry. Their most important values are:
Agility
Speed matters a lot. In the era of digital transformation, each enterprise is facing the impact of emerging business models and numerous uncertainties. An enterprise’s ability to innovate continuously, rather than its current scale or past successful strategies, determines its future success. Container technologies improve the agility of an enterprise's IT architecture, thereby enhancing its business agility and accelerating its business innovation. For example, during the COVID-19 pandemic, online businesses in the education, video, and public health industries experienced explosive growth. Container technologies help enterprises seize such opportunities for rapid business growth. According to industry statistics, container technologies increase delivery efficiency 3 to 10 times over, which allows enterprises to carry out fast iteration and low-cost trial and error.
Elasticity
In the Internet age, enterprise IT systems often encounter both predictable and unexpected traffic growth, such as e-commerce promotions and emergencies. Container technologies can give full play to the elasticity of cloud computing and reduce the computing cost by increasing deployment density and elasticity. For example, after the exponential growth of online traffic during the COVID-19 pandemic, container technologies can be used to alleviate the expansion pressure for online education, supporting online teaching for hundreds of thousands of teachers and online learning for millions of students.
Portability
Container technologies have promoted the standardization of cloud computing. Containers have become the standard for application distribution and delivery and can decouple applications from the underlying runtime environment. Kubernetes has become the standard for resource scheduling and orchestration. It shields differences between underlying architectures and allows applications to run smoothly on different infrastructures. The Cloud Native Computing Foundation (CNCF) provides the Certified Kubernetes Conformance Program to ensure compatibility between different Kubernetes implementations. By using container technologies, it will be easier to build application infrastructures in the age of the cloud.
Kubernetes: Infrastructure in the Cloud-Native Era
Kubernetes has become a cloud application operating system. More and more applications are running on Kubernetes, such as stateless web applications, transactional applications (databases and message-oriented middleware), and data-based intelligent applications. The Alibaba economy is also carrying out a comprehensive cloud-native migration to the cloud based on container technologies.
Introduction to Alibaba Cloud Container Service
Alibaba Cloud Container Service products provide an enterprise container platform within Alibaba Cloud, edge computing, and Apsara Stack environments. The core of Alibaba Cloud Container Service products is Alibaba Cloud Container Service for Kubernetes (ACK) and Serverless Kubernetes (ASK). They are built on a foundation of a series of Alibaba Cloud infrastructure capabilities, such as computing, storage, networking, and security. In addition, they provide standardized APIs, optimized capabilities, and an enhanced user experience. ACK is certified by the Certified Kubernetes Conformance Program and provides a series of core capabilities required by enterprises, such as security governance, end-to-end observability, multi-cloud, and hybrid cloud.
Alibaba Cloud Container Registry (ACR) is the core of asset management for enterprise cloud-native applications. It can manage application assets, such as Docker images and Helm charts, and can be integrated with continuous integration and continuous delivery (CI/CD) tools for a complete DevSecOps process.
Alibaba Cloud Service Mesh (ASM) is a platform for fully managing the traffic of microservice-oriented applications. It is compatible with Istio, supports unified traffic management of multiple Kubernetes clusters, and provides consistent communication, security, and observability for application services in containers and virtual machines.
Managed Kubernetes Clusters
This section describes the topology of a managed Kubernetes cluster. The Kubernetes cluster managed by ACK is based on the Kubernetes architecture. Master nodes of the Kubernetes cluster run on the control plane (a Kubernetes cluster) of the Virtual Private Cloud (VPC) network.
ACK adopts a high-availability architecture by default, where three etcd replicas run in three different zones. Following scalability best practices, two etcd clusters are provided: one stores configuration information and the other stores system events. This improves the availability and scalability of etcd. Master components of the Kubernetes cluster, such as API Server and Scheduler, are deployed with multiple replicas and run in two different zones. Master components can be elastically scaled based on the workload, and worker nodes access the API Server through Server Load Balancer (SLB). This design ensures that the Kubernetes cluster runs properly even if a zone becomes faulty.
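To see why three etcd replicas in three zones tolerate the loss of one zone, recall that etcd needs a majority (quorum) of members to stay available. The following is a minimal sketch of that arithmetic; the function names and zone names are hypothetical and only illustrate the reasoning, not how ACK checks it internally.

```python
from typing import Dict


def quorum(members: int) -> int:
    """Smallest majority of a consensus group such as an etcd cluster."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still alive."""
    return members - quorum(members)


def survives_zone_loss(members_per_zone: Dict[str, int]) -> bool:
    """True if a quorum remains after the most populated zone fails."""
    total = sum(members_per_zone.values())
    worst_loss = max(members_per_zone.values())
    return total - worst_loss >= quorum(total)


if __name__ == "__main__":
    # Hypothetical zone names: one etcd member per zone.
    print(survives_zone_loss({"zone-a": 1, "zone-b": 1, "zone-c": 1}))
    # Two of three members in a single zone: losing that zone breaks quorum.
    print(survives_zone_loss({"zone-a": 2, "zone-b": 1}))
```

Placing one member in each of three zones keeps two of three members alive after any single zone failure, which is exactly the quorum.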
Worker nodes run on the VPC network. You can run the nodes in different zones and use the zone-based anti-affinity feature of the application to ensure the high availability of the application.
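As an illustration of zone-based anti-affinity, the sketch below builds a standard Kubernetes Deployment manifest as a Python dictionary. It uses the well-known node label `topology.kubernetes.io/zone` as the topology key. The application name and image are hypothetical, and the "preferred" (soft) rule is one possible choice; a "required" rule would be stricter.

```python
import json


def zone_spread_deployment(name: str, image: str, replicas: int) -> dict:
    """Build an apps/v1 Deployment that prefers one replica per zone."""
    labels = {"app": name}
    anti_affinity = {
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": 100,
                    "podAffinityTerm": {
                        # Avoid zones that already run a pod of this app.
                        "labelSelector": {"matchLabels": labels},
                        "topologyKey": "topology.kubernetes.io/zone",
                    },
                }
            ]
        }
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": anti_affinity,
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical application name and image reference.
    manifest = zone_spread_deployment("web", "registry.example.com/web:1.0", 3)
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.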
Best Practices of Container Technology Implementation
Flexible and Rich Elasticity Capabilities
Elasticity is a core capability of the cloud. Only the robust elastic computing power provided by the cloud can support the typical traffic pulse scenarios, such as the Double 11 Global Shopping Festival and the rapid growth of traffic for online education and collaborative office work after the COVID-19 pandemic. Kubernetes can maximize the elasticity of the cloud.
ACK provides various elasticity policies at the resource layer and application layer. The current mainstream solution at the resource layer is to scale nodes in or out by using cluster-autoscaler (CA). When a pod fails to be scheduled due to insufficient resources, CA automatically creates nodes in the node pool based on the application workload.
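The scale-out decision can be pictured as a packing problem: how many new nodes are needed to fit the pods that could not be scheduled? The sketch below is a deliberately simplified model that considers CPU requests only and uses first-fit-decreasing packing. It is not the actual cluster-autoscaler algorithm, and the function and parameter names are hypothetical.

```python
from typing import List


def nodes_to_add(pending_cpu_millicores: List[int],
                 node_cpu_millicores: int,
                 max_new_nodes: int) -> int:
    """Estimate how many identical nodes the pending pods need.

    Pods are packed largest first onto new nodes (first-fit decreasing).
    The result is capped by the node pool's remaining headroom.
    """
    free: List[int] = []  # remaining CPU on each simulated new node
    for request in sorted(pending_cpu_millicores, reverse=True):
        if request > node_cpu_millicores:
            # This pod can never fit on this node type; adding nodes won't help.
            continue
        for index, remaining in enumerate(free):
            if remaining >= request:
                free[index] = remaining - request
                break
        else:
            # No simulated node has room, so open a new one.
            free.append(node_cpu_millicores - request)
    return min(len(free), max_new_nodes)


if __name__ == "__main__":
    # Four unschedulable pods, nodes with 4 vCPUs, at most 10 new nodes.
    print(nodes_to_add([2000, 2000, 1000, 500], 4000, 10))
```

A real autoscaler also accounts for memory, affinity rules, taints, and system-reserved resources, so treat the number above as a lower-bound estimate.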
Elastic Container Instance (ECI) provides a serverless container runtime environment based on lightweight virtual machines. You can schedule and run applications on instance groups in ACK. This is suitable for offline big data tasks, CI/CD jobs, and burst business scaling. On the Weibo app, 500 ECI pods can be scaled out in 30 seconds to easily respond to burst events.
At the application layer, Kubernetes provides Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA). Alibaba Cloud provides metrics-adapters to support more elasticity metrics. For example, you can adjust the number of pods for an application based on the queries per second (QPS) of the ingress. In addition, the resource profiles of many application workloads are periodic. For example, the business peak of the securities industry is the opening time of the stock market on weekdays. The resources required for the peak are 20 times those for the valley. To solve this problem, Alibaba Cloud Container Service provides a scheduled scaling component so developers can define a scheduled scaling policy to scale out resources in advance and reclaim resources regularly at the valley. This can balance the stability and resource costs of the system well.
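For metric-based scaling, the Kubernetes documentation describes the HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a small tolerance of 1. The sketch below applies that formula to a QPS-per-pod metric. The metric choice, the replica bounds, and the function name are assumptions for illustration.

```python
import math


def desired_replicas(current_replicas: int,
                     current_qps_per_pod: int,
                     target_qps_per_pod: int,
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Apply the documented HPA formula, clamped to the replica bounds."""
    ratio = current_qps_per_pod / target_qps_per_pod
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: skip scaling to avoid flapping.
        return current_replicas
    desired = math.ceil(
        current_replicas * current_qps_per_pod / target_qps_per_pod
    )
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Peak: 4 pods each serving 250 QPS against a 100 QPS target.
    print(desired_replicas(4, 250, 100, min_replicas=2, max_replicas=50))
    # Valley: 10 pods each serving 20 QPS against the same target.
    print(desired_replicas(10, 20, 100, min_replicas=2, max_replicas=50))
```

Because HPA reacts only after the metric has changed, a scheduled scaling policy is a useful complement for predictable peaks such as the market opening: it raises the replica count before the traffic arrives.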
Serverless Kubernetes
Kubernetes provides powerful functions and flexibility, but it is extremely challenging to operate and maintain a Kubernetes production cluster. Even if a managed Kubernetes service is used, you need to retain the worker node resource pool and perform routine maintenance on the worker nodes, such as upgrading the operating system and installing security patches. You also need to plan the capacity at the resource layer based on your resource usage.
To address the complex O&M of Kubernetes clusters, Alibaba Cloud launched ASK. ASK is compatible with Kubernetes applications and offloads Kubernetes O&M to the cloud infrastructure. This allows developers to focus on their applications.
- You do not need to reserve, maintain, or manage any nodes.
- All resources are created on demand and run on instance groups. You pay only for the resources consumed by your applications.
- No capacity planning is required.
For serverless containers, we provide two technical solutions: ACK on ECI and ASK.
ACK on ECI
ACK clusters are functional and flexible, which meets the demands of large Internet enterprises and traditional enterprises. You can run different applications and jobs in an ACK cluster. ACK clusters are intended for Site Reliability Engineering (SRE) teams in enterprises, allowing them to perform customized development and flexible control for Kubernetes.
ACK clusters support the following container runtime technologies.
- RunC containers (also known as Docker containers) share the kernel with the host’s Linux system, which is simple and efficient but provides weak isolation. Once malware escapes by exploiting kernel vulnerabilities, other applications on the host may be affected.
- To improve isolation, the Alibaba Cloud team worked with the Ant Financial team to introduce the kangaroo sandboxed container technology. Alibaba Cloud is the first public cloud container service provider in the industry that provides secure RunV containers. Different from RunC containers, each RunV container has an independent kernel. Even if the kernel of one container is attacked, other RunV containers are not affected. This technology can be used to run untrusted third-party applications or for better isolation in multitenancy scenarios. In addition, both RunC and RunV containers support resource over-provisioning. You can flexibly control your resources to balance stability and cost.
- ACK supports the scheduling of instance groups. In essence, ECI implements a secure and isolated container runtime environment based on lightweight virtual machines and makes full use of the computing power of Alibaba Cloud’s elastic computing resource pool to meet users’ demands on elastic computing costs, scale, and efficiency. ECI is fully optimized for container scenarios. Through technologies such as operating system tailoring, Elastic Network Interface (ENI) pass-through, and direct storage mounting, applications on instance groups have execution efficiency equal to or higher than the container runtime environment in virtual machines. ECI does not support resource over-provisioning. However, spot instances are provided, allowing you to balance costs with computing efficiency.
ECI applies to Kubernetes clusters in the following scenarios:
- Burst traffic of online businesses: A static resource pool is available for daily traffic. You can use instance groups to handle burst traffic.
- Batch computing tasks: For some temporary or periodic computing tasks, it is not easy to predict the resource scale, and reserving resources may cause waste. In this case, you can use ECI to handle bulk data processing tasks.
- Isolation: Some business applications need to run untrusted third-party applications. For example, the ECI security sandbox can be used to isolate an uploaded artificial intelligence (AI) algorithm model so that the model runs securely.
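The burst-traffic scenario in the list above comes down to a simple placement rule: fill the static node pool first and send only the overflow to ECI. The sketch below models that rule. The per-pod throughput, the pool capacity, and all names are hypothetical, and real scheduling is done by Kubernetes rather than by application code.

```python
from typing import Dict


def pods_needed(qps: int, qps_per_pod: int) -> int:
    """Number of pods required to serve the given traffic (ceiling division)."""
    return (qps + qps_per_pod - 1) // qps_per_pod


def place_pods(desired_pods: int, static_pool_capacity: int) -> Dict[str, int]:
    """Fill the reserved node pool first and overflow the rest to ECI."""
    on_static_pool = min(desired_pods, static_pool_capacity)
    return {
        "static_pool": on_static_pool,
        "eci": desired_pods - on_static_pool,
    }


if __name__ == "__main__":
    # Daily traffic fits into the static pool of 20 pods.
    print(place_pods(pods_needed(1500, 100), 20))
    # A burst needs 45 pods, so 25 of them run on ECI.
    print(place_pods(pods_needed(4500, 100), 20))
```

With this split, the static pool is sized for daily traffic and you pay for elastic capacity only while the burst lasts.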
ASK
ASK is a container product customized for independent software vendors (ISVs), departments of large enterprises, and small- and medium-sized enterprises. You can create and deploy Kubernetes applications without needing Kubernetes management and O&M capabilities. This greatly simplifies management and suits scenarios such as application hosting, CI/CD, AI, and data computing. For example, you can use ASK and Graphics Processing Unit (GPU) instance groups to build an O&M-free AI platform. You can also create a machine learning environment on demand. In either case, the overall architecture is very simple and efficient.
Cloud-Native, Elastic, and Highly Available Architecture
The cloud-native distributed application architecture has the following features: high availability, auto scaling, fault tolerance, easy management, high observability, standardization, and portability. We can build a cloud-native application reference architecture on Alibaba Cloud that includes:
- Cloud-native infrastructures: Elastic Compute Service (ECS) enterprise instances based on X-dragon Hypervisor
- Cloud-native application platform: ACK
- Cloud-native database: Apsara PolarDB
End-to-End Elastic Application Architecture: You can containerize frontend applications and business logic, deploy them in a Kubernetes cluster, and configure HPA based on the application load.
At the backend data layer, you can use cloud-native databases such as Apsara PolarDB. Apsara PolarDB uses storage-computing separation architecture and supports scale-out. With the same specification, the performance of Apsara PolarDB is seven times that of the MySQL database, while the cost is half of the MySQL database.
Systematic High-Availability Design:
- You can deploy replica instances of applications in different zones by using the zone-based anti-affinity feature.
- You can access application portals in different zones through SLB.
- Apsara PolarDB provides the cross-zone high availability feature by default.
This ensures the zone-based availability of the entire system and can tolerate one failed zone.
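Tolerating one failed zone also has a capacity side: the replicas that survive must still be able to carry the load. The sketch below estimates how many replicas to deploy, assuming they are spread as evenly as possible across zones and that the most populated zone is the one that fails. The function name and the even-spread assumption are ours, not part of any Alibaba Cloud API.

```python
def replicas_for_zone_tolerance(required: int, zones: int) -> int:
    """Smallest replica count that keeps `required` replicas after one zone fails.

    Assumes an even spread, so the largest zone holds ceil(total / zones) replicas.
    """
    if zones < 2:
        raise ValueError("at least two zones are needed to tolerate a zone failure")
    if required < 0:
        raise ValueError("required must not be negative")
    total = required
    while total - (total + zones - 1) // zones < required:
        total += 1
    return total


if __name__ == "__main__":
    # Four replicas are needed to serve peak load; the cluster spans three zones.
    print(replicas_for_zone_tolerance(4, 3))
```

In this example, six replicas spread two per zone leave four running after any single zone fails.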
Application High Availability Service (AHAS) provides the architecture awareness capability and can visualize the system topology. Moreover, AHAS provides the application inspection capability to detect availability issues, for example, whether the number of application replicas meets the availability requirements, and whether multi-zone disaster recovery is enabled for ApsaraDB for RDS (RDS) instances.
Observability in Multiple Dimensions
In a large-scale distributed system, various stability or performance problems may occur in infrastructures (networks, computing nodes, and operating systems) or applications. Observability helps you understand the status of the distributed system and make decisions accordingly. It also serves as the basis for auto scaling and automated O&M.
In general, observability consists of several important aspects:
Logging (Event Streams)
We provide a complete log solution based on Log Service (SLS) to collect and process application logs and provide capabilities such as ActionTrail and Kubernetes event centers.
Monitoring Metrics
CloudMonitor provides comprehensive monitoring of infrastructure services, such as ECS, storage, and networking. For business application performance metrics, such as the heap memory usage of Java applications, Application Real-Time Monitoring Service (ARMS) provides comprehensive performance monitoring for Java and PHP applications without modifying business code. For Kubernetes applications and components, ARMS provides managed Prometheus services, various OOTB preset monitoring dashboards, and APIs to facilitate third-party integration.
End-to-End Tracing
Tracing Analysis provides developers with comprehensive tools for distributed application trace statistics and topology analysis. It can help developers quickly locate and troubleshoot performance bottlenecks in distributed applications and improve the performance and stability of microservice-oriented applications.
From DevOps to DevSecOps
Security is an enterprise’s biggest concern about container technologies. To systematically improve the security of container platforms, we need comprehensive security protection. First, we need to upgrade DevOps to DevSecOps, which integrates security concepts into the entire software lifecycle and applies security protection in the development and delivery phases.
ACR Enterprise Edition provides a complete secure software delivery chain. After you upload images, ACR can automatically scan the images to detect common vulnerabilities and exposures (CVEs). You can then use Key Management Service (KMS) to automatically add digital signatures to the images. You can configure automated security policies in ACK. For example, only images that have been scanned and meet the launch requirements of the production environment can be released. This way, the entire software delivery process is observable, traceable, and policy-driven. This ensures security and improves delivery efficiency.
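The release policy described above can be thought of as a gate that every image must pass before deployment. The following minimal sketch shows the decision logic only. The `ImageRecord` type, its fields, and the threshold are hypothetical and do not represent the ACR, KMS, or ACK APIs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical summary of what the delivery chain knows about an image."""
    reference: str
    scanned: bool
    high_severity_cves: int
    signed: bool


def admit(image: ImageRecord, max_high_cves: int = 0) -> Tuple[bool, str]:
    """Allow release only for scanned, sufficiently clean, signed images."""
    if not image.scanned:
        return False, "image has not been scanned"
    if image.high_severity_cves > max_high_cves:
        return False, "too many high-severity CVEs"
    if not image.signed:
        return False, "image is not signed"
    return True, "admitted"


if __name__ == "__main__":
    # Hypothetical image references.
    good = ImageRecord("registry.example.com/shop/api:1.4", True, 0, True)
    risky = ImageRecord("registry.example.com/shop/api:1.5", True, 3, True)
    print(admit(good))
    print(admit(risky))
```

Returning a reason together with the decision keeps the process traceable, because each rejected release can be logged with the rule that blocked it.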
During runtime, applications face many risks, such as CVEs and virus attacks. Alibaba Cloud Security Center provides security monitoring and protection for applications during runtime.
Alibaba Cloud Security Center can monitor container application processes and networks, and detect application exceptions and vulnerabilities in real time. When Alibaba Cloud Security Center detects a problem, it notifies you by email or SMS and automatically isolates and rectifies the problem. For example, a mining worm can exploit configuration errors to launch attacks on container clusters. In this case, Alibaba Cloud Security Center can help you easily locate and clear the malware.
ASM
In February 2020, we released the first fully managed and Istio-compatible ASM in the industry. The control plane components of ASM are managed by Alibaba Cloud and independent of user clusters on the data plane. The hosting mode greatly simplifies the deployment and management of the Istio service mesh and decouples the lifecycle of the service mesh from the Kubernetes clusters. This makes the architecture simpler and more flexible and the system more stable and scalable. ASM integrates the Alibaba Cloud observability service and SLS based on Istio, which helps you manage applications in the service mesh more efficiently.
On the data plane, ASM supports various computing environments, including ACK Kubernetes clusters, ASK clusters, and ECS virtual machines. Cloud Enterprise Network (CEN) and ASM can implement service mesh between Kubernetes clusters across multiple regions and VPC networks. This enables ASM to implement traffic management and phased release for large-scale distributed applications in multiple regions. ASM will soon support multi-cloud and hybrid clouds.
Hybrid Cloud: A New Norm for Enterprises’ Cloud Migration
Cloud migration has become inevitable. However, due to concerns such as data sovereignty, security, and privacy, some enterprises cannot migrate all of their businesses directly to the cloud and adopt a hybrid cloud architecture instead. Gartner predicts that 81% of enterprises will adopt multi-cloud or hybrid clouds. Hybrid cloud architecture has become a new norm for an enterprise’s cloud migration.
Traditional hybrid cloud architecture is designed to abstract and manage cloud resources. However, differences in infrastructures and security architecture capabilities between different cloud environments can separate an enterprise’s IT architecture from its O&M system. This makes hybrid cloud implementation more complex and increases O&M costs.
In the cloud-native era, technologies such as Kubernetes shield infrastructure differences and enable better centralized resource scheduling and application lifecycle management in hybrid cloud environments. Application-centric hybrid cloud architecture 2.0 is now available.
The following lists several typical scenarios:
- Use of the elastic computing power of public clouds to cope with burst traffic: The on-premises data center hosts daily traffic. Upon traffic spikes, the on-premises data center scales out cloud resources to host burst traffic.
- Use of public clouds to build a low-cost cloud disaster recovery center: The off-premises and on-premises systems are available, and the off-premises system is used for hot standby. When a fault occurs in the on-premises data center, business traffic can be quickly migrated to the cloud.
- Construction of active geo-redundancy application architecture: Unitized business systems are deployed in multiple regions in the cloud and the unified service governance capability is provided. When a fault occurs in one region, business traffic is migrated to another region, which ensures business continuity.
Based on ACK, the hybrid cloud network, storage gateway, and database replication capabilities of Alibaba Cloud, we can help enterprises build a new hybrid cloud IT architecture.
Hybrid Cloud Architecture 2.0
ACK provides a centralized cluster management capability. In addition to Alibaba Cloud Kubernetes clusters, ACK can also manage your Kubernetes clusters in the on-premises Internet data center (IDC) and on other clouds. The centralized control plane enables unified security governance, observability, application management, backup, and recovery for multiple clusters. For example, SLS and managed Prometheus services can provide you with a unified observability dashboard for off-premises and on-premises clusters without intrusive code changes. Security Center and AHAS help you detect and rectify security and stability risks in the hybrid cloud architecture.
ASM provides a unified service governance capability, which enables access to the nearest service, failover, and phased release with the multi-region and hybrid cloud network capabilities provided by CEN and Smart Access Gateway (SAG). This combined solution can be used in scenarios such as cloud disaster recovery and active geo-redundancy to improve business continuity.
Cloud-Native Hybrid Cloud Solution
UniCareer is an e-learning career development platform that serves users in many regions around the world. Its applications are deployed in multiple Kubernetes clusters in four regions of Alibaba Cloud. In these clusters, CEN is used to connect multiple cross-region VPC networks. An ASM instance is used to manage the traffic of microservice-oriented applications in multiple Kubernetes clusters.
Service routing policies are centrally managed by the ASM control plane and delivered to multiple Kubernetes clusters. User requests are distributed to the ingress gateway in the nearest region through Domain Name System (DNS). Then, ASM routes the requests to service endpoints in the same region first. If services in this region are unavailable, the requests are automatically routed to other regions.
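The routing behavior described above, local region first and other regions only on failure, can be sketched as a small selection function. This is a conceptual model only. In practice the service mesh makes this decision, and the region names, data shapes, and function name here are hypothetical.

```python
from typing import Dict, List, Optional


def choose_region(client_region: str,
                  healthy_endpoints: Dict[str, List[str]],
                  failover_order: List[str]) -> Optional[str]:
    """Pick the region that should serve a request.

    Prefer the client's own region. If it has no healthy endpoints,
    fall back to the first region in failover_order that does.
    """
    if healthy_endpoints.get(client_region):
        return client_region
    for region in failover_order:
        if region != client_region and healthy_endpoints.get(region):
            return region
    return None  # every region is down


if __name__ == "__main__":
    endpoints = {
        "region-a": [],                      # local services are unavailable
        "region-b": ["10.1.0.5", "10.1.0.6"],
        "region-c": ["10.2.0.9"],
    }
    order = ["region-a", "region-b", "region-c"]
    print(choose_region("region-a", endpoints, order))
    print(choose_region("region-c", endpoints, order))
```

Keeping the failover order explicit makes it possible to prefer regions that are closer or cheaper to reach when the local region fails.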
Cloud-Native Hybrid Cloud Management
The hybrid cloud solution of Alibaba Cloud has the following features:
- ACK provides centralized cluster management, security governance, application management, and observability.
- CEN connects all regions through high-speed and low-latency networks.
- ASM supports centralized intelligent application traffic management, which can optimize service access and improve business continuity.
Hitless Migration of Windows Containers to the Cloud
Now, let’s talk about support for Windows containers. As of 2020, the Windows operating system still dominates the market, with a market share of 60%. Enterprises use a large number of Windows applications, such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), and ASP.NET applications. Windows containers and Kubernetes enable you to implement containerized delivery without rewriting the code of .NET applications. This maximizes the elasticity and agility of the cloud to achieve fast iteration and scaling of business applications.
ACK supports Windows Server 2019 in Kubernetes container clusters:
1) Provides a consistent user experience and unified capabilities for Linux and Windows applications.
- Supports the scheduling and orchestration of resources such as CPUs, memory, and volumes.
- Supports stateless and stateful application workloads.
2) Supports hybrid deployment and interconnection of Linux and Windows applications in a cluster. For example, PHP applications running on Linux nodes can access the SQL Server database running on Windows nodes.
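In a cluster that mixes Linux and Windows nodes, each workload must land on a node with the matching operating system. A common way to express this in Kubernetes is a `nodeSelector` on the well-known `kubernetes.io/os` node label. The sketch below builds such Pod manifests as Python dictionaries. The pod names and images are hypothetical, and depending on how the cluster is set up, Windows nodes may also carry taints that need matching tolerations.

```python
import json


def pod_for_os(name: str, image: str, os_name: str) -> dict:
    """Build a v1 Pod manifest pinned to nodes of the given operating system."""
    if os_name not in ("linux", "windows"):
        raise ValueError("os_name must be 'linux' or 'windows'")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # kubernetes.io/os is set by the kubelet on every node.
            "nodeSelector": {"kubernetes.io/os": os_name},
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Hypothetical images: a PHP frontend on Linux and a database on Windows.
    frontend = pod_for_os("php-frontend", "registry.example.com/php-app:1.0", "linux")
    database = pod_for_os("sql-db", "registry.example.com/sql-db:1.0", "windows")
    print(json.dumps([frontend, database], indent=2))
```

Both pods join the same cluster network, so the Linux frontend can reach the Windows database through a regular Kubernetes Service.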
The Future of Alibaba Cloud Container Service
The following briefly introduces the cloud-native marketing strategy of Alibaba Cloud.
New cornerstone: Container technology allows users to use cloud resources. Cloud-native technology helps quickly deliver the value of the cloud.
- Support for global application delivery: By using ACR EE, applications can be submitted once and released globally, increasing the release efficiency by seven times.
- Implementation of the serverless application architecture: ASK and Knative free developers from infrastructure management so that they can focus on business applications.
- Support for hybrid-cloud and multi-cloud architectures: This helps enterprises migrate to the cloud hitlessly and move workloads between different environments.
- Distributed cloud architecture with cloud-edge-terminal integration: The cloud capabilities are extended to edges and devices to embrace innovation opportunities in the 5G and Artificial Intelligence & Internet of Things (AIoT) era. After the edge container technology was adopted by Youku, the API end-to-end network latency was reduced by 75%.
New computing power: The innovation of the cloud-native-based software and hardware integration technology improves computing efficiency and accelerates intelligent business upgrades.
- Integrated with X-dragon Hypervisor, containers deliver a performance that is 20% higher than physical machines.
- Scheduling and sharing of heterogeneous computing power, such as GPUs and Neural Processing Units (NPUs) (Hanguang chips), can improve utilization 2 to 4 times over.
- While providing strong security isolation, sandboxed containers achieve 90% of the performance of native processes. Intel SGX-based confidential computing is also supported, which can provide a secure and trusted execution environment for private and confidential information processing.
New ecosystem: We will provide the technology ecosystem and the Global Partner Program to enable more enterprises to enjoy the benefits of Alibaba’s technologies in the age of the cloud.
- Cloud application market of containers: It empowers enterprises with cloud-native innovation. Now, we are partnering with Fortinet, Zhuyun, and Intel to provide a variety of products, including container security, monitoring, and business applications. This way, you can obtain complete containerized solutions conveniently.
- An ecosystem of global partners: We have integrated our products and capabilities with our global technology partners, such as SAP, Red Hat, Rancher, Click2Cloud, and Banzai Cloud. This helps enterprises leverage the cloud-native technology on Alibaba Cloud.