What is cloud-native? Everyone has his or her own interpretation of this term. Drawing on extensive discussions and practical experience from various projects, this article presents the interpretation of cloud-native technologies by Alibaba’s delivery experts. It discusses how to build cloud-native applications, key cloud-native technologies, and ideas about cloud-native implementation.
The Internet has changed the way people live, work, study, and entertain themselves. The rapid development of technologies has driven the evolution of the cloud computing market from the early physical machines to virtual machines (Bare Metal Instance) and then to containers, while the Internet architecture evolved from centralized architectures to distributed architectures, and then to cloud-native architectures. Nowadays, the term “cloud-native” has been elevated by enterprises and developers to the status of an industry standard and the future of cloud computing. If I were to describe cloud-native technologies in one sentence, it would be “the future is here, but not evenly distributed.”
Cloud-native technologies (architectures) have seen a sharp increase in popularity, but the concept is still interpreted differently by different people, despite the wide-ranging articles and discussions on this topic in the online community and inside Alibaba. In my opinion, we are exploring what it means to be cloud native and trying to understand and put cloud-native technologies into practice. Therefore, there is still no clear or overarching standard definition.
A cloud migration project I took part in recently involved many cloud-native technologies. I would like to take this opportunity to share my insights while drawing on the discussions and practical experience from the project.
Going Back to the Source
Before getting into this topic, let’s see how two industry influencers, Pivotal Software and CNCF, define “cloud native.”
Pivotal Software is a leader in the field of agile development (previously contracted with Google) and has an impressive pedigree (it was founded by EMC and VMware). It launched Pivotal Cloud Foundry (a big hit in the field of PaaS between 2011 and 2013) and the Spring Framework and is a pioneer in cloud-native technologies. The following figure shows how Pivotal defines “cloud native”:
Matt Stine at Pivotal Software first proposed the concept of cloud-native in 2013. In 2015, in his book “Migrating to Cloud-Native Application Architectures”, Matt Stine defined the key characteristics of cloud-native application architectures, including twelve-factor applications, microservice architecture, self-service agile infrastructure, API-based collaboration, and antifragility. Matt Stine revised his definition in 2017 and indicated six characteristics of cloud-native architecture: modularity, observability, deployability, testability, disposability, and replaceability. The latest piece published on Pivotal Software’s official website characterizes cloud-native applications and services as an integration of the concepts of DevOps, continuous delivery, microservices, and containers.
Cloud Native Computing Foundation (CNCF) is a well-known organization in the industry. It is a foundation co-sponsored by leading open-source infrastructure companies such as Google and Red Hat. The mission of CNCF was to compete in the container market dominated by the then-prominent platform Docker. Through the Kubernetes project, CNCF has maintained undisputed leadership in the field of orchestration in the open-source community and is the champion in defining and promoting cloud-native architectures. Here is how CNCF defines “cloud-native”:
In 2015, CNCF originally defined three characteristics of cloud-native architectures: containerized encapsulation, automated management, and microservices. In 2018, CNCF updated its definition of cloud-native architectures to include two new features, declarative API and service mesh (a new technology that emerged in the open-source community in 2017; it is a parallel technology to microservices). These technologies are used to build loosely coupled systems that are highly fault-tolerant and easy to manage and observe.
Reaching a Consensus
As the community continues to grow the ecosystem and push the boundary of cloud-native architectures, the definition of cloud native is constantly changing. Organizations (like Pivotal and CNCF) define this concept differently, and one organization may use different definitions at different times. Just as Moore’s Law describes steady change in hardware, we can expect the definition of cloud native to continue to shift in the future.
As for the two different definitions given by Pivotal and CNCF, I believe the distinction is caused by the respective organizational structures and perspectives adopted by the two industry influencers:
- Pivotal Software is committed to achieving end-to-end solutions and digital transformation on the Platform as a Service (PaaS) layer and offers a comprehensive set of models for culture, processes, methodologies, blueprint planning, and software development. Its solutions are designed for CIOs from large and medium-sized traditional enterprises who take a top-down approach.
- Having established itself as the innovator and reformer for the cloud-native ecosystem and technologies, CNCF emphasizes technologies, toolchains, and underlying infrastructure and has a great influence on its target audience made up of developers in the open-source community, Internet companies, and emerging businesses. It adopts a bottom-up approach.
Pivotal Software is a pioneer in the concepts and methodologies of cloud-native architectures, while CNCF contributes to best practices.
However, Pivotal Software’s latest definition now embraces container technology, while CNCF’s definition includes microservices. So are they really all that different? We welcome you to tell us your opinion in the comment section below.
My Personal View of Cloud Native
From the Cloud-native Thinking to the Cloud-native Applications
From the birth of the Internet to the present, we have adopted Internet thinking and then Internet+ thinking (which is essentially Internet native). When enterprises reach a certain stage, they need to develop value thinking (or, value-native thinking). Therefore, it is necessary for cloud computing practitioners to develop cloud-native thinking. Abstract paradigms always precede tangible solutions in any technological reform or widespread adoption of new methods.
Drawing on the definitions given by Pivotal Software and CNCF, I came to the following understanding of what it means to be cloud-native:
Being cloud-native means building an application system that runs on the cloud through both a methodology (such as that from Pivotal Software) and a technical framework (such as that from CNCF). Such an application system breaks away from traditional system building methods and makes full use of the native capabilities of the cloud to maximize its value. It adopts the characteristics of cloud-native architectures in order to rapidly empower businesses.
This abstract interpretation can be broken down into four questions:
- What are the capabilities of the cloud that we need to make full use of?
- How can we build cloud-native applications in a way that breaks away from the traditional methods?
- What are the key characteristics of cloud-native applications?
- What are the key technologies adopted in a cloud-native technology framework?
Capabilities of the Cloud
The emergence of cloud computing is closely related to the development and maturity of virtualization technology. It is an emerging IT infrastructure delivery method that relies on virtualization technology to standardize, abstract, and scale IT hardware resources and software components into product-like services that allow users to “pay as they go”. In a sense, this reconstructs the IT industry’s supply chain. Its models of service delivery include Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Function as a Service (FaaS), and Data as a Service (DaaS):
IaaS indicates the fundamental and underlying capabilities of cloud computing, such as computing, storage, network, and security.
PaaS generally refers to the high-level domain- or scenario-oriented services that are built on top of the underlying cloud capabilities, such as cloud databases, cloud object storage, middleware (including caches, message queues, load balancing, service mesh, and container platforms), and application services.
FaaS is a serverless computing architecture, through which users can run applications without purchasing or concerning themselves with infrastructure and elastically scale services using the pay-as-you-go billing method. It is also the extreme form of the evolution of PaaS. Currently, three types of solutions are available under this architecture:
- Function-oriented solutions: Developers only provide functions and the corresponding features are realized through events or HTTP requests. Examples of such solutions include Alibaba Cloud Function Compute and AWS Lambda.
- Application-oriented solutions: Developers only provide business applications without purchasing server resources. Examples of such solutions include Google Cloud Run and Alibaba Cloud EDAS Serverless.
- Container-oriented solutions: These are an upgraded version of application-oriented solutions that uses container images to shield environment differences and provides great flexibility. Examples of such solutions include Alibaba Cloud Serverless Kubernetes and AWS Fargate.
DaaS offers data as a service. The architecture extends to upper-layer applications and, when used with AI and cloud services, can deliver various high-value services. These services include big data-based decision making, video and facial recognition, deep learning, and scenario-based semantic understanding, among others. This is also the core strength of the cloud of the future.
As technologies and open-source solutions continue to develop and cloud service providers provide more products and capabilities, every layer of today’s technology architecture, from physical machines, virtual machines, and containers to middleware and then to the serverless architecture, has been gradually standardized. The more standardized the layers are, the greater the added value they can contribute. Relatively common technologies that are not directly related to business (such as service mesh) have also been standardized and incorporated into the underlying infrastructure. Every time a layer of the technology architecture becomes standardized, it will eliminate some of the inefficient and tedious tasks. In addition, the application layer provides emerging technologies, such as AI, to help enterprises reduce the costs incurred during the exploration of suitable solutions, speed up the verification and delivery of new technologies, and truly empower the business.
Meanwhile, users can choose the cloud products that best fit their needs just like building with LEGO blocks, using readily-available resources to avoid repetitive work. This greatly improves the efficiency in each stage of software and service development and accelerates the implementation of various applications and architectures. Users who are already on the cloud can realize huge cost-savings by consuming resources as needed and scaling out at any time.
Construction of Cloud-native Applications
The preceding section discusses the strong capabilities of the cloud. In comparison with traditional applications, new cloud applications need to be adapted to these capabilities in each stage of the entire application lifecycle. This involves adaptation during the design of software architecture, development, construction, deployment, delivery, monitoring, and O&M. I will discuss this process in terms of various issues users must face.
How to Design Cloud-native Architecture
Great architectures come into being after evolving and progressing over time. They are not created all at once. Therefore, it is meaningless to talk about architectural design in isolation from the problems it addresses. The purpose of architectural evolution must be to solve a certain problem. We can look at the problems listed below to better understand the design of the cloud-native architecture:
- Use the microservice architecture to solve the problem of complexity in a monolithic architecture.
- Use a governance framework and monitoring solutions to solve communication problems between microservices.
- Use container services to solve the problem of deploying many applications in the microservice framework.
- Use Kubernetes to solve the problem of orchestration and scheduling for container services.
- Use service mesh to solve the problem of the microservice framework intruding into business code.
- Run service mesh on Kubernetes to provide better underlying support.
Single-microservice applications are adapted to the cloud-native architecture due to their low complexity and comprehensive set of functions for monitoring, governance, deployment, and scheduling supported by the strong underlying system. However, from the perspective of the overall system, the complexity does not decrease. Instead, enterprises must bear the high costs of building a robust underlying system with strong architectural and O&M capabilities.
In addition, the technology stacks and middleware systems used by enterprises to achieve these functions are closed and highly private, making it difficult to meet all business needs (as is the case with Alibaba). Cloud hosting can reduce the overall complexity of such a project. The cloud service provider can take over the complex underlying system and provide attentive services. Projects will eventually evolve into an infrastructure-free design and use YAML or JSON declarative code to orchestrate the underlying infrastructure, middleware, and other resources. In this way, the cloud can meet every need of an application. Eventually, enterprises will embrace an open and standard cloud technology system.
How to Deliver Cloud-native Applications
We introduced DevOps to address the problem of the continuous delivery of applications.
DevOps is a concept everyone is familiar with. I see it as a series of values, principles, methods, practices, and tools designed to achieve fast delivery and continuous optimization. Its core advantage is to close the gap between R&D and O&M, expedite the software delivery process, and improve software quality. The chart below shows a DevOps pipeline:
The platforms involved in this process include: GitHub, Travis CI, Artifactory, Spinnaker, FIAAS, Kubernetes, Prometheus, Datadog, Sumo Logic, and ELK.
The key to truly implementing and practicing DevOps lies in the answers to the following questions:
- Methods: Can developers push the code they write to the test and production environments without O&M support?
- Tools: Are there mature O&M tool platforms and monitoring systems that allow the development team to easily handle various online issues, faults, and rollback?
- Culture: Do developers take direct ownership of the online user experience, taking responsibility for problems caused by code defects, O&M failures, or code changes committed by developers?
- Delivery measurement: Are the KPIs, including deployment frequency, change lead time, service recovery time, and change failure rate, in line with the user requirements in the industry?
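To make the delivery measurement question above more concrete, here is a minimal sketch of how the four KPIs could be computed from deployment records. The `Deployment` record and its field names are my own assumptions for illustration; they do not come from any particular DevOps platform, and a real pipeline would pull this data from its CI/CD and incident systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production change."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did the change cause an incident?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def delivery_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four delivery KPIs over a reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_change_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        # None when no failed change has been restored in the period
        "mean_service_recovery_time": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

Whether the resulting numbers are "in line with the user requirements in the industry" still depends on the targets each team sets; the sketch only shows that all four KPIs can be derived from the same small set of timestamps.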
In essence, DevOps supports O&M services. By introducing a series of automation tools for new technologies and development into O&M, it brings development closer to the production environment and manages the entire development and O&M processes, ensuring freedom and innovation. When monitoring and fault prevention and control tools are used together with feature switches, they can help achieve a balance between the user experience and fast delivery.
If technology professionals only need to consider business solutions and business code in the future, it would be necessary to quickly integrate the abundant technical products and cloud vendor platforms already available on the market. This would allow technical professionals to focus on finding solutions and connecting business and technology in a bid to satisfy increasingly diversified and complicated business needs. In terms of O&M, the cloud hides the complexity of the infrastructure and shifts to the O&M mid-end and large-scale O&M for toolchain development. This allows practitioners to focus on cost, efficiency, and stability while ensuring the steady progress of application development.
Key Characteristics of Cloud-native Applications
- Elastic scalability: Using elastic billing policies, applications can complete auto scaling within seconds and dynamically allocate or release resources in accordance with business workloads. This helps users significantly reduce expenses. The key technologies are the lightweight containerization of services and the immutable infrastructure achieved through container services.
- Fault tolerance: The applications support load balancing, automatic traffic shaping, degradation and circuit breaking, automatic scheduling of abnormal traffic, fault isolation, and automatic failover.
- Observability: The applications provide a wide range of fine-grained monitoring metrics, such as real-time metrics, tracing analysis, and logs, and support monitoring for automatic alert triggers and persistent queries precise to the second.
- Release stability: To cope with the stability risks caused by frequent changes, the applications have a fully automated change release system that supports automatic canary (gray) and blue-green release policies and can be used to establish a monitoring baseline before, during, and after changes. It is also capable of circuit breaking and automatic rollback in the case of abnormal changes.
- Ease of management: To transition from manual maintenance to automatic maintenance, the applications support automatic exception analysis and diagnosis without the need to log on to servers.
- Ultimate user experience: The applications provide an all-in-one experience by offering smooth and easy-to-use features, such as application allocation and creation, resource application, environment configuration, development and testing, release, monitoring and alarming, and troubleshooting. These features can be combined like building blocks, avoiding complex operations.
- Flexible billing: The applications support various pricing strategies such as pay-as-you-go (by traffic, storage, calls, and duration), subscription (by days, months, or years), reservation, and preemptive billing methods. The business system can dynamically switch to the optimal billing method based on actual conditions.
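As an illustration of the fault tolerance characteristic listed above, the following is a minimal sketch of circuit breaking with degradation. The class name, thresholds, and the injectable `clock` parameter are assumptions chosen for this example; production systems typically rely on a governance framework or a service mesh for this instead of hand-written code.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after repeated failures, retries after a timeout."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"   # one trial call is allowed through
        return "open"

    def call(self, func, *args, fallback=None, **kwargs):
        if self.state == "open":
            # Degrade gracefully if a fallback is given; otherwise fail fast.
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers get the fallback result immediately instead of waiting on a failing dependency, which is what isolates the fault from the rest of the system.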
Key Technologies of Cloud-native Architectures
The earliest form of container technology, chroot jail, dates back to 1979. In 2008, LXC (Linux Containers) combined the resource management of cgroups with the view isolation of namespaces to achieve process-level isolation. However, the greatest innovation in container technology is the container image (popularized by Docker). An image contains the complete environment (the file system of the entire operating system) required to run an application. It is consistent, lightweight, portable, and language independent. It allows users to achieve “build once, run anywhere” (that is, in development, testing, and production environments) and completely standardize building, distribution, and delivery activities. It also supplies the foundation of immutable infrastructure.
Kubernetes is to cloud computing and cloud-native architectures what Linux is to operating systems.
As Google’s open-source container orchestration and scheduling system, built on the experience gained from its internal Borg system, Kubernetes makes it possible to use container applications in large-scale industrial production environments.
Relying on declarative APIs, extensible programming interfaces (through CRDs and controllers), and an advanced design philosophy, Kubernetes came to dominate the field of container orchestration (beating out Docker Swarm and Apache Mesos) and has become the de facto standard for container orchestration systems.
The Kubernetes platform frees users from resource management, further standardizes the infrastructure, reduces complexity, and improves resource utilization. In addition, Kubernetes reduces the cost of cross-data center deployment of hybrid clouds, multiple clouds, and edge clouds.
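The idea behind declarative APIs and controllers can be sketched as a reconciliation step: the user declares the desired state, and a controller computes the actions needed to move the observed state toward it. The code below is a simplified, self-contained model of that pattern, not Kubernetes code; the `Action` type, the `reconcile` function, and the replica naming scheme are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str   # "create" or "delete"
    name: str   # hypothetical replica name, e.g. "web-0"


def reconcile(desired_replicas: int, running: list[str], app: str) -> list[Action]:
    """Return the actions that move the observed state toward the declared one."""
    actions: list[Action] = []
    if len(running) < desired_replicas:
        existing = set(running)
        index = 0
        # Create replicas under the first free names until the count matches.
        while len(existing) < desired_replicas:
            name = f"{app}-{index}"
            if name not in existing:
                actions.append(Action("create", name))
                existing.add(name)
            index += 1
    elif len(running) > desired_replicas:
        # Delete surplus replicas, last listed first.
        for name in reversed(running[desired_replicas:]):
            actions.append(Action("delete", name))
    return actions


# Usage: the caller states only the goal (3 replicas), never the steps.
for action in reconcile(3, ["web-0"], "web"):
    print(action.verb, action.name)
```

Because the function compares declared and observed state each time it runs, calling it again once the state matches yields no actions. This idempotence is what lets a controller run in a loop and repair drift without the user describing any procedure.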
Service mesh aims to decouple the business logic from the non-business logic, allowing developers to focus solely on the business logic. The solution separates a number of client SDKs unrelated to the business logic (such as service discovery, routing, load balancing, and traffic shaping and degradation) from business applications and puts them into a separate proxy (sidecar) process that is pushed down to the infrastructure middleware mesh (similar to the shift from TDDL to DRDS). With this solution, an application will face fewer risks from changes in the system framework, become more streamlined and lightweight, and enjoy a faster startup speed. This makes it easier to migrate the application to the serverless architecture. The meshes can implement automatic iteration and upgrade based on their own needs. This facilitates global service governance, phased release, and monitoring. In addition, the mesh boundary can be extended to the database mesh, cache mesh, and message mesh. In this way, inter-service communication can be truly standardized, with the mesh playing a role for services similar to the one TCP/IP plays for networking.
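The decoupling described above can be illustrated with a small in-process sketch. A real sidecar runs as a separate proxy process next to the application; here a plain class stands in for it so that the division of responsibilities is visible. `SidecarProxy`, `place_order`, and the registry layout are hypothetical names invented for this example.

```python
import itertools
from typing import Callable, Dict, List

# Hypothetical: a service instance takes a request payload and returns a response.
Handler = Callable[[str], str]


class SidecarProxy:
    """Stand-in for a sidecar: owns discovery, load balancing, and retries."""

    def __init__(self, registry: Dict[str, List[Handler]], max_attempts: int = 3):
        self._registry = registry
        self._max_attempts = max_attempts
        # One round-robin cursor per service name.
        self._cursors = {
            name: itertools.cycle(range(len(instances)))
            for name, instances in registry.items()
        }

    def request(self, service: str, payload: str) -> str:
        instances = self._registry.get(service)   # service discovery
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        last_error = None
        for _ in range(self._max_attempts):
            instance = instances[next(self._cursors[service])]   # load balancing
            try:
                return instance(payload)
            except ConnectionError as exc:   # retry on the next instance
                last_error = exc
        raise ConnectionError(
            f"{service!r} unavailable after {self._max_attempts} attempts"
        ) from last_error


def place_order(proxy: SidecarProxy, item: str) -> str:
    # Business logic only: no discovery, balancing, or retry code here.
    stock = proxy.request("inventory", item)
    return f"order accepted: {item} ({stock})"
```

In this arrangement, the retry policy or the balancing strategy can change without touching `place_order`, which mirrors how a mesh can be upgraded independently of the applications it serves.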
Infrastructure as Code (IaC)
The infrastructure and its complete life cycle (creation, destruction, scaling, and replacement) are described in code and orchestrated, executed, and managed with appropriate tools, such as Terraform, ROS, and CloudFormation. For example, users only need to define the code and then easily create all the basic resources needed by applications (such as Elastic Compute Service (ECS), Virtual Private Cloud (VPC), ApsaraDB for RDS, Server Load Balancer (SLB), and ApsaraDB for Redis), without the need to frequently switch between pages in the console to apply for and purchase resources. With this approach, the infrastructure code is version-controlled, reviewable, testable, and traceable. It can be rolled back, stays consistent across environments, and prevents configuration drift. It is also easy to share, create templates for, and scale the infrastructure code. In addition to improvement in the overall O&M efficiency and quality, IaC allows users to easily see the full picture of the infrastructure.
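To show what "describing infrastructure in code" buys us, here is a minimal sketch that declares the resources mentioned above as data, derives a safe creation order from their dependencies, and computes a plan against what already exists. This is not the syntax of Terraform, ROS, or CloudFormation; the `INFRASTRUCTURE` layout, the dependency choices, and the `plan` function are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical declaration: resource name -> type and dependencies.
INFRASTRUCTURE = {
    "vpc":   {"type": "VPC",   "depends_on": []},
    "ecs":   {"type": "ECS",   "depends_on": ["vpc"]},
    "rds":   {"type": "RDS",   "depends_on": ["vpc"]},
    "redis": {"type": "Redis", "depends_on": ["vpc"]},
    "slb":   {"type": "SLB",   "depends_on": ["ecs"]},
}


def creation_order(resources: dict) -> list[str]:
    """Order resources so that every dependency is created first."""
    graph = {name: set(spec["depends_on"]) for name, spec in resources.items()}
    # Raises graphlib.CycleError if the declarations depend on each other.
    return list(TopologicalSorter(graph).static_order())


def plan(declared: dict, existing: set[str]) -> dict:
    """Compare the declaration with existing resources and list the changes."""
    order = creation_order(declared)
    return {
        "create": [name for name in order if name not in existing],
        "destroy": sorted(existing - set(declared)),
    }


# Usage: only the VPC exists so far, plus one resource that is no longer declared.
print(plan(INFRASTRUCTURE, existing={"vpc", "legacy-vm"}))
```

Because the declaration is ordinary data in a file, it can be reviewed, versioned, and diffed like any other code, and the plan makes every change visible before anything is created or destroyed.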
A cloud-based IDE covers the entire R&D lifecycle and provides a complete experience that integrates development, debugging, pre-release, production environment, and CI/CD release. The cloud platform also offers a variety of code library templates, improves the compilation speed through distributed computing, and intelligently provides code recommendation and optimization, automatic bug scanning, and identification of logical and systematic risks. It is conceivable that the development models of the cloud era, completely different from those of the local development environment, will feature higher development efficiency, faster iteration speed, and better quality control.
Implementation of Cloud-native Architectures
As a member of the GTS delivery team that was tasked with empowering enterprises to succeed in their digital transformations, I have been thinking about the ways to help traditional enterprises transform themselves and embrace cloud-native architectures by drawing on the experience of the Internet industry. Here is a roadmap for the implementation of cloud-native architectures.
The Y-axis in this figure indicates business agility. To achieve cloud-native business agility, you need to:
- Step 1: Lay the foundation by migrating to the cloud.
- Step 2: Build a PaaS platform. Alibaba Cloud Container Service for Kubernetes (ACK) shields O&M staff from the underlying resources and the complexity of O&M and provides high-performance and scalable container application management capabilities. It also provides developers with an environment in which they can build applications so as to accelerate application development, realize PaaS, and achieve business agility, elasticity, fault-tolerance, and observability.
- Step 3: Implement DevOps based on PaaS. The PaaS platform boosts business agility by improving infrastructure agility, while DevOps does the same through process delivery. DevOps (Apsara DevOps in Alibaba Cloud) enables continuous integration and delivery of applications, accelerates the creation of value streams, and achieves fast business iteration.
- Step 4: Establish microservice governance. The microservice-based transformation divides complex services into small independent units that are loosely coupled with each other and support independent deployment and updates. This truly improves agility at the business layer. Users can implement microservices by using Alibaba Cloud EDAS, which supports frameworks such as Spring Cloud and Dubbo. However, as technologies continue to develop, the optimal solution for microservice governance is now service mesh (Alibaba Cloud Service Mesh, or ASM, is compatible with Istio).
- Step 5: Implement advanced management of microservices. The microservice architecture implements API management, distributed integration of microservices, and the automation of microservice processes. API management empowers enterprises to establish a multi-channel ecosystem (with self-owned channels, WeChat, and Tmall) and ultimately build an API economy. The distributed integration and process automation of microservices allow enterprises to set up a unified business mid-end.
The X-axis in the figure indicates business robustness. To achieve business robustness, you need to:
- Step 1: Build a single data center. Most enterprise customers in industries such as finance, telecommunications, and energy run their business systems on private clouds within their data centers. Enterprises usually choose to build a single data center first in the early phase of their development.
- Step 2: Set up multiple data centers. As the business grows and the importance of the data center increases, enterprises will build a disaster recovery center or an active-active data center architecture to ensure that the services are still available when one of the data centers fails completely.
- Step 3: Construct a hybrid cloud. As public clouds become increasingly popular, many enterprise customers are migrating their front-end business systems to public clouds or using cloud services from multiple cloud service providers. In this way, the underlying IT infrastructure eventually becomes a hybrid cloud or multi-cloud implementation.
Cloud-native architectures seem to be extremely appealing, but once you go deep into the stage of implementation, you will find that they are very complicated. The complexity is not only reflected in the wide range of new concepts and technical features, but also in the huge gap between customers’ expectations and the value created by cloud-native technologies and the uncertainty about the future. In the future, I will continue to share and discuss my thoughts, experiences, and practices. This is the first of a series of articles. I hope my writings can contribute to the digital transformation of enterprises.
In the cloud era, we require novel thinking and concepts to properly understand application architectures and IT infrastructure in order to correctly answer the question “what does it mean to be cloud-native.” The future is undoubtedly cloud-native. Therefore, in addition to tools, enterprises seeking to transform themselves need a complete philosophy that progresses from concepts to methodologies and then to tools. Only in this way can we better embrace the arrival of the cloud era and maximize the value of cloud-native architectures.
This is the best of times for developers. This is the best of times for cloud vendors. In addition, this is the best of times for professionals specializing in cloud service delivery.
The future is here, but not evenly distributed. Let’s work together to understand, embrace, and deliver cloud native.
Disclaimer: The views expressed herein are for reference only and don’t necessarily represent the official views of Alibaba Cloud.