By Yi Li, nicknamed Weiyuan at Alibaba.
Since the start of the 21st century, we have watched enterprise-grade distributed application architecture (DAA) evolve from a service-oriented architecture (SOA) to a microservices architecture, and then to a cloud-native application architecture.
To explain the thinking behind the evolution of enterprise IT architectures, let’s first talk about metaphysics.
First, the complexity of enterprise IT systems, measured as entropy, behaves in line with the second law of thermodynamics: as time goes by and services keep changing, enterprise IT systems become increasingly complex.
Second, according to the famous law of conservation of complexity in human-computer interaction, the complexity of application interaction is constant, though it can exist in different ways. This principle also applies to software architectures. New software architectures cannot reduce the overall complexity of an IT system.
These laws may seem harsh, but there are ways we can work within them.
A core task of modern software architecture is to define the boundaries between infrastructure and applications, so that complexity is divided properly and application developers face less of it. In other words, the evolution we have seen so far is about handing certain problems over to more appropriate people and systems, so that developers can focus on core value innovation.
Let’s begin with the following figure to explore the logic behind the evolution of the enterprise-grade DAA.
Image source: Bilgin Ibryam’s Twitter feed
Pains of Transformation: SOA
In 2004, IBM built a global SOA design center. As an R&D team leader and architect, I participated in a series of pilot projects for global customers and assisted international enterprises such as Pep Boys and Office Depot in applying SOA to optimize intra-enterprise and inter-enterprise business processes and improve business agility.
At that time, as economic globalization spread, competition between enterprises became more intense and the pace of commercial change accelerated. The IT systems of large enterprises had evolved over decades, and their overall technical landscape had become extremely complex: Customer Information Control System (CICS) and Common Business Oriented Language (COBOL) applications on mainframe hosts coexisted with Report Program Generator (RPG) business systems on AS/400 minicomputers and with applications written in C, Java Enterprise Edition (Java EE), or .NET on distributed systems such as x86 and Power. Many application systems were provided by third-party suppliers, and some had been left unmaintained. In addition, as businesses iterated, new business systems continued to emerge. However, for lack of proper methodological guidance and of organic links between systems, these business systems became silos, which continuously increased the complexity of the IT architecture and made it unable to meet the demands of business development. Each system might be powerful in isolation, but the systems could not work well with each other, which led to major problems down the road.
Therefore, the primary challenge faced by the enterprise IT architecture is to integrate a large number of siloed IT systems in the enterprises to support increasingly complex business processes, enable efficient decision-making, and easily adapt to rapid business changes. In this context, companies such as IBM proposed the SOA concept, which abstracts application systems into coarse-grained services to build a loosely coupled service architecture. In this architecture, services can be flexibly combined through business processes. This improves asset reuse in the enterprise IT architecture, improves system adaptability, flexibility, and scalability, and prevents information silos.
SOA proposed a series of principles for building a distributed system, which are still applicable today:
- Services are provided with well-defined, standardized APIs, which decouple the implementation of service consumers from the implementation of service providers by defining service descriptions. In addition, services should be developed according to the contract-first rule instead of the code-first rule. Inter-service communication is based on document-oriented messages instead of the RPC protocol of a specific language. In this way, services are decoupled from their implementation languages, and users can flexibly select synchronous communication or asynchronous communication. As a result, system availability and scalability are improved.
- Services should be loosely coupled and independent of each other in terms of time, space, technology, and teams.
- Services should be stateless, so that they can be called flexibly, regardless of the session state in different contexts.
- Services should be self-governed and self-contained, so that they can be independently deployed, versioned, self-managed, and recovered.
- Services can be discovered and combined. For example, service discovery can be implemented by using a service registry, so that service consumers can be dynamically bound to service providers. In addition, business services from different systems can be orchestrated and assembled in a business process.
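The discovery principle in the list above can be sketched in a few lines. This is a minimal in-memory illustration, not the API of any real registry product: the class name `ServiceRegistry`, its methods, and the endpoint URLs are all hypothetical, and a production registry would also handle health checks, leases, and replication.

```python
from typing import Dict, List


class ServiceRegistry:
    """Hypothetical in-memory registry: service name -> provider endpoints."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[str]] = {}
        self._cursors: Dict[str, int] = {}

    def register(self, service: str, endpoint: str) -> None:
        """A provider announces that it serves `service` at `endpoint`."""
        endpoints = self._providers.setdefault(service, [])
        if endpoint not in endpoints:
            endpoints.append(endpoint)

    def deregister(self, service: str, endpoint: str) -> None:
        """A provider leaves; consumers stop being routed to it."""
        endpoints = self._providers.get(service, [])
        if endpoint in endpoints:
            endpoints.remove(endpoint)

    def resolve(self, service: str) -> str:
        """Bind a consumer to one provider, rotating round-robin."""
        endpoints = self._providers.get(service)
        if not endpoints:
            raise LookupError(f"no provider registered for {service!r}")
        index = self._cursors.get(service, 0) % len(endpoints)
        self._cursors[service] = index + 1
        return endpoints[index]


registry = ServiceRegistry()
registry.register("inventory", "http://10.0.0.1:8080")
registry.register("inventory", "http://10.0.0.2:8080")
first = registry.resolve("inventory")
second = registry.resolve("inventory")
```

The consumer only knows the logical name `inventory`; which provider it reaches is decided at call time, which is what allows providers to be added, moved, or removed without changing consumer code.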
In the initial construction of SOA systems, point-to-point communication connections are primarily used, and service call and integration logic is embedded in application implementations. This development method is simple and efficient when only a few services are integrated into an SOA system. However, as services grow in scale, the communication between services becomes more complex and the connection paths and complexity increase sharply, posing major challenges to service governance.
To address these challenges, an enterprise service bus (ESB) was introduced. The ESB provides inter-service connection, transformation, and mediation capabilities. It can connect the internal system and various services of an enterprise to the service bus to implement a loosely coupled architecture between information systems. This simplifies system integration, makes the IT system architecture more flexible, and reduces the costs of information sharing within the enterprise.
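A rough way to see why point-to-point integration stops scaling is to count connections. The sketch below assumes the worst case, in which every service may talk to every other service directly, and compares it with one connection per service to a shared bus. The function names are illustrative, and real systems are usually sparser than a full mesh.

```python
def point_to_point_links(services: int) -> int:
    """Upper bound on distinct links when any service may call any other."""
    return services * (services - 1) // 2


def bus_links(services: int) -> int:
    """With a bus, each service needs only its own connection to the bus."""
    return services


for n in (5, 20, 100):
    # Prints: 5 10 5, then 20 190 20, then 100 4950 100
    print(n, point_to_point_links(n), bus_links(n))
```

The number of potential direct links grows quadratically, while the number of bus connections grows linearly. The integration logic does not disappear, though; it moves into the bus.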
The goal of the SOA methodology is to organize, gather, and integrate different capabilities. However, this is often difficult to achieve in practice. A large number of ambitious SOA projects failed to achieve their expected results.
This is because no IT architecture can succeed without the integration of business objectives, technological foundations, and organization capabilities.
In its early days, SOA focused on the stock of existing systems in the enterprise IT architecture. This largely narrowed the SOA methodology to enterprise application integration (EAI). When applying the SOA concept, interconnecting information systems is only the first step. To keep the enterprise IT architecture agile and flexible and to continuously support business growth and change, enterprises also need to invest heavily in improving their own capabilities and to keep reconstructing and iterating the architecture.
Previously, in most enterprises, the IT department was still a cost center and a business support department. Most enterprises lacked a long-term strategic IT plan, and their IT teams lacked growth recognition. As a result, SOA degenerated into project operation support, without organizational guarantees or continuous investments. Even then-successful projects gradually stagnated in the increasingly complex architecture. According to some photos sent to me last year by my friend who lived in the United States, the business system we built for our customer 15 years ago was still supporting the business of their stores around the country. This proves the success of the technical project, but reflects the lack of technical strategies on the part of the enterprise.
Technically, the ESB architecture decouples business logic from service integration, which allows better centralized service governance. However, it also exposes several severe issues:
- Instead of the governance and reconstruction of the enterprise IT architecture, the reusability of business systems is emphasized, or even overemphasized. As a result, a large amount of service integration implementation logic is sunk into the ESB, as shown in the rightmost part of the preceding figure. Such logic is very difficult to maintain, migrate, and extend, and therefore it becomes a heavy burden on the ESB. We must properly handle the complexity at appropriate positions, rather than simply shifting the complexity.
- The ESB is based on a centralized message processing system. However, as the Internet grows rapidly, the ESB can no longer cope with the large-scale growth of the enterprise IT architecture.
- The ESB system architecture with smart pipes and dumb endpoints cannot adapt to rapid changes and mass innovation. For example, telecom operators wanted to integrate complex functions such as video communication and teleconferencing into their telecom infrastructure, so that users could enjoy extensive communication services with only a dumb terminal. However, as smartphones gained in popularity, the innovation of distributed collaboration tools such as Facebook and WhatsApp, and WeChat and DingTalk in China, completely overturned the way people communicated with each other and consigned telecom networks to eventual obsolescence.
Beauty of Rebirth: Microservices
Along with the development of the Internet, and especially with the advent of mobile Internet, the economic pattern of the whole world changed dramatically. In particular, the focus of the enterprise IT architecture changed from the conventional systems of record, such as enterprise resource planning (ERP) and supply chain management (SCM), to systems of engagement, such as omnichannel marketing. These systems must cope with the rapid and large-scale growth of the Internet and support fast iteration and low-cost trial and error. Currently, the enterprise IT architecture has become an engine to drive innovations. The idea of using technology to expand business boundaries gave IT teams a new sense of mission and further accelerated the evolution of the enterprise IT architecture.
Internet companies led by Netflix and Alibaba oversaw a new revolution in enterprise architecture: microservices. Microservices frameworks such as Apache Dubbo and Spring Cloud were widely used.
The core idea of microservices is to simplify business system implementation by splitting and decoupling application functions. Microservices emphasize the division of application functions into a set of loosely coupled services, with each service compliant with the single responsibility principle. The microservices architecture solves several problems inherent in the conventional monolithic architecture. Each service can be deployed and delivered independently, which greatly improves business agility. In addition, each service can be independently scaled in or out to adapt to the scale of the Internet.
Certainly, the division of a large single application into multiple microservices makes the R&D collaboration, delivery, and maintenance of the IT system more complex. Fortunately, DevOps and containers are naturally integrated with the microservices architecture, forming the prototype of the cloud-native application architecture.
The microservices architecture draws on the principles of SOA. However, from the perspective of implementation, it tends to replace the ESB with a decentralized distributed architecture built on smart endpoints and dumb pipes. Qiu Xiaoxia has analyzed these problems in detail in Those Things About Microservices, so I will not go into detail here.
The microservices architecture must first face the inherent complexity of distributed architectures. For more information, see Misunderstandings About Distributed Computing. Microservices frameworks need to overcome the complexity of service communication and governance, for example, the challenges of service discovery, circuit breaking, throttling, and end-to-end tracing. Microservices frameworks, such as HSF, Dubbo, or Spring Cloud, package these capabilities as code libraries. These code libraries are built into applications and are released and maintained together with the applications.
In essence, service communication and governance are horizontal concerns that connect the systems of different departments, and they are therefore orthogonal to the business logic. However, in the microservices architecture, their implementation and lifecycle are coupled with the business logic, and an upgrade of the microservices framework forces the entire service application to be rebuilt and redeployed. In addition, code libraries are usually bound to specific languages, which makes it difficult to support polyglot implementations of enterprise applications.
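To illustrate the kind of governance logic that such libraries place inside the application process, here is a minimal circuit breaker sketch. It is not taken from HSF, Dubbo, or Spring Cloud; the class `CircuitBreaker`, its parameters, and the injected `clock` are assumptions made for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    def call(self, func: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, so one trial call is let through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Because this object lives in the same process as the business code that calls `breaker.call(...)`, changing its behavior means shipping a new build of the application.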
The Light of Evolution: Cloud-native
With a centralized service bus architecture, SOA decouples the business logic from the service governance logic. The microservices architecture regresses to the decentralized point-to-point calling method, which improves agility and scalability at the expense of the flexibility brought about by the decoupling of business logic and service governance logic.
To solve these challenges, the community proposed the service mesh architecture. This architecture sinks service governance capabilities to the infrastructure and deploys them as independent processes for both service consumers and service providers. Therefore, decentralization not only ensures the scalability of the system, but also decouples service governance and business logic to allow independent evolution without mutual interference. This allows the overall architecture to evolve in a more flexible manner. In addition, the service mesh architecture reduces the intrusion into the business logic and simplifies the support for polyglot applications.
The Istio project led by Google, IBM, and Lyft is a typical implementation of the service mesh architecture and has become a new sensation in the industry.
The preceding picture shows the architecture of Istio, which is logically divided into a data plane and a control plane. The data plane consists of smart proxies deployed in sidecar mode. Each sidecar intercepts the application's network traffic, collects telemetry data, and enforces service governance policies. On the control plane, Galley manages configurations, Pilot delivers configurations, Mixer checks policies and aggregates telemetry data, and Citadel manages the security certificates used during communication.
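The division of labor between a sidecar and the application can be sketched as follows. This is an in-process stand-in for illustration only: in a real mesh the sidecar is a separate proxy process that intercepts network traffic, and the names `Sidecar`, `denied_callers`, and `order_service` are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Handler = Callable[[str], str]


@dataclass
class Sidecar:
    """Illustrative stand-in for a sidecar proxy in front of one application."""
    app: Handler
    # Policy that a control plane would push; here it is just a set of names.
    denied_callers: Set[str] = field(default_factory=set)
    telemetry: List[Dict[str, object]] = field(default_factory=list)

    def handle(self, caller: str, payload: str) -> str:
        started = time.perf_counter()
        allowed = caller not in self.denied_callers
        try:
            if not allowed:
                raise PermissionError(f"caller {caller!r} denied by policy")
            return self.app(payload)
        finally:
            # Telemetry is recorded for allowed and denied requests alike.
            self.telemetry.append({
                "caller": caller,
                "allowed": allowed,
                "seconds": time.perf_counter() - started,
            })


def order_service(payload: str) -> str:
    # Pure business logic: no policy or telemetry code here.
    return f"order accepted: {payload}"


proxy = Sidecar(app=order_service, denied_callers={"untrusted"})
reply = proxy.handle("web", "book")
```

The policy and the telemetry format can change without touching `order_service`, which is the decoupling that the sidecar model aims for.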
The Istio architecture provides a series of advanced service governance capabilities, such as service discovery and load balancing, progressive delivery (also called phased release), chaos injection and analysis, end-to-end tracing, and zero-trust network security. An upper-layer business system can integrate Istio into its own IT architecture and release system.
However, a service mesh is not a silver bullet. The flexibility of the architecture and the ability of the system to evolve come at the cost of added deployment complexity in sidecar mode and of reduced performance, because each call passes through two extra hops.
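Progressive delivery rests on splitting traffic between an existing version and a new one by weight. The sketch below shows one way to make that split deterministic per request. It is not Istio's implementation, which expresses such splits declaratively in its routing configuration; the function `choose_version` and the request ID format are assumptions.

```python
import hashlib


def choose_version(request_id: str, canary_percent: int) -> str:
    """Map a request to 'canary' or 'stable' using a stable hash bucket."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    return "canary" if bucket < canary_percent else "stable"


# Roughly 10% of these hypothetical request IDs go to the canary version.
request_ids = [f"req-{i}" for i in range(10000)]
canary_count = sum(choose_version(r, 10) == "canary" for r in request_ids)
```

Because the bucket depends only on the request ID, raising the percentage from 10 to 50 keeps every request that was already on the canary there and only moves additional traffic over.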
The community and cloud service providers are working together to address the deployment complexity. First, they seek to improve the automated maintenance of service meshes. For example, Alibaba Cloud greatly simplifies the upgrade and maintenance of the Istio architecture and simplifies cross-Kubernetes cluster deployment by using operators. In addition, they provide hosted service mesh services to help users focus on service governance at the business layer, instead of infrastructure implementation.
From the perspective of performance, a service mesh must reduce the overhead of its own control plane and data plane. For example, Mixer can be offloaded by sinking governance policies to the data plane. In addition, the boundaries between applications and network infrastructure need to be reconsidered across the entire communication stack. To interconnect container applications, the Kubernetes community proposed the Container Network Interface (CNI), which decouples container network connectivity from the underlying network implementation. Kubernetes also provides basic primitives such as Services, Ingress, and network policies to support service communication and access control at the application layer. However, these capabilities are far from enough to meet the requirements of application service governance. Service meshes add functions such as traffic management, end-to-end observability, and secure interconnection at L4 and L7. All these functions are implemented by the Envoy proxy, which runs in user space. This improves flexibility, but also inevitably increases performance overhead. To resolve this problem systematically, the community is conducting interesting explorations. For example, the Cilium container network uses operating system and underlying network capabilities, such as extended Berkeley Packet Filter (eBPF) and eXpress Data Path (XDP), to sink application-layer service control capabilities (such as the services and network policies provided by kube-proxy) to the OS kernel and the network layer. In addition, the data links of the service mesh are optimized to reduce context switching and data copying, which effectively reduces performance overhead.
Currently, the service mesh technology is still in an early stage of development. It can provide flexible service communication at L4 and L7. The community is also exploring how to implement flexible networking at L2 and L3 by using the network service mesh. We believe that service mesh will become the communication infrastructure for distributed enterprise applications in the future.
In this process, new concepts and projects will keep appearing, and we need to analyze their business value and technical limitations rationally. We must avoid treating the service mesh as a magic bullet, and we should not sink application-integration business logic or application-side security into the mesh, because doing so only adds complexity. For more information, see Application Safety and Correctness Cannot Be Offloaded to Istio or Any Service Mesh.
It seems to be a universal law that unification inevitably follows prolonged division and division arises out of prolonged unification. Enterprise-grade DAAs have also gone through repeated unification and division. Today, with new technologies rapidly emerging, we not only need to embrace the architectural changes brought by new technologies, but also pay more attention to the evolution logic and core values behind them to systematically control complexity.
This article introduced the changes brought by the cloud-native computing architecture from the perspective of enterprise-grade DAA. In later articles, I will share my ideas about the research and development process and integrated architectures.