Nacos-based Environment Isolation at Alibaba

The release of Nacos 0.9 brings Nacos closer to general availability (GA). In fact, many enterprises, Huya among them, already run Nacos in production.

Generally, enterprises follow this development process: features are developed and tested in the test environment, then released in phases, and finally released to the production environment. To keep the production environment stable, the test environment must be isolated from it. Any setup with multiple environments has to answer two questions:

  • How to isolate data among multiple environments?
  • How to implement isolation effectively and efficiently, without requiring any modifications from users?

This article shows how Alibaba solves this issue by implementing Nacos-based environment isolation.

What Is an Environment?

Before explaining environment isolation, let’s first clearly understand what an environment is.

Currently, the term “environment” has no universal definition. Some companies simply use the word “environment”; in the Kubernetes architecture, an environment is called a “namespace”; at Alibaba Cloud, it is called a “region”. This article defines an environment as a complete set of logically or physically independent systems that includes all the components needed to process user requests of specified types, such as gateways, service frameworks, microservice registries, configuration centers, message systems, caches, and databases.

For example, many websites involve user IDs. We can use one set of systems to process user IDs ending with an even number and another set of systems to process user IDs ending with an odd number. See the following flow chart. The environment isolation that we mention here is physical isolation, that is, different environments are different machine clusters.

[Figure: flow chart of user requests routed to separate environments by user ID]
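
As a minimal sketch of the odd/even split described above, the routing decision can be reduced to a single function. The environment names `env-even` and `env-odd` and the function name are hypothetical; the article does not name them.

```python
# Hypothetical environment names: the article only says that one set of
# systems handles even user IDs and another handles odd user IDs.
ENV_EVEN = "env-even"
ENV_ODD = "env-odd"


def environment_for_user(user_id: int) -> str:
    """Return the environment that should process requests for this user ID."""
    return ENV_EVEN if user_id % 2 == 0 else ENV_ODD


if __name__ == "__main__":
    for uid in (1000, 1001, 1002):
        print(uid, "->", environment_for_user(uid))
```

Because each environment is a separate machine cluster, everything downstream of this decision (registry, configuration center, cache, database) would belong to the chosen environment.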

What Role Does Environment Isolation Play?

The previous section defines the environment as a set of systems consisting of all necessary components for processing user requests and specified types of requests. This section describes the advantages of environment isolation. From the definition, we see at least three advantages: fault isolation, fault recovery, and phased release.

Fault Isolation

First, an environment is a unit of independent components that can process user requests. That is, however long the request processing link is, it always stays within a specified machine cluster. Even if those machines fail, only a portion of users is affected, and the fault is contained within that cluster. If we divide all the machines into ten environments by user ID, a fault in one environment affects roughly one tenth of the users it would affect if all the machines formed a single environment. This can significantly improve system availability.
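
The one-tenth figure can be checked with a small sketch. It assumes, purely for illustration, that users are assigned to environments by `user_id % env_count` and that user IDs are evenly distributed; the article does not specify the assignment rule, and the function names are hypothetical.

```python
def environment_index(user_id: int, env_count: int) -> int:
    """Assumed assignment rule: spread users over environments by modulo."""
    return user_id % env_count


def affected_fraction(user_ids, env_count: int, failed_env: int) -> float:
    """Fraction of users whose environment is the one that failed."""
    user_ids = list(user_ids)
    hit = sum(
        1 for uid in user_ids
        if environment_index(uid, env_count) == failed_env
    )
    return hit / len(user_ids)


if __name__ == "__main__":
    users = range(1000)
    # One big environment: a fault reaches every user.
    print(affected_fraction(users, env_count=1, failed_env=0))
    # Ten environments: a fault in one of them reaches a tenth of the users.
    print(affected_fraction(users, env_count=10, failed_env=3))
```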

Fault Recovery

Another important advantage of environment isolation is that it enables fast fault recovery. When a service in one environment fails, we can push new routing configuration that redirects user requests to another environment, recovering from the fault in seconds. To do this, we need a powerful distributed system, especially a powerful configuration center like Nacos, to quickly push routing rule configuration data to application processes across the entire network.
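
The following sketch illustrates the idea of rerouting through pushed configuration. It does not use the Nacos client API: the `Router` class, its `on_rule_change` callback, and the shape of the routing rule (a mapping from a failed environment to its replacement) are all assumptions made for illustration. In a real deployment, a configuration listener would invoke the callback when the rule changes.

```python
import threading


class Router:
    """Routes a user to an environment, honouring failover overrides."""

    def __init__(self, env_names):
        self._envs = list(env_names)
        self._overrides = {}  # failed environment -> replacement environment
        self._lock = threading.Lock()

    def on_rule_change(self, overrides):
        """Callback that a configuration listener would call with new rules."""
        with self._lock:
            self._overrides = dict(overrides)

    def route(self, user_id: int) -> str:
        # Assumed home-environment rule: modulo over the environment list.
        home = self._envs[user_id % len(self._envs)]
        with self._lock:
            return self._overrides.get(home, home)


if __name__ == "__main__":
    router = Router(["env-a", "env-b"])
    print(router.route(2))                      # home environment of user 2
    router.on_rule_change({"env-a": "env-b"})   # env-a is declared faulty
    print(router.route(2))                      # user 2 is now served elsewhere
    router.on_rule_change({})                   # env-a has recovered
    print(router.route(2))
```

Recovery time in such a design is bounded by how quickly the new rule reaches every process, which is why the push latency of the configuration center matters.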

Phased Release

Phased release is an indispensable part of the R&D process. In traditional R&D, testing and phased release are complicated and require testers to do a variety of configuration work, such as binding hosts and setting JVM parameters or environment variables. Years of practice at Alibaba have shown that, with environment isolation, testing and phased release become friendly to both developers and testers. Environment isolation ensures that requests are processed on the specified machine clusters, so developers and testers need no extra configuration, which significantly improves R&D efficiency.

How Does Nacos Enable Environment Isolation?

The previous two sections describe the definition and role of environment isolation. This section shows how to implement environment isolation based on Nacos.

Nacos originated from the software load balancing group of the Alibaba middleware department. In our practical implementation of environment isolation, we isolate multiple physical clusters based on Nacos. At the same time, the Nacos client routes itself to the right environment automatically, without requiring any code changes.

Before we explain the implementation of environment isolation, let’s state some constraints first:

  • All applications deployed on a machine are in the same environment.
  • By default, an application process is only connected to Nacos in one environment.
  • A certain method can be used to obtain the IP of the machine where the client is located.
  • Users have planned the CIDR blocks of the machines.

The following shows the basic principles:

  • A 32-bit IPv4 address space can be divided into many CIDR blocks like 192.168.1.0/24. Medium and large enterprises usually plan CIDR blocks for specific purposes. We can use this principle to implement environment isolation, that is, to have IPs in different CIDR blocks belong to different environments. For example, 192.168.1.0/24 belongs to environment A, while 192.168.2.0/24 belongs to environment B.
  • Nacos initializes client instances in two ways. One is to give the client the IPs of the Nacos servers directly. The other is to give the client an Endpoint; the client then sends an HTTP request to the Endpoint to query the list of Nacos server IPs. In this article, we use the second method to initialize client instances.
  • Enhance the Endpoint: configure the mapping between CIDR blocks and environments on the Endpoint side. When the Endpoint receives a request from a client, it uses the source IP of the client to determine which CIDR block, and therefore which environment, the client belongs to, and returns the Nacos server IP list of that environment to the client. This process is shown in the following flow chart.

[Figure: flow chart of a Nacos client obtaining the server IP list of its environment from the Endpoint]
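
The Endpoint-side lookup described in the principles above can be sketched with Python's standard `ipaddress` module. The two CIDR blocks come from the example in this article; the environment names, the server IPs for environment B, and the function names are hypothetical, and a real Endpoint would take the client IP from the incoming HTTP request.

```python
import ipaddress

# CIDR block -> environment. The blocks follow the article's example;
# the environment names are illustrative.
CIDR_TO_ENV = {
    ipaddress.ip_network("192.168.1.0/24"): "env-a",
    ipaddress.ip_network("192.168.2.0/24"): "env-b",
}

# Environment -> Nacos server IPs (illustrative values).
ENV_TO_SERVERS = {
    "env-a": ["192.168.1.2", "192.168.1.3"],
    "env-b": ["192.168.2.2", "192.168.2.3"],
}


def environment_of(client_ip: str):
    """Return the environment whose CIDR block contains client_ip, or None."""
    addr = ipaddress.ip_address(client_ip)
    for network, env in CIDR_TO_ENV.items():
        if addr in network:
            return env
    return None


def server_list_for(client_ip: str):
    """Return the Nacos server IPs for the client's environment."""
    return ENV_TO_SERVERS.get(environment_of(client_ip), [])


if __name__ == "__main__":
    for ip in ("192.168.1.77", "192.168.2.5", "10.0.0.1"):
        print(ip, environment_of(ip), server_list_for(ip))
```

This sketch returns an empty list for a client outside every planned block; whether to fall back to a default list instead is a design choice left to the Endpoint operator.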

An Example Environment Isolation Server

The previous section describes the constraints and basic principles of environment isolation based on CIDR blocks. How exactly can we implement such an address server? The simplest method is based on nginx: use the geo module of nginx to map client IPs to environments, and then let nginx return the content of static files.

  • Configure the mapping between CIDR blocks and environments with the geo module.
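# Configure the following in the http block.
# $env becomes a file-name suffix chosen by the client's source CIDR block.
geo $env {
    default "";
    192.168.1.0/24 -env-a;
    192.168.2.0/24 -env-b;
}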
  • Configure the nginx root path and the rewrite rule. (In this case, nginx only needs to return the content of static files.)
# Configure the root path in the http block
root /tmp/htdocs;
# Configure the following in the server block
location / {
    # Append the environment suffix to the requested path
    rewrite ^(.*)$ /$1$env break;
}
  • Create the IP list files of the Nacos servers: in the /tmp/htdocs/nacos directory, create one file per environment whose name ends with the environment name, with one IP per line as the file content.
$ ll /tmp/htdocs/nacos/
total 0
-rw-r--r-- 1 user1 users 0 Mar 5 08:53 serverlist
-rw-r--r-- 1 user1 users 0 Mar 5 08:53 serverlist-env-a
-rw-r--r-- 1 user1 users 0 Mar 5 08:53 serverlist-env-b
$ cat /tmp/htdocs/nacos/serverlist
192.168.1.2
192.168.1.3
  • Perform the verification.
$ curl 'localhost:8080/nacos/serverlist'
192.168.1.2
192.168.1.3

At this point, this simple example of environment isolation based on CIDR blocks is ready to work. Nacos clients in different CIDR blocks automatically obtain different Nacos server IP lists, which implements environment isolation. The advantage of this method is that users do not need to configure any parameters, and the code and configuration stay the same in every environment. However, this method does require the underlying service providers to plan the network properly and maintain the related configuration.
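
From the client side, consuming such an Endpoint amounts to one HTTP GET and a line-by-line parse. The sketch below is not the Nacos client's actual code; the function names are hypothetical, and the URL in the usage comment assumes the nginx example above is listening on localhost:8080.

```python
import urllib.request


def parse_server_list(body: str):
    """Parse an Endpoint response: one server IP per line, blanks ignored."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def fetch_server_list(endpoint_url: str, timeout: float = 3.0):
    """GET the Endpoint and return the server IPs it lists."""
    with urllib.request.urlopen(endpoint_url, timeout=timeout) as resp:
        return parse_server_list(resp.read().decode("utf-8"))


# Usage, assuming the example server above is running:
#   servers = fetch_server_list("http://localhost:8080/nacos/serverlist")
```

The client sends the same request in every environment; only the source IP differs, which is what lets the Endpoint pick the right list without any client-side configuration.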

Summary

This article briefly explains the definition of environment isolation, its three advantages, and how to implement it based on CIDR blocks. It ends with an nginx-based example of an environment-aware Endpoint. Note that this article provides only one feasible method; more efficient implementations may exist. If you have a better solution, feel free to contribute it to the Nacos community or the official website.

About the Author

Zheng Ji (GitHub ID: @jianweiwang), Senior Development Engineer at Alibaba, is responsible for the development of Nacos and its community maintenance.

Reference

https://www.alibabacloud.com/blog/nacos-based-environment-isolation-at-alibaba_594856?spm=a2c41.12952741.0.0
