Dubbo’s Cloud-Native Transformation: Analysis of Application-Level Service Discovery

Nov 2, 2020

By Liu Jun (Lugui), Apache Dubbo PMC

Overview

Starting with version 2.7.5, the community edition of Dubbo includes a new service discovery mechanism based on instance (application) granularity. This marks an important step in exploring Dubbo’s adaptation to cloud-native infrastructure. In the roughly six months since version 2.7.5 was released, we have explored and evaluated the mechanism and gained a comprehensive, in-depth understanding of its feasibility and stability. Meanwhile, the planning of Dubbo 3.0 is in full swing. Our current work therefore focuses on two questions: how to make application-level service discovery the basic service model of Dubbo 3.0, the next-generation service framework, and how to solve the scalability problems of cloud-native and large-scale microservice clusters.

Since this new mechanism is so important, how does it work? Today, I will explain it in detail. In the initial community version, we gave this mechanism a mysterious name, that is, service introspection. I will further explain the origin of this name in the following sections and use “service introspection” to refer to this new application-level service discovery mechanism.

Developers familiar with Dubbo know that Dubbo has always defined services in a Remote Procedure Call (RPC)-oriented way. This is also the basis for Dubbo’s friendly and powerful governance features. So, why do we need to additionally define an application-level service discovery mechanism? How does this mechanism work? What is the difference between this mechanism and the existing one? What benefits does it bring in terms of cloud-native adaptation and performance?

With all these questions in mind, let’s begin.

What Is Service Introspection?

First, let me answer the questions mentioned at the beginning of this article:

Which kind of model embodies application-level service discovery, and what is the difference between this model and the existing Dubbo service discovery model?
Why do we call it service introspection?

The so-called “application and instance granularity” or “RPC service granularity” stresses a data organization format for address discovery.

Take Dubbo’s current address discovery data format as an example, which is an “RPC service granularity” format. It uses an RPC service as the key and the instance list as the value to organize data:

"RPC Service1": [
  {"name":"instance1", "ip":"127.0.0.1", "metadata":{"timeout":1000}},
  {"name":"instance2", "ip":"127.0.0.1", "metadata":{"timeout":2000}},
  {"name":"instance3", "ip":"127.0.0.1", "metadata":{"timeout":3000}}
],
"RPC Service2": [Instance list of RPC Service2],
"RPC ServiceN": [Instance list of RPC ServiceN]

The new “application granularity-based service discovery” mechanism uses an application name as the key and the list of instances deployed by the application as the value. As a result, this introduces two differences:

The data mapping relationship changes from RPC Service -> Instance to Application -> Instance.
Less data is involved, and the registration center does not include the RPC service and its configuration information.

"application1": [
  {"name":"instance1", "ip":"127.0.0.1", "metadata":{}},
  {"name":"instance2", "ip":"127.0.0.1", "metadata":{}},
  {"name":"instanceN", "ip":"127.0.0.1", "metadata":{}}
]

To further understand the changes brought about by the new model, let’s take a look at the relationship between applications and RPC services. Typically, multiple RPC services are defined in one application. Therefore, Dubbo’s previous service discovery granularity is finer, and more data entries are generated in the registration center, in proportion to the number of RPC services. This also results in data redundancy to some extent, because the same instance address is repeated under every RPC service it exposes.
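The redundancy can be made concrete with a small sketch. The registry content, the `app` field, and the function name below are made up for illustration and are not Dubbo’s actual storage format. The sketch regroups interface-level data by application and counts the entries under each model.

```python
from collections import defaultdict

# Hypothetical interface-level registry content: RPC service -> instance list.
# The "app" field is an assumption added so instances can be regrouped.
interface_level = {
    "DemoService1": [{"app": "demo", "ip": "10.0.0.1"}, {"app": "demo", "ip": "10.0.0.2"}],
    "DemoService2": [{"app": "demo", "ip": "10.0.0.1"}, {"app": "demo", "ip": "10.0.0.2"}],
    "DemoService3": [{"app": "demo", "ip": "10.0.0.1"}, {"app": "demo", "ip": "10.0.0.2"}],
}


def to_application_level(registry):
    """Group instances by application name, dropping repeated addresses."""
    apps = defaultdict(set)
    for instances in registry.values():
        for inst in instances:
            apps[inst["app"]].add(inst["ip"])
    return {app: sorted(ips) for app, ips in apps.items()}


application_level = to_application_level(interface_level)

# Number of address entries the registration center holds under each model.
interface_entries = sum(len(v) for v in interface_level.values())
application_entries = sum(len(v) for v in application_level.values())
print(interface_entries, application_entries)
```

With three RPC services on two instances, the interface-level layout holds one entry per service per instance, while the application-level layout holds one entry per instance.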

After we briefly go through the basic working mechanism of application-level service discovery, let’s see why it is called “service introspection.”

To this end, let’s also begin with its working principle. As previously mentioned, the data model of application-level service discovery introduced the following changes: the data volume of the registration center declines, RPC service-related data is removed from the registration center, and only application-level data and instance-level data are retained. To ensure that the absent RPC service data can still be correctly perceived by the consumer side, we have established a separate communication channel between the consumer and the provider. In this channel, the consumer and the provider exchange information through specific ports. Here, we regard the behavior that the provider actively exposes its own information as an introspection mechanism. Therefore, from this perspective, we name the entire mechanism “service introspection”.

Why Do We Need Service Introspection?

When I talked about the working principle of service introspection, I also mentioned several differences it had introduced to the registration center. These differences are reflected in the Dubbo framework and the entire microservice system, and have the following benefits:

The model is aligned with mainstream microservice models in the industry, such as Spring Cloud and Kubernetes Native Service.
The model helps improve performance and scalability. The reorganization (reduction) of the registration center data can minimize the storage and push pressure on the registration center, reducing the address calculation pressure on the Dubbo consumer. Meanwhile, the cluster size becomes predictable and assessable. The size is independent of the number of RPC interfaces and only dependent on the scale of instance deployment.

1. Align With Mainstream Microservice Models

Automatic and transparent address discovery (load balancing) is a common task for all microservice frameworks. It makes the backend deployment structure transparent to upstream microservices. In this way, the upstream service only needs to select one address from the received address list to initiate a call. To achieve this purpose, automatic synchronization is required for two points:

One is the automatic synchronization of the instance address because the service consumer needs to know the address to establish a connection.
The other is the automatic synchronization of the RPC method definition. The service consumer needs to know the specific definition of the RPC service, regardless of whether the service mode is representational state transfer (REST) or remote method invocation (RMI).

For data synchronization between RPC instances with the help of the registration center, the REST mode has defined an interesting maturity model. If you are interested, see the referenced article for details.

According to the definition of 4-level maturity in the referenced article, Dubbo’s current interface-level model corresponds to level 4.

Next, let’s see how Dubbo, Spring Cloud, and Kubernetes are designed around the goal of automated instance address discovery.

2. Spring Cloud

Spring Cloud only synchronizes application and instance addresses through the registration center. The consumer can establish a connection with the service provider based on the instance address, but because Spring Cloud is based on REST communication, the consumer does not know how to initiate HTTP calls. For example, the consumer has no idea what HTTP endpoints the service provider has, and which parameters need to be passed in.

Currently, RPC service information is negotiated by offline agreement or offline management systems. The pros and cons of this schema are summarized below:

Advantages: The deployment structure is clear and the workload of the address push is low.
Disadvantages: Address subscription requires the specification of the application name, and provider application changes (splits) need to be consumer-aware. In addition, RPC calls cannot be synchronized automatically.

3. Dubbo

Dubbo synchronizes both the instance address and the RPC method through the registration center. It can therefore synchronize RPC definitions automatically, support RPC-oriented programming and RPC governance, and keep the consumer unaware of any split of the backend application. The disadvantage is that the number of address pushes increases in proportion to the number of RPC services in use.

4. Dubbo + Kubernetes

To support Kubernetes’ native services, compared with the service discovery system that builds the registration center on its own, Dubbo has two major changes in the working mechanism:

Service registration is taken over by the platform. As a result, the provider no longer needs to care about service registration.
Service discovery on the consumer side will be Dubbo’s focus. By interfacing with API Server and the Domain Name System (DNS) at the platform layer, the Dubbo client can query a set of endpoints (a group of pods that run the provider) through a service name (usually corresponding to the application name) and trigger Dubbo’s built-in load balancing capability by mapping the endpoints to Dubbo’s internal address list.
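The endpoint-to-address mapping described above can be sketched as follows. This is a minimal illustration, not Dubbo’s implementation: the dictionary follows the shape of a Kubernetes v1 Endpoints object (`subsets` with `addresses` and `ports`), the values are invented, and a random pick stands in for Dubbo’s built-in load balancing.

```python
import random

# Shape follows the Kubernetes v1 Endpoints object; the values are made up.
endpoints = {
    "metadata": {"name": "provider-app-name"},
    "subsets": [
        {
            "addresses": [{"ip": "10.1.0.5"}, {"ip": "10.1.0.6"}],
            "ports": [{"port": 20880, "protocol": "TCP"}],
        }
    ],
}


def to_address_list(endpoints):
    """Flatten the endpoints into 'ip:port' strings, one per pod and port."""
    addresses = []
    for subset in endpoints.get("subsets", []):
        for addr in subset.get("addresses", []):
            for port in subset.get("ports", []):
                addresses.append(f"{addr['ip']}:{port['port']}")
    return addresses


def select(addresses, rng=random):
    """Stand-in for client-side load balancing: pick one address at random."""
    return rng.choice(addresses)


address_list = to_address_list(endpoints)
chosen = select(address_list)
print(address_list, chosen)
```

Once the endpoints are flattened into an internal address list, the client can treat them exactly like addresses pushed by a self-built registration center.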

As an abstract concept, how to map a Kubernetes service to Dubbo is worth discussing.

In the case of mapping Service Name to Application Name, Dubbo applications have a one-to-one correspondence to Kubernetes services. Moreover, these applications are transparent to microservice O&M and construction and are decoupled from the development stage.

apiVersion: v1
kind: Service
metadata:
  name: provider-app-name
spec:
  selector:
    app: provider-app-name
  ports:
    - protocol: TCP
      port:
      targetPort: 9376

In the case of mapping Service Name to Dubbo RPC Service, Kubernetes maintains the binding of the scheduled service and the application’s built-in RPC service, so the number of services to be maintained increases.

---
apiVersion: v1
kind: Service
metadata:
  name: rpc-service-1
spec:
  selector:
    app: provider-app-name
  ports: ##
...
---
apiVersion: v1
kind: Service
metadata:
  name: rpc-service-2
spec:
  selector:
    app: provider-app-name
  ports: ##
...
---
apiVersion: v1
kind: Service
metadata:
  name: rpc-service-N
spec:
  selector:
    app: provider-app-name
  ports: ##
...

Based on the analysis of the preceding different microservice framework models, we can find that in the abstract definition of microservices, Dubbo is quite different from other products, such as Spring Cloud and Kubernetes. Spring Cloud and Kubernetes adopt similar microservice model abstraction methods. The two products only care about the synchronization of instance addresses. If we look into some other service framework products, we will find that most of them are designed in the same way, that is, at level 3 in the REST maturity model.

In contrast, Dubbo is special as its design aims at the granularity of RPC services. It corresponds to level 4 in the REST maturity model.

As shown in the detailed analysis of each model, each model has its pros and cons. The reason why we believed that Dubbo had to make changes and align itself with other microservice discovery models was that when we first determined Dubbo’s cloud-native solution, we found that Dubbo needed to support Kubernetes Native Service, which requires model alignment as a prerequisite. Another reason is the demand from the user side for Dubbo’s scenario-based engineering practices. Thanks to Dubbo’s support for multi-registration and multi-protocol capabilities, Dubbo can connect different microservice systems. However, the inconsistency of service discovery models has become one of the obstacles.

5. Microservice Clusters at a Larger Scale: Solve Performance Bottlenecks

This section talks about the interaction with the registration center and configuration center. As for the changes in the data of the registration center under different models, we have briefly analyzed them in the earlier working principle section. To more intuitively compare the push efficiency improvements brought about by the service model changes, let’s take a look at a comparison between registration centers of different models:

The left side of the figure shows the typical workflow of a microservices framework. In this framework, the provider and consumer implement automated address notification through the registration center. The table in the figure shows the provider instance information.

The application DEMO contains three interfaces, DemoService 1, 2, and 3. The IP address of the current instance is 10.210.134.30.

For Spring Cloud and Kubernetes models, the registration center stores only one piece of data, DEMO — 10.210.134.30+metadata.
For the old Dubbo model, the registration center stores three pieces of interface-level data corresponding to interfaces DemoService 1, 2, and 3. In this case, lots of repeated address data occur.

We can conclude that the amount of data stored and pushed by the application granularity-based model is proportional to the number of applications and instances. Only when the number of applications or the number of application instances increases will the pressure of address push increase.

For an interface granularity-based model, the amount of data is positively correlated with the number of interfaces. Given that an application usually carries multiple interfaces, the data volume of the interface-level model is that of the application-level model multiplied by the average number of interfaces per application. Another key point is that interface granularity makes the cluster size hard to evaluate. The growth in the number of instances and applications is usually included in O&M planning, whereas defining interfaces is more of an internal behavior of the service side, which can bypass the evaluation and impose pressure on the cluster.

Take a consumer-side service subscription as an example. According to my rough statistics on some medium- to large-scale leading Dubbo users in the community, and based on the actual scenarios of those companies, a consumer application consumes (subscribes to) more than 10 provider applications, and the number of interfaces it consumes (subscribes to) reaches 30. On average, every three interfaces subscribed to by the consumer come from the same provider application. In this way, if the application granularity is used as the basic unit of address notification and addressing, the average address push and calculation volume will drop by more than 60%.

In extreme cases, when more consumer-side consumption interfaces come from the same application, the address push and memory consumption volume will be further reduced, with a potential reduction of even more than 80%.

A typical scenario is the gateway application in the Dubbo system. Some gateway applications consume (subscribe to) more than 100 applications, while the number of consumed (subscribed) services is more than 1,000. On average, 10 interfaces come from the same application. If we change the granularity of address push and calculation to the application level, the amount of address push will change from n × 1000 to n × 100, a reduction of about 90%.
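The arithmetic behind these estimates is simple enough to check directly. The sketch below assumes, for simplicity, that every provider application runs the same number of instances n, so n cancels out of the ratio; the function name is illustrative.

```python
def push_reduction(interfaces, applications):
    """Fraction of address pushes saved when moving from one push per
    subscribed interface to one push per subscribed application."""
    return 1 - applications / interfaces


# Typical consumer from the text: 30 interfaces spread over 10 applications.
typical = push_reduction(interfaces=30, applications=10)

# Gateway from the text: 1,000 interfaces spread over 100 applications.
gateway = push_reduction(interfaces=1000, applications=100)

print(f"typical consumer: {typical:.0%}")
print(f"gateway: {gateway:.0%}")
```

The typical case works out to roughly two thirds, which matches the “more than 60%” figure, and the gateway case to 90%.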

Working Principle

1. Design Guidelines

In the previous section, I described the benefits or reasons for Dubbo’s orientation to application-level service discovery from the perspective of the service model and supporting large-scale clusters. On the other hand, the service governance capabilities of interface granularity must also be retained, as it is the foundation of the ease of use of the programming model and the benefits of service governance capabilities of Dubbo’s framework.

In my opinion, we must continue to adhere to the following design guidelines during service model migration:

The new service discovery model must allow existing Dubbo consumer-side developers to migrate without noticing the change. Dubbo must remain oriented to RPC service programming and RPC service governance, and the new model must be fully transparent to users.
Meanwhile, Dubbo needs to develop an automatic RPC service metadata coordination mechanism between the consumer and the provider to solve the problem that traditional microservice models cannot synchronize RPC-level interface configurations.

2. Detailed Explanation of Basic Principles

As a new service discovery mechanism, application-level service discovery is almost identical to Dubbo’s previous RPC service granularity-based service discovery in terms of the core process. That is, the service provider registers the address information with the registration center, and the service consumer pulls and subscribes to the address information from the registration center.

The main differences are listed below:

The registration center data is organized in the format of the “application-instance list” and no longer contains RPC service information.

The following example shows the metadata of each instance. The general principle is that the metadata contains only information related to the current instance node and excludes RPC service-level information.

The general information mainly contains these items: the instance address, instance environment variables, the metadata of metadata services, and several other necessary properties.

{
  "name": "provider-app-name",
  "id": "192.168.0.102:20880",
  "address": "192.168.0.102",
  "port": 20880,
  "sslPort": null,
  "payload": {
    "id": null,
    "name": "provider-app-name",
    "metadata": {
      "metadataService": "{\"dubbo\":{\"version\":\"1.0.0\",\"dubbo\":\"2.0.2\",\"release\":\"2.7.5\",\"port\":\"20881\"}}",
      "endpoints": "[{\"port\":20880,\"protocol\":\"dubbo\"}]",
      "storage-type": "local",
      "revision": "6785535733750099598"
    }
  },
  "registrationTimeUTC": 1583461240877,
  "serviceType": "DYNAMIC",
  "uriSpec": null
}
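To show how a consumer might read this record, here is a minimal sketch that parses a trimmed copy of the instance data above. Note that `metadataService` and `endpoints` are JSON documents embedded as strings, so they need a second parsing pass. Treating the embedded `port` as the port of the metadata service is my reading of the sample, not something the record states explicitly.

```python
import json

# Trimmed copy of the instance record shown above (raw string keeps the \" escapes).
instance_json = r'''
{
  "name": "provider-app-name",
  "address": "192.168.0.102",
  "port": 20880,
  "payload": {
    "metadata": {
      "metadataService": "{\"dubbo\":{\"version\":\"1.0.0\",\"dubbo\":\"2.0.2\",\"release\":\"2.7.5\",\"port\":\"20881\"}}",
      "endpoints": "[{\"port\":20880,\"protocol\":\"dubbo\"}]",
      "storage-type": "local",
      "revision": "6785535733750099598"
    }
  }
}
'''

instance = json.loads(instance_json)
metadata = instance["payload"]["metadata"]

# Second pass: these two values are JSON encoded as strings.
metadata_service = json.loads(metadata["metadataService"])
endpoints = json.loads(metadata["endpoints"])

# Assumed meaning: where the consumer would reach the provider's metadata service.
metadata_service_address = f'{instance["address"]}:{metadata_service["dubbo"]["port"]}'
print(metadata_service_address, endpoints, metadata["revision"])
```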

The client and the server negotiate the RPC method information by themselves.

After the registration center no longer synchronizes the RPC service information, service introspection sets up a built-in RPC service information negotiation mechanism between the service consumer and the provider. This reflects the origin of the name “service introspection”. The server-side instance exposes a predefined MetadataService RPC service, and the consumer obtains the configuration information related to the RPC method of each instance by calling MetadataService.

Currently, the format of data returned by MetadataService is:

[
  "dubbo://192.168.0.102:20880/org.apache.dubbo.demo.DemoService?anyhost=true&application=demo-provider&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&interface=org.apache.dubbo.demo.DemoService&methods=sayHello&pid=9585&release=2.7.5&side=provider&timestamp=1583469714314",
  "dubbo://192.168.0.102:20880/org.apache.dubbo.demo.HelloService?anyhost=true&application=demo-provider&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&interface=org.apache.dubbo.demo.DemoService&methods=sayHello&pid=9585&release=2.7.5&side=provider&timestamp=1583469714314",
  "dubbo://192.168.0.102:20880/org.apache.dubbo.demo.WorldService?anyhost=true&application=demo-provider&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&interface=org.apache.dubbo.demo.DemoService&methods=sayHello&pid=9585&release=2.7.5&side=provider&timestamp=1583469714314"
]

Developers who are familiar with Dubbo’s RPC service granularity-based service discovery model will notice that the service introspection mechanism splits the uniform resource locator (URL) that used to be transmitted by the registration center into two parts:

One part of the data related to the instance is still kept in the registration center, such as the IP address, port number, and machine identifier.
The other part of data related to RPC methods is removed from the registration center and exposed to the consumer through MetadataService.

Ideally, a URL can be strictly divided into instance-related data and RPC service-related data. However, you can clearly see that data redundancy occurs in the implemented version, and some data failed to be rationally divided. This issue is especially true for MetadataService. As you can see, the returned data is a URL list assembly, which contains full data.
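The ideal split can be sketched with the standard URL parser. Which query parameters count as instance-related is an assumption made for this example (`INSTANCE_KEYS` below); the source does not give an authoritative list.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "dubbo://192.168.0.102:20880/org.apache.dubbo.demo.DemoService"
    "?anyhost=true&application=demo-provider&deprecated=false&dubbo=2.0.2"
    "&dynamic=true&generic=false&interface=org.apache.dubbo.demo.DemoService"
    "&methods=sayHello&pid=9585&release=2.7.5&side=provider&timestamp=1583469714314"
)

# Assumption: an illustrative guess at which parameters describe the instance.
INSTANCE_KEYS = {"application", "pid", "release", "dubbo", "timestamp"}


def split_url(url):
    """Split a Dubbo URL into instance-related and RPC service-related data."""
    parts = urlsplit(url)
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    instance = {"protocol": parts.scheme, "host": parts.hostname, "port": parts.port}
    instance.update({k: v for k, v in params.items() if k in INSTANCE_KEYS})
    service = {"path": parts.path.lstrip("/")}
    service.update({k: v for k, v in params.items() if k not in INSTANCE_KEYS})
    return instance, service


instance_part, service_part = split_url(url)
print(instance_part)
print(service_part)
```

Under such a split, only `instance_part` would need to live in the registration center, and `service_part` would be served by MetadataService.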

The following figure shows the complete workflow for service introspection, which details the collaboration process among service registration, service discovery, MetadataService, and RPC calls.

When the service provider starts, it first parses the “ordinary services” defined by the application and registers them one by one as RPC services. Then, it registers the built-in MetadataService and finally opens the TCP listening port.
After the service provider is started, instance information (which includes only instance-related data, such as the IP address and port number) is registered to the registration center. At this point, the startup of the provider is completed.
When the service consumer starts, it queries the address list in the registration center according to the application name of the provider to be consumed and completes the subscription to implement the automatic notification of subsequent address changes.
Once the consumer obtains the address list, it initiates a call to MetadataService. The returned result contains all the “ordinary services” defined by the application and their related configuration information.
At this point, the consumer can receive external traffic and initiate Dubbo RPC calls to the provider.

The preceding workflow only considers the case where everything goes smoothly. In a more detailed design or coding implementation, we need to strictly stipulate framework behavior for unexpected cases. For example, if the consumer fails to call MetadataService, it will not be able to receive external traffic until a retry succeeds.
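The startup sequence and the failure case can be sketched as a toy simulation. All class and method names here are hypothetical stand-ins rather than Dubbo APIs, and the in-process call replaces the real RPC call to MetadataService.

```python
class Registry:
    """Toy registration center: application name -> instance addresses."""

    def __init__(self):
        self.apps = {}

    def register(self, app, address):
        self.apps.setdefault(app, []).append(address)

    def lookup(self, app):
        return list(self.apps.get(app, []))


class Provider:
    def __init__(self, app, address, services):
        self.app, self.address, self.services = app, address, services
        self.failures_left = 0  # lets the demo simulate a flaky metadata call

    def start(self, registry):
        # Only instance-level data goes to the registration center.
        registry.register(self.app, self.address)

    def get_exported_services(self):
        """Stand-in for the built-in metadata service call."""
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("metadata call failed")
        return list(self.services)


class Consumer:
    def __init__(self):
        self.ready = False
        self.services = {}

    def start(self, registry, app, providers, max_attempts=3):
        addresses = registry.lookup(app)
        for address in addresses:
            for _ in range(max_attempts):
                try:
                    self.services[address] = providers[address].get_exported_services()
                    break
                except ConnectionError:
                    continue  # retry the metadata call
        # Accept external traffic only once metadata is known for every address.
        self.ready = bool(addresses) and all(a in self.services for a in addresses)


registry = Registry()
provider = Provider("provider-app-name", "192.168.0.102:20880", ["DemoService", "HelloService"])
provider.failures_left = 1  # first metadata call fails, the retry succeeds
provider.start(registry)

consumer = Consumer()
consumer.start(registry, "provider-app-name", {provider.address: provider})
print(consumer.ready, consumer.services)
```

The consumer stays not ready when every attempt fails, which mirrors the rule that it must not receive external traffic before the metadata call succeeds.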

Key Mechanisms in Service Introspection

1. The Metadata Synchronization Mechanism

Configuration synchronization between the client and the server after they receive the address push request is the key phase of service introspection. Currently, there are two options for metadata synchronization, the built-in MetadataService, and an independent metadata center, which coordinates data through a moderately refined metadata cluster.

Built-in MetadataService: MetadataService is exposed through the standard Dubbo protocol. It returns the “ordinary service” configuration in the memory to the consumer according to the query conditions. This step occurs before the consumer address is selected and called.
Metadata center: The metadata center introduced in version 2.7 is reused. After the provider instance starts, it tries to publish its internal RPC service definitions to the metadata center as a piece of metadata. The consumer actively queries the metadata center each time it receives a push update from the registration center.

Note: The timing for the consumer to query the metadata center is after the notification of the address update of the registration center is received. Through the data issued by the registration center, we can know when the metadata of an instance has been updated. Only at this point will you need to query the metadata center.
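One way to picture this timing is a consumer-side cache keyed by a revision value carried in the instance data. The sketch assumes that the `revision` field seen in the instance metadata changes whenever the RPC service metadata changes; that reading, and every name in the code, is an assumption for illustration.

```python
class MetadataCache:
    """Queries the metadata source only for revisions not seen before."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: (app, revision) -> metadata
        self._by_revision = {}
        self.fetch_count = 0

    def on_address_push(self, app, instances):
        """Handle an address push and return address -> metadata."""
        resolved = {}
        for inst in instances:
            key = (app, inst["revision"])
            if key not in self._by_revision:
                self._by_revision[key] = self._fetch(app, inst["revision"])
                self.fetch_count += 1
            resolved[inst["address"]] = self._by_revision[key]
        return resolved


def fake_metadata_center(app, revision):
    """Hypothetical metadata center lookup used only for this demo."""
    return {"app": app, "revision": revision, "services": ["DemoService"]}


cache = MetadataCache(fake_metadata_center)
first_push = [
    {"address": "10.0.0.1:20880", "revision": "r1"},
    {"address": "10.0.0.2:20880", "revision": "r1"},
]
cache.on_address_push("demo", first_push)  # one query: both share revision r1
cache.on_address_push("demo", first_push)  # nothing changed: no query
second_push = [
    {"address": "10.0.0.1:20880", "revision": "r2"},
    {"address": "10.0.0.2:20880", "revision": "r1"},
]
resolved = cache.on_address_push("demo", second_push)  # one query for r2
print(cache.fetch_count, resolved["10.0.0.1:20880"]["revision"])
```

Instances that share a revision share one metadata query, so the query volume follows the number of distinct revisions rather than the number of instances or pushes.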

2. The Two-Way Relationship Between RPC Services and Application Mapping

Now, let’s recall that registration center data is organized in the format of the “application-instance list.” Currently, this change is not fully transparent to developers. That means service developers will be aware of changes in the mechanism for querying or subscribing to the address list. Specifically, compared with the past when addresses were retrieved based on RPC services, the consumer now needs to specify the provider application name to implement address query or subscription.

This is sample code from the legacy consumer development and configuration practice:

<!-- The framework directly queries or subscribes to the address list in the registration center through RPC Service 1/2/N. -->
<dubbo:registry address="zookeeper://127.0.0.1:2181"/>
<dubbo:reference interface="RPCService1" />
<dubbo:reference interface="RPCService2" />
<dubbo:reference interface="RPCServiceN" />

This is sample code from the new consumer development and configuration practice:

<!-- The framework can only query or subscribe to the address list in the registration center through RPC Service 1/2/N and additional provided-by="provider-app-x". -->
<dubbo:registry address="zookeeper://127.0.0.1:2181?registry-type=service"/>
<dubbo:reference interface="RPCService1" provided-by="provider-app-x" />
<dubbo:reference interface="RPCService2" provided-by="provider-app-x" />
<dubbo:reference interface="RPCServiceN" provided-by="provider-app-y" />

The method of specifying the provider application name in the preceding example is the current practice of Spring Cloud. It requires the developer on the consumer side to explicitly specify the provider application to be consumed.

The root cause of the preceding problem is that the registration center does not know any information related to the RPC service. As a result, it can only query the application by the application name.

To make the entire development process more transparent to legacy Dubbo users while avoiding the impact of the specified provider on scalability (see below for details), we have designed a set of mapping relationships between RPC services and application names to automatically complete the conversion from RPC services to provider application names on the consumer side.

The reason for establishing a mapping relationship between interfaces and applications in Dubbo is that the mapping relationship between services and applications is not fixed. A typical scenario is application-service splitting. For example, the preceding configuration defines RPC Service 2 as a service in provider-app-x. In the future, developers may split the service into another application, such as provider-app-x-1. Without the mapping, this split must be perceived by all consumers of RPC Service 2, and each consuming application must be modified and upgraded accordingly, which is costly.
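A minimal sketch of such a mapping follows. The class name, its methods, and the idea that providers publish their own entries are assumptions for illustration; the source does not describe where Dubbo stores the mapping or how it is updated.

```python
class ServiceAppMapping:
    """Hypothetical lookup table: RPC service name -> provider application names."""

    def __init__(self):
        self._apps_by_service = {}

    def publish(self, service, app):
        """Record that an application provides the given RPC service."""
        self._apps_by_service.setdefault(service, set()).add(app)

    def withdraw(self, service, app):
        """Remove the record, for example after the service moves elsewhere."""
        self._apps_by_service.get(service, set()).discard(app)

    def resolve(self, service):
        """Return the applications a consumer should subscribe to."""
        return sorted(self._apps_by_service.get(service, set()))


mapping = ServiceAppMapping()
mapping.publish("RPCService1", "provider-app-x")
mapping.publish("RPCService2", "provider-app-x")
before_split = mapping.resolve("RPCService2")

# RPCService2 is split out into a new application; consumers change nothing.
mapping.withdraw("RPCService2", "provider-app-x")
mapping.publish("RPCService2", "provider-app-x-1")
after_split = mapping.resolve("RPCService2")

print(before_split, after_split)
```

Because the consumer resolves the application name at runtime from the interface name, its configuration can keep referring to the RPC service only, as in the legacy sample.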

Whether to use the Dubbo framework to help developers solve this problem transparently or leave the problem to developers is only a matter of strategic choice. Currently, both options are available in Dubbo 2.7.5 and later versions. I prefer to leave it to service developers by leveraging organizational constraints. This approach can further reduce the complexity of the Dubbo framework and improve runtime stability.

Summary and Prospects

The application-level service discovery mechanism is an important step for Dubbo’s transformation to cloud-native. It bridges the gap between Dubbo and other microservice systems at the address discovery level and also becomes the foundation for Dubbo to adapt to Kubernetes’ native services and other infrastructure.

We hope that Dubbo will retain its strengths in simple programming and service governance based on the new model. Meanwhile, we must note that the application granularity-based model increases complexity and requires further optimization and enhancement. In addition, beyond address storage and push, the application granularity deserves further exploration, as it may also help Dubbo with addressing.

About the Author

Liu Jun, whose GitHub ID is Chickenlj, is a core developer and PMC member of Apache Dubbo. He has witnessed Dubbo’s whole journey, from its renewed popularity in the open-source community to the rise of Apache Dubbo. He currently works on the Alibaba Cloud Cloud-Native Application Platform Team, where he develops service frameworks and microservices and is responsible for promoting Dubbo 3.0, the cloud-native version of Dubbo.