The Architecture behind Cainiao’s Elastic Scheduling System

How an Elastic Scheduling Application Works at Cainiao

To date, Cainiao has mainly implemented elastic scheduling for stateless application clusters. With the system currently in use, each cluster has more than fifteen containers before it connects to the elastic scheduling system.

How Cainiao Ark’s Elastic Scheduling Scheme Works

The Basic Mode of Elastic Scheduling

Some Advantages

This section looks in detail at some of the advantages of this closed-loop feedback mode.

  • First, this mode leaves room, to a certain degree, for the system to be improved over time.
  • Second, this mode configures a massive number of parameters at a higher level of abstraction, allowing Cainiao to resolve some common issues that may plague the system.

Cainiao’s Elastic Scheduling System’s Ark Architecture

Reasons for Using a Three-layer Decision-making Model

To explore Ark’s architecture, let’s first look at the three-layer decision-making model used by its elastic scheduling system. The graphic above shows the three major layers of this architecture.

  • The first layer is for policy decision-making.
  • The second layer is for aggregation-based decision-making.
  • The third layer is for decision execution.
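
The flow through the three layers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ark's actual implementation: the policy names, the thresholds, and the aggregation rule (any scale-out request wins, and scale-in happens only when every policy agrees) are hypothetical and do not come from the original article.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Decision:
    policy: str  # name of the policy that produced the decision
    delta: int   # containers to add (positive) or remove (negative)


# Layer 1: policy decision-making. Each policy judges the metrics on its own.
# All thresholds below are hypothetical values chosen for illustration.
def cpu_policy(metrics: Dict[str, float]) -> Decision:
    if metrics["cpu"] > 0.70:
        return Decision("cpu", 2)
    if metrics["cpu"] < 0.30:
        return Decision("cpu", -1)
    return Decision("cpu", 0)


def latency_policy(metrics: Dict[str, float]) -> Decision:
    if metrics["latency_ms"] > 200:
        return Decision("latency", 3)
    if metrics["latency_ms"] < 50:
        return Decision("latency", -1)
    return Decision("latency", 0)


POLICIES: List[Callable[[Dict[str, float]], Decision]] = [cpu_policy, latency_policy]


# Layer 2: aggregation-based decision-making. Merge per-policy decisions
# into a single delta (assumed rule, see the lead-in).
def aggregate(decisions: List[Decision]) -> int:
    deltas = [d.delta for d in decisions]
    if any(x > 0 for x in deltas):
        return max(deltas)  # any scale-out request wins; take the largest
    if deltas and all(x < 0 for x in deltas):
        return max(deltas)  # every policy agrees to shrink; take the smallest step
    return 0


# Layer 3: decision execution. Here it only computes the new container count.
def execute(current: int, delta: int) -> int:
    return current + delta


def decide(metrics: Dict[str, float], current: int) -> int:
    decisions = [policy(metrics) for policy in POLICIES]
    return execute(current, aggregate(decisions))
```

Keeping the layers as separate functions means a new policy can be added to `POLICIES` without touching aggregation or execution, which is one plausible motivation for splitting the decision process this way.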

Some Important Rules

The rules that apply during decision execution are relatively complex, so this section covers only some of the more important ones:

  • The scaling status rule of an application cluster.
  • The rule for modes.
  • The protection rule for the maximum and minimum values.
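
The article names these three rules without spelling out their exact behavior, so the following Python sketch is only one plausible reading. The `Cluster` fields, the mode names `"auto"` and `"observe"`, and the order in which the rules are checked are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    current: int                       # containers currently running
    min_size: int                      # protected lower bound
    max_size: int                      # protected upper bound
    scaling_in_progress: bool = False  # True while a previous scaling runs
    mode: str = "auto"                 # hypothetical: "auto" applies, "observe" only watches


def apply_rules(cluster: Cluster, target: int) -> int:
    """Return the container count the executor may actually move to."""
    # Scaling status rule (assumed): do not start a new scaling action
    # while the cluster is still scaling.
    if cluster.scaling_in_progress:
        return cluster.current
    # Mode rule (assumed): only clusters in "auto" mode are changed.
    if cluster.mode != "auto":
        return cluster.current
    # Protection rule: clamp the target to the configured minimum and maximum.
    return max(cluster.min_size, min(cluster.max_size, target))
```

For example, a cluster with 20 containers and bounds of 15 and 40 would move to 40 when asked for 50, and would stay at 20 if it were still in the middle of an earlier scaling action.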

Some Methods to Achieve Statelessness, Idempotence, and High Availability of Computing

  • The elastic scheduling system of Ark strongly depends on Isolate Schedule Service (ISS).
  • The data used in the online computing of the elastic scheduling system of Ark derive from the Alimetrics built-in metric system.
  • To filter glitches, all computing tasks are based on large or small sliding time windows.
  • Among the three decision-making layers, the third layer is deployed in different clusters than the other two.
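
To illustrate how a sliding window filters glitches, here is a minimal Python sketch. It assumes metric samples arrive at a fixed interval, so a window of N samples stands in for a time window. The class name, the window size, and the threshold are hypothetical, and nothing here reflects the Alimetrics API.

```python
from collections import deque


class SlidingWindow:
    """Keeps the most recent `size` samples of one metric."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.samples = deque(maxlen=size)  # the oldest sample drops out automatically

    def add(self, value: float) -> None:
        self.samples.append(value)

    def full(self) -> bool:
        return len(self.samples) == self.samples.maxlen

    def mean(self) -> float:
        return sum(self.samples) / len(self.samples)


def should_scale_out(window: SlidingWindow, threshold: float) -> bool:
    # Decide only on a full window, and on its average rather than on a
    # single sample, so that a one-off spike does not trigger scaling.
    return window.full() and window.mean() > threshold
```

Because the result depends only on the samples inside the window, recomputing it over the same data gives the same answer, which fits the stateless and idempotent style of computing described in this section. A small window reacts faster but filters less, while a large window does the opposite.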
