The Open Application Model from Alibaba’s Perspective

Key Takeaways

  • Kubernetes is highly extensible, and this enables infra engineers to build extended operational capabilities. Despite this great flexibility, some issues come up for the users of these capabilities.
  • OAM is an open specification for defining cloud native applications. Its goal is to establish application-centric infrastructure in a discoverable, manageable, and platform-agnostic way. There is also an OAM implementation, Rudr, designed specifically for Kubernetes.
  • Alibaba is contributing to OAM its experience of running both internal clusters and public cloud offerings, in particular its move from in-house application CRDs to a standard application model.
  • A major goal is to leave the inherent complexity of infrastructure to infrastructure engineers only and improve accuracy and efficiency in cooperation between various participants in the pipeline.

What is the Open Application Model (OAM)?

Alibaba co-announced the Open Application Model (OAM) with Microsoft on October 17th. OAM is a specification for describing an application together with its operational capabilities, so that the application definition is separated from the details of how the application is deployed and managed.


Who We Are

We are “infra operators” at Alibaba. Specifically, we are responsible for developing, installing, and maintaining various platform capabilities. Our work includes, but is not limited to, operating K8s clusters, implementing controllers/operators, and developing K8s plugins. Internally, we are more often called “platform builders.” However, to differentiate us from the PaaS engineers working on top of our K8s clusters, we are referred to as “infra operators” in this article. We’ve had many past successes with Kubernetes, and we’ve learned a lot from the issues we encountered when using it.

We Manage All Kinds of Kubernetes Clusters

We operate arguably the world’s largest and most complicated Kubernetes clusters for Alibaba’s e-commerce business. These clusters:

  • Run over 10,000 applications
  • Handle 100,000 deployments per day at peak times

We Serve Application Operators, Who Serve Developers

As in other Internet companies, the application management stack at Alibaba is run cooperatively by infra operators, application operators, and application developers. Application developers’ and application operators’ roles can be summarized as follows:

The Problems of Cooperation

From the description above, it’s obvious that the three parties bring different expertise but need to work in harmony to make sure everything works well. That can be difficult to achieve!

Interactions Between Infra Operators and Application Operators

Kubernetes is highly extensible, and this enables infra operators to build extended operational capabilities. Despite this great flexibility, some issues come up for the users of these capabilities — application operators.

apiVersion: ""
kind: CronHPA
metadata:
  name: cron-scaler
spec:
  timezone: America/Los_Angeles
  schedule:
  - cron: '0 0 6 * * ?'
    minReplicas: 20
    maxReplicas: 25
  - cron: '0 0 19 * * ?'
    minReplicas: 1
    maxReplicas: 9
  template:
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        name: php-apache
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50

  • Composable — Capabilities can be applied to the same application cooperatively. For example, Ingress and Rollout: Rollout upgrades the application and controls Ingress for progressive traffic shifting.
  • Conflicting — Capabilities should not be applied to the same application. For example, HPA and CronHPA; they conflict with each other if applied to the same application.

OAM’s Traits

In OAM, “Traits” are how we create capabilities with discoverability and manageability.

Discoverable Capabilities

In our K8s cluster, most traits are defined by infra operators and implemented using customized controllers in Kubernetes or external services, for example:

$ kubectl get traits
cron-scaler 19m
auto-scaler 19m

A Trait Provides A Structured Description for A Given Capability.

This description makes it easy for an application operator to understand a particular capability accurately with a simple kubectl describe command, without digging into its CRD or documentation. The description of a capability includes what kind of workload the trait applies to, how to use it, and so on.

kind: Trait
name: cron-scaler
properties: |
"description":"Timezone for the CRON expressions of this scaler.",
"description":"CRON expression for this scaling rule.",
"description":"Lower limit for the number of replicas.",
"description":"Upper limit for the number of replicas.",
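
To make the idea concrete, here is a minimal Python sketch of the information such a structured description carries. It is not the OAM schema itself: the TraitDefinition class, the property names (timezone, schedule, minReplicas, maxReplicas) and the workload type name are assumptions for illustration. Only the description strings are taken from the trait above.

```python
from dataclasses import dataclass, field


@dataclass
class TraitDefinition:
    """Hypothetical in-memory view of a trait's structured description."""
    name: str
    applies_to: list                                 # workload types the trait supports
    properties: dict = field(default_factory=dict)   # property name -> description

    def supports(self, workload_type):
        """Return True if the trait can be attached to this workload type."""
        return workload_type in self.applies_to

    def describe(self):
        """Render a readable summary, similar in spirit to kubectl describe."""
        lines = [f"Trait: {self.name}", "Applies to: " + ", ".join(self.applies_to)]
        lines += [f"  {prop}: {text}" for prop, text in self.properties.items()]
        return "\n".join(lines)


cron_scaler = TraitDefinition(
    name="cron-scaler",
    applies_to=["replicated-service"],  # assumed workload type name
    properties={
        # Property names are assumed; the descriptions come from the trait above.
        "timezone": "Timezone for the CRON expressions of this scaler.",
        "schedule": "CRON expression for this scaling rule.",
        "minReplicas": "Lower limit for the number of replicas.",
        "maxReplicas": "Upper limit for the number of replicas.",
    },
)

print(cron_scaler.describe())
```

An application operator reading this output learns where the trait applies and what each property means without opening the CRD.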

Manageable Capabilities

An application operator applies one or more installed traits to an application by using the ApplicationConfiguration (described in detail in the next section). The ApplicationConfiguration controller handles trait conflicts, if any.

kind: ApplicationConfiguration
metadata:
  name: failed-example
spec:
  components:
    - name: nginx-replicated-v1
      instanceName: example-app
      traits:
        - name: auto-scaler
          properties:
            minimum: 1
            maximum: 9
        - name: cron-scaler
          properties:
            timezone: "America/Los_Angeles"
            schedule: "0 0 6 * * ?"
            cpu: 50
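
The kind of check a controller could run on failed-example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual ApplicationConfiguration controller: the CONFLICTING_TRAITS table and the find_trait_conflicts function are hypothetical, and the only conflicting pair listed is the auto-scaler/cron-scaler pair from the example above.

```python
from itertools import combinations

# Hypothetical conflict table: pairs of trait names that must not be applied
# to the same component. Only this pair is taken from the article.
CONFLICTING_TRAITS = {frozenset({"auto-scaler", "cron-scaler"})}


def find_trait_conflicts(app_config):
    """Return (component, trait, trait) tuples for every conflicting pair."""
    conflicts = []
    for component in app_config.get("components", []):
        trait_names = [t["name"] for t in component.get("traits", [])]
        for a, b in combinations(trait_names, 2):
            if frozenset({a, b}) in CONFLICTING_TRAITS:
                conflicts.append((component["name"], a, b))
    return conflicts


# The failed-example configuration, reduced to the fields the check needs.
failed_example = {
    "components": [
        {
            "name": "nginx-replicated-v1",
            "traits": [
                {"name": "auto-scaler", "properties": {"minimum": 1, "maximum": 9}},
                {"name": "cron-scaler", "properties": {"schedule": "0 0 6 * * ?"}},
            ],
        }
    ]
}

print(find_trait_conflicts(failed_example))
```

Because the conflict is detected from trait names alone, the configuration can be rejected before any controller for either capability starts acting on the workload.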

Interactions Between Application Operators and Application Developers

As a “platform for platforms,” Kubernetes does not restrict the role of the user who calls the core APIs. This means anyone can be responsible for any field in an API object. This “all-in-one” API makes it easy for a newcomer to get started. However, it becomes a disadvantage when multiple teams with different focuses have to work together on the same Kubernetes cluster, especially when application operators and developers need to collaborate on the same API set.

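kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels: {deploy: example}
  template:
    metadata:
      labels: {deploy: example}
    spec:
      containers:
        - name: nginx
          image: nginx:1.7.9
          securityContext:
            allowPrivilegeEscalation: false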

Sorry, Not My Concern

Instead of having the application operator prepare this YAML together with developers, the most straightforward approach is to ask the developers to fill in the Deployment YAML themselves. But developers may find fields that have nothing to do with their concerns.

Who is the Real Owner?

There are fields in K8s workload YAML that are not explicitly controlled by only one party. For example, when a developer sets replicas: 3, they assume it is a fixed number for the whole application lifecycle. But most developers don't realize that this field can be taken over by the HPA controller, which may change the number according to Pod load. This conflict is problematic: when a developer later changes the replica number, the change may not take effect permanently.
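
To see why the developer's number does not stick, consider the scaling rule given in the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric). The Python sketch below applies that rule in a simplified form. It ignores the tolerance band and stabilization behavior of the real controller, and the load figures and the bounds of 1 and 9 are made up for illustration.

```python
import math


def hpa_desired_replicas(current_replicas, current_utilization,
                         target_utilization, min_replicas, max_replicas):
    """Replica count an HPA-style controller would choose (simplified).

    desired = ceil(current * currentMetric / targetMetric),
    clamped to the range [min_replicas, max_replicas].
    """
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))


developer_replicas = 3  # what the developer wrote in the Deployment

# Hypothetical load: Pods average 90% CPU against a 50% target.
effective = hpa_desired_replicas(developer_replicas, 90, 50,
                                 min_replicas=1, max_replicas=9)
print(developer_replicas, "->", effective)
```

Whatever value the developer writes, the controller recomputes the count on its next evaluation, so the field effectively has two owners.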

Is “Clear Cut” the Solution?

As shown above, when using K8s APIs, the concerns of developers and operators are inextricably mixed together. It could be painful for several parties to work on the same API set. Furthermore, our past experience shows that sometimes application management systems (e.g., PaaS) may be hesitant to expose more K8s capabilities, because they don’t want to reveal more operational/infrastructure details to developers.

Developers’ Voices Should be Heard

There are cases where a developer wants their “opinions” heard by an operator on behalf of their application. For example, suppose a developer defines several parameters for an application and then realizes that the application operator may rewrite them to fit different runtime environments. The developer may want only certain parameters to be modifiable. How can this be conveyed efficiently to application operators?

  • Is a batch job, not a long running service
  • Requires highest level security, etc.

OAM’s Component and ApplicationConfiguration

In OAM, we try to logically decouple K8s API objects, so developers can fill in their own intentions, and still be able to convey information to operators in a structured manner.

Define the Application; Don’t Just Describe It.

Components are designed for developers to define an application without considering operational details. One application is composed of one or many components, for example, a Java web component and a database component.

  1. Component description — what to run, e.g., container image, workload type
  2. A list of overwritable parameters which are expressed as schemas
  1. Is this component replicable or not?
  2. Is this component long-running or one-time (is daemonized or not)?
  • fromParam: indicates that the value of CONN actually comes from a parameter named connections in the parameters list, i.e., operators can override this value.
  • connections currently defaults to 1024.
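
The fromParam indirection can be sketched in Python as follows. This is a simplified, hypothetical model rather than the OAM runtime: the dictionary layout and the resolve_env helper are assumptions. Only the names CONN, connections and fromParam and the default of 1024 come from the text above.

```python
# Hypothetical, simplified component definition written by the developer.
component = {
    "parameters": [
        {"name": "connections", "type": "number", "default": 1024},
    ],
    "env": [
        {"name": "CONN", "fromParam": "connections"},
    ],
}


def resolve_env(component, overrides=None):
    """Build the environment, taking fromParam values from overrides or defaults."""
    overrides = overrides or {}
    declared = {p["name"]: p for p in component["parameters"]}

    # Operators may only override parameters the developer has declared.
    unknown = set(overrides) - set(declared)
    if unknown:
        raise KeyError(f"parameters not declared by the developer: {sorted(unknown)}")

    env = {}
    for var in component["env"]:
        if "fromParam" in var:
            param = declared[var["fromParam"]]
            env[var["name"]] = overrides.get(param["name"], param["default"])
        else:
            env[var["name"]] = var["value"]
    return env


print(resolve_env(component))                         # developer default
print(resolve_env(component, {"connections": 4096}))  # operator override
```

The parameters list acts as the developer's allow-list: anything not declared there cannot be changed by an operator.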

The ApplicationConfiguration

Ultimately, the operators would use ApplicationConfiguration to instantiate the application, by referring to components’ names and applying traits to them.

  1. Developer defines component.yaml with selected workload type.
  2. Application operator (or the CI/CD system) then runs kubectl apply -f component.yaml to install this component.
  3. Application operator then defines ApplicationConfiguration with app-config.yaml to instantiate the application.
  4. Lastly, the application operator runs kubectl apply -f app-config.yaml to trigger the deployment of the whole application.

kind: ApplicationConfiguration
metadata:
  name: my-awesome-app
spec:
  components:
    - componentName: nginx
      instanceName: web-front-end
      parameterValues:
        - name: connections
          value: 4096
      traits:
        - name: auto-scaler
          properties:
            minimum: 3
            maximum: 10
        - name: security-policy
          properties:
            allowPrivilegeEscalation: false

  1. Note that the operator has to fill in the integer 4096 instead of the string "4096", because the schema of this field is clearly defined in the Component.
  2. Trait auto-scaler - Used by the operator to apply the autoscaler trait (e.g. HPA) to the component. Hence, its replica number will be fully controlled by the autoscaler.
  3. Trait security-policy - Used by the operator to apply the security policy rules to the component.
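
The integer-versus-string point in item 1 amounts to a type check against the parameter's declared schema. The Python sketch below shows one way such a check could look. The SCHEMA_TYPES mapping, the type name "number" and the check_parameter_value function are assumptions for illustration, not the validation code of any OAM implementation.

```python
# Hypothetical mapping from schema type names to acceptable Python types.
SCHEMA_TYPES = {"number": (int, float), "string": (str,), "boolean": (bool,)}


def check_parameter_value(schema, value):
    """Return value if it matches the declared type, otherwise raise TypeError."""
    expected = SCHEMA_TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for numbers.
    is_stray_bool = isinstance(value, bool) and bool not in expected
    if is_stray_bool or not isinstance(value, expected):
        raise TypeError(
            f"{schema['name']}: expected {schema['type']}, got {type(value).__name__}"
        )
    return value


connections_schema = {"name": "connections", "type": "number"}

print(check_parameter_value(connections_schema, 4096))  # accepted

try:
    check_parameter_value(connections_schema, "4096")   # rejected: a string
except TypeError as err:
    print("rejected:", err)
```

Rejecting the string early gives the operator an error when the configuration is applied, instead of a misconfigured container at runtime.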

Beyond Application Management

As we’ve described so far, our primary goal in using OAM is to fix the following problems in application management:

  1. How to enable several parties to accurately and efficiently work on the same platform, using the same API set.

The Future of OAM

For now, the OAM specification and model are indeed solving many existing problems, but we believe there is still a long way to go. For example, we are working on practices for handling dependencies with OAM, on integrating Dapr workloads into OAM, and on many other topics.
