Reshaping the Java Language on the Cloud

By Yu Lei, nicknamed Liangxi at Alibaba.

Created some twenty years ago, Java is an object-oriented programming (OOP) language backed by a large number of excellent enterprise-level frameworks, and it provides stability and high performance under rigorous, long-term operating conditions. On the cloud, however, language simplicity is important for fast iteration and delivery, which makes heavyweight Java seem like an inappropriate choice. Yet the language remains an essential tool.

This article was prepared by Yu Lei, a technical expert on the Alibaba JVM team. In it, he describes how the JVM team meets the challenge of serving a large number of complex services within Alibaba using Java.


Alibaba’s custom Java Development Kit (AJDK) includes ZenGC and ElasticHeap, which have supported hundreds of applications and hundreds of thousands of instances on core links during the largest online shopping promotion in the world, the Double 11 Shopping Festival.

JDK 12 and later versions support triggering a concurrent mark cycle at a fixed interval and shrinking the Java heap during remark before returning memory to the operating system. However, this still relies on stop-the-world (STW) pauses, and heap memory cannot be returned during young garbage collection cycles. In ElasticHeap, concurrent asynchronous threads absorb the overhead of repeatedly mapping and unmapping memory and of handling page faults. In this way, heap memory can be promptly returned to the operating system, or made available for use again, during each young garbage collection cycle.
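To see the gap that ElasticHeap targets, the standard management API already exposes the difference between the heap memory a process actually uses and the memory it keeps committed from the OS. The sketch below uses only plain JDK APIs, not ElasticHeap itself; shrinking the "committed" number toward "used" is exactly what returning heap memory to the OS means.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapFootprint {
    // Returns {used, committed} heap bytes. "Committed" memory is held from the
    // OS even when idle; ElasticHeap's goal is to keep it close to "used".
    public static long[] heapUsage() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return new long[] { heap.getUsed(), heap.getCommitted() };
    }

    public static void main(String[] args) {
        long[] u = heapUsage();
        System.out.println("used=" + u[0] + " bytes, committed=" + u[1] + " bytes");
    }
}
```

On a stock JVM, the committed figure typically stays high after a load spike; a heap-elastic collector would shrink it back during idle periods.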

ElasticHeap Practices in Alibaba

Scenario 1: Predictable Traffic Peaks

Scenario 2: Multiple Java Instances Running on a Single Server

Multiple Java instances receive random traffic with non-overlapping traffic peaks. ElasticHeap reduces the overall memory usage of these instances during idle times and improves deployment density.

During Double 11, the biggest promotion on Alibaba’s e-commerce platform, our core transaction system ran ElasticHeap in low-power mode, greatly reducing the working set size (WSS) of its instances.

Static Compilation

Java static compilation is an advanced form of ahead-of-time (AOT) compilation. Java programs are compiled into native code in a separate compilation phase and no longer require a traditional JVM or runtime environment; you only need to ensure that the operating system provides the class libraries the programs depend on. The following figure shows a schematic drawing of Java static compilation. Static compilation turns Java programs into native programs that boot directly while preserving Java behavior, combining the advantages of Java programs and native programs.
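As an illustration, the program below is ordinary Java that could be built either way. The AOT toolchain named in the comments (GraalVM's native-image) is an assumption for illustration only; the article does not specify which static compilation toolchain Alibaba uses.

```java
public class HelloNative {
    // A pure-Java method: static compilation preserves this behavior unchanged.
    public static String greeting() {
        return "Hello from a statically compilable Java program";
    }

    public static void main(String[] args) {
        // Traditional JIT path:  javac HelloNative.java && java HelloNative
        // AOT path (assumed GraalVM toolchain): native-image HelloNative
        // The AOT binary starts without loading a JVM, hence the faster boot.
        System.out.println(greeting());
    }
}
```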

The JVM team has worked closely with the SOFAStack team to develop middleware applications through static compilation. Static compilation reduces the startup duration of an application from 60 seconds to 3.8 seconds. During Double 11, statically compiled applications ran stably and without faults. The garbage collection pause was 100 milliseconds, which is acceptable for services. The memory usage and response time were the same as those of traditional Java applications.

Statically compiled applications are on an equal footing with traditional Java applications in terms of stability, resource usage, and response time, but cut the average startup duration to roughly one-sixteenth of its previous value (from 60 seconds to 3.8 seconds).


Wisp2, a feature of Alibaba’s customized JDK (AJDK), allows you to develop applications with high-performance coroutines in Java. Released on a large scale this year, Wisp2 supports coroutine scheduling in the Java runtime, converting thread-blocking operations such as Socket.getInputStream().read() into lightweight coroutine switches.

Wisp2 is fully compatible with the Thread API. In a Wisp2-enabled JDK, Thread.start() creates a coroutine, which behaves like a lightweight thread. By comparison, Go exposes only the coroutine keyword "go" and no thread API. Like Go, Wisp2 provides only a coroutine creation path, but because that path is the existing Thread API, applications can switch to coroutines transparently.
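Thread API compatibility means code like the following needs no changes. This sketch uses only standard JDK APIs and runs on any JVM; on a Wisp2-enabled AJDK, as the article describes, each Thread.start() would create a coroutine and the blocking sleep would become a lightweight coroutine yield rather than an OS-level block.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class Wisp2Sketch {
    // Launches n tasks via the plain Thread API and waits for all of them.
    public static int runBlockingTasks(int n) {
        CountDownLatch done = new CountDownLatch(n);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            // On a Wisp2-enabled JDK this creates a coroutine, not an OS thread.
            new Thread(() -> {
                try {
                    Thread.sleep(5); // blocking call: Wisp2 turns this into a coroutine switch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                completed.incrementAndGet();
                done.countDown();
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runBlockingTasks(100) + " tasks finished");
    }
}
```

With coroutines, spawning thousands of such blocking tasks costs far less than the equivalent number of OS threads, which is the point of making the switch transparent.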

Wisp2 supports work stealing and a scheduling policy suitable for web scenarios to minimize scheduling overhead under heavy workloads.

During the 2019 Double 11 Shopping Festival, Wisp supported hundreds of applications and hundreds of thousands of containers. Currently, 90% of Alibaba’s containers have been updated to Wisp2.

Near peak traffic, the CPU usage of Wisp2-enabled servers is about 7% lower, and Wisp1-enabled servers show even lower CPU usage, because Wisp2 is designed primarily to reduce response times. The reduction mainly comes from system CPU time saved by lightweight scheduling. At midnight, the start of Double 11 when traffic peaks highest, CPU usage was unchanged. This indicates that Wisp2 addresses scheduling overhead specifically and shows little performance advantage when CPU usage is low and there is no scheduling burden.

Under heavy CPU workloads, the response times of Wisp2-enabled servers are about 20% lower thanks to coroutine scheduling. Coroutine scheduling raises the usable resource ceiling and prevents system crashes caused by soaring response times.


Principles and Adverse Impact of Deoptimization

Deoptimization has an adverse impact in two ways. First, the affected method falls back from efficient compiled code to interpretation, slowing its execution by a factor of more than 100. Second, a method deoptimized during a traffic peak must be rapidly recompiled, and the compilation threads consume CPU resources. The adverse impact is all the more obvious when traffic surges in a short period of time rather than ramping up gradually. The Double 11 Shopping Festival is precisely such a scenario.
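One common trigger is failed type speculation. In the hypothetical sketch below, the JIT may compile total() during warm-up assuming a single receiver type at the virtual call site, then discard that compilation when a new type appears, exactly the "new behavior at peak traffic" pattern described above. Whether deoptimization actually fires depends on the JVM and its flags; the program's output is the same either way.

```java
interface Shape { double area(); }

class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

class Square implements Shape {
    final double s;
    Square(double s) { this.s = s; }
    public double area() { return s * s; }
}

public class DeoptDemo {
    // The JIT profiles the sh.area() call site; if only Circle is ever seen,
    // it may devirtualize and inline Circle.area() with a type guard.
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape sh : shapes) sum += sh.area();
        return sum;
    }

    public static void main(String[] args) {
        Shape[] warm = new Shape[10_000];
        for (int i = 0; i < warm.length; i++) warm[i] = new Circle(1.0);
        for (int i = 0; i < 100; i++) total(warm); // warm-up: monomorphic profile

        // A new receiver type appears "at peak": the type guard fails and the
        // compiled version of total() may be deoptimized and recompiled.
        warm[0] = new Square(2.0);
        System.out.println(total(warm));
    }
}
```

Running with -XX:+PrintCompilation (a standard HotSpot flag) makes the recompilation visible in such experiments.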

Mitigation of Deoptimization Through Feedback Directed Optimization

Effect of Feedback Directed Optimization During Double 11

  1. CPU usage was excessively high due to the traffic peak and frequent deoptimization and compilation at midnight during the Double 11 Shopping Festival.
  2. Ramp-up was insufficient. In a stress test with a long ramp-up period, compilation and deoptimization occurred frequently as traffic increased.

To address the first problem, we collected CPU usage data and counted the occurrences of deoptimization and C2 compilation at one minute after midnight during the Double 11 Shopping Festival when a traffic peak occurred.

With FDO enabled, the occurrences of C2 compilation fell by about 45%, and the occurrences of deoptimization fell by about 70%.

After FDO was enabled during the first minute of the peak period, CPU usage dropped by about 7.0%, from about 67.5% to 63.1%.

To address the second problem, we verified the CPU usage during the first minute of the stress test.

With FDO enabled, the CPU usage during the first minute of the stress test dropped by about 10%, from 66.19% to 60.33%.


Grace integrates the heap dump function and improves on the capabilities of ZProfiler. It provides an upgraded parsing engine with full support for OQL syntax.

JDK 11

OpenJDK 11 is the next stable long-term support version after OpenJDK 8. The JVM team pays close attention to updates of OpenJDK 11. Currently, AJDK 11 supports the Wisp2 and multi-tenant features of AJDK 8. The clusters that were migrated to JDK 11 performed stably during the 2019 Double 11 Shopping Festival.

Will upgrading to JDK 11 bring as many benefits as upgrading to JDK 8 did? JDK 11 provides the latest version of the Z Garbage Collector (ZGC).


Currently, ZGC is still an experimental feature of OpenJDK, JDK 11 is not yet widely used in the industry, and ZGC is supported only on Linux. ZGC support for macOS and Windows is planned for JDK 14, which will be released in March 2020, so Java developers will have to wait a bit longer.

However, bold steps have been taken. The Alibaba JVM team and database team have begun running database applications on ZGC and improving it based on the results, for example by optimizing ZGC's page caching mechanism and its trigger timing.

The two teams have stably run online database applications on Z Garbage Collector since September 2019. These applications withstood the traffic burden during the Double 11 Shopping Festival. Online feedback has been very positive.

  1. The JVM pause time was kept within the specified 10 ms.
  2. Z Garbage Collector improved the average response times and glitch metrics of clusters running online.
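Cumulative GC time figures like these can be sampled in-process from the standard GC MXBeans. The sketch below uses only standard java.lang.management APIs; to observe ZGC on JDK 11, run with -XX:+UnlockExperimentalVMOptions -XX:+UseZGC (the flag combination assumed from stock OpenJDK 11, since ZGC was still experimental there).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseReport {
    // Sums the cumulative collection time (milliseconds) reported by every
    // registered garbage collector in this JVM.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime()); // -1 means "undefined"
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
        System.out.println("cumulative GC time: " + totalGcMillis() + " ms");
    }
}
```

Note that MXBean collection time is a coarse aggregate; sub-10 ms pause verification as described above normally relies on GC logs (for example -Xlog:gc) rather than this API.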


Using AJDK on Dragonwell

We are currently preparing to release Alibaba Dragonwell 11. Dragonwell 11 is a release version of Dragonwell based on OpenJDK 11. It provides a range of features such as cloud-based enablement, clear modularization, and long-term support. We recommend that you pay attention to this upcoming release and upgrade to Dragonwell 11 when it becomes available.

