How to Properly Plan JVM Performance Tuning

  • Understand the JVM garbage collector
  • Be familiar with the common tools for JVM performance monitoring
  • Have the ability to read GC logs
  • Perform tuning only when necessary and practical (JVM performance tuning cannot solve all performance problems)
  • General procedures of JVM tuning
  • Key performance metrics of JVM tuning
  • Important JVM tuning principles
  • Tuning policies and examples

Performance Tuning Layers

To improve the system performance, we need to optimize the system from various perspectives and layers. The following are the layers to be optimized.

JVM Tuning Procedure

The ultimate goal of tuning is to give an application higher throughput at the lowest hardware cost, and JVM tuning is no exception. JVM tuning mainly means optimizing the garbage collector so that applications running on the VM achieve higher throughput while using less memory and experiencing lower latency. Note that less memory and lower latency are not always better: the aim is the best trade-off among the three, not the minimum of any one.

Performance Metrics

To find and evaluate performance bottlenecks, we need to know some definitions of performance metrics. For JVM tuning, we need to know the three following definitions and use these metrics as our base of evaluation:

  • Throughput: One of the most important metrics. It is the highest application performance that the garbage collector allows, measured without regard to the pause time or memory consumption that garbage collection causes.
  • Latency: A measure of how short the pauses caused by garbage collection are kept, so that the application does not stall or jitter while running.
  • Memory usage: The amount of memory the garbage collector needs to run smoothly.
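The first two metrics can be made concrete with a little arithmetic. The following is a minimal sketch, assuming throughput is measured as the share of wall-clock time not spent in GC pauses (a common convention, not something this article defines) and latency as the longest single pause. The class name `GcMetrics` and the pause values are hypothetical.

```java
// Hypothetical helper: summarizes GC pauses observed over a measurement window.
class GcMetrics {

    // Assumed definition: percentage of the window NOT spent in GC pauses.
    static double throughputPercent(double windowMs, double[] pausesMs) {
        double totalPauseMs = 0;
        for (double p : pausesMs) {
            totalPauseMs += p;
        }
        return 100.0 * (windowMs - totalPauseMs) / windowMs;
    }

    // Longest single pause in the window: the worst-case latency caused by GC.
    static double maxPauseMs(double[] pausesMs) {
        double max = 0;
        for (double p : pausesMs) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) {
        double[] pauses = {20, 30, 140, 10}; // made-up pause times in ms
        System.out.println("throughput % = " + throughputPercent(10_000, pauses));
        System.out.println("max pause ms = " + maxPauseMs(pauses));
    }
}
```

In practice the pause times would come from the GC log rather than a hard-coded array.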

Performance Tuning Principles

During the tuning process, the three following principles can help us implement easier garbage collection tuning to meet desired application performance requirements.

  • Minor GC collection principle: Each Minor GC should collect as many garbage objects as possible, so that fewer objects reach the old generation and Full GC runs less often.
  • GC memory maximization principle: When solving throughput and latency problems, the more memory the garbage collector has to work with, the more efficient garbage collection is and the more smoothly the application runs.
  • GC tuning “two out of three” principle: Of the three performance attributes (throughput, latency, and memory usage), tune for only two rather than all three.

Performance Tuning Procedure

Determine Memory Usage

Before determining the memory usage, we need to know two things:

  1. Application operation phases
  2. JVM memory allocation

Operation Phase

I divide the operation of an application into the three following phases:

  • Initialization: A JVM loads an application and initializes the main modules and data of the application.
  • Stability: The application has been running for a long time under stress-test load. Each performance parameter is in a stable state. The core functions have been executed and warmed up by JIT compilation.
  • Summary: In the final summary phase, some benchmark tests are conducted to generate corresponding reports. We do not have to pay attention to this phase.

JVM Memory Allocation and Parameters

Calculate the Size of the Active Data

To calculate the size of the active data, follow these procedures:

  • Instead of setting start-up parameters manually, use the default JVM parameters when performing the test.
  • Make sure that the application is in the stable state when Full GC occurs.
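One way to read the active data size from inside a running JVM is to ask for the old-generation occupancy recorded after the most recent collection. The following is a minimal sketch using the standard `java.lang.management` API. The class name `LiveDataProbe` is hypothetical, and matching pools by the names "Old Gen" or "Tenured" is an assumption that holds for the Serial, Parallel, CMS, and G1 collectors but not necessarily for others.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports old-generation usage after the last collection.
class LiveDataProbe {

    // Returns the bytes in use after the most recent collection of the old
    // generation, or -1 if no matching pool is found or the JVM does not
    // report post-collection usage for it.
    static long oldGenUsedAfterLastGcBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: old-generation pools are named "... Old Gen" or "Tenured Gen".
            boolean isOldGen = name.contains("Old Gen") || name.contains("Tenured");
            MemoryUsage afterGc = pool.getCollectionUsage(); // may be null
            if (isOldGen && afterGc != null) {
                return afterGc.getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.gc(); // only a request for a full collection; the JVM may ignore it
        System.out.println("old gen after GC (bytes): " + oldGenUsedAfterLastGcBytes());
    }
}
```

The value is only meaningful as active data if it is sampled while the application is in the stable state, as the steps above require.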

When Is an Application in the Stable Phase?

An application is in the stable phase only once, under sufficient load, it reaches a workload that matches the business peak in the production environment and remains stable after that peak is reached. Stress testing is therefore essential for determining whether an application has reached the stable phase. How to stress test applications is beyond the scope of this article and will be covered in a separate article.

GC log directive: -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:<filename>
jmap -histo:live pid
java -Xms373m -Xmx373m -Xmn140m -XX:PermSize=5m -XX:MaxPermSize=5m
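In the `java` command above, the heap is fixed at 373 MB (`-Xms` equals `-Xmx`) and the young generation at 140 MB (`-Xmn`), which leaves 373 - 140 = 233 MB for the old generation. The following is a minimal sketch of that relationship. The class name `HeapFlags` is hypothetical. Note also that `-XX:PermSize` and `-XX:MaxPermSize` apply only to Java 7 and earlier, because the permanent generation was removed in Java 8.

```java
// Hypothetical helper: builds the sizing flags from per-generation sizes.
class HeapFlags {

    // The heap covers the young and old generations; PermGen is sized separately.
    static String flags(int youngMb, int oldMb, int permMb) {
        int heapMb = youngMb + oldMb;
        return String.format(
                "-Xms%dm -Xmx%dm -Xmn%dm -XX:PermSize=%dm -XX:MaxPermSize=%dm",
                heapMb, heapMb, youngMb, permMb, permMb);
    }

    public static void main(String[] args) {
        // Reproduces the flags above: 140 MB young + 233 MB old = 373 MB heap.
        System.out.println("java " + flags(140, 233, 5));
    }
}
```

Expressing the flags this way makes it explicit that changing the young generation size while keeping the old generation fixed also changes the total heap size.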

Latency Tuning

After determining the application's active data size, we move on to latency tuning. At this point the heap size and latency do not yet meet the application's requirements, so we need to tune the JVM against those actual requirements.

System Latency Requirements

Before tuning, we need to know what the system latency requirements are and which metrics can be tuned for latency.

  • Acceptable average pause time of the application: This is compared with the measured Minor GC duration.
  • Acceptable Minor GC frequency: The measured Minor GC frequency is compared with this tolerable value.
  • Acceptable maximum pause time: This is compared with the worst-case Full GC duration.
  • Acceptable frequency of the maximum pause: This is essentially the Full GC frequency.

The corresponding metrics to measure are:

  • Minor GC duration
  • Number of Minor GCs
  • Longest Full GC duration
  • Worst-case Full GC frequency
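The comparison between requirements and measurements described above can be sketched as a single check. This is only an illustration: the class name `LatencyCheck`, the parameter names, and the sample values are hypothetical, and frequencies are expressed as the interval between events so that a longer interval means a lower frequency.

```java
// Hypothetical helper: compares measured GC behaviour with latency requirements.
class LatencyCheck {

    static boolean meetsRequirements(
            double avgMinorGcMs, double acceptableAvgPauseMs,
            double minorGcIntervalMs, double acceptableMinorGcIntervalMs,
            double worstFullGcMs, double acceptableMaxPauseMs,
            double fullGcIntervalMs, double acceptableMaxPauseIntervalMs) {
        return avgMinorGcMs <= acceptableAvgPauseMs              // average pause is short enough
            && minorGcIntervalMs >= acceptableMinorGcIntervalMs  // Minor GC is not too frequent
            && worstFullGcMs <= acceptableMaxPauseMs             // worst pause is short enough
            && fullGcIntervalMs >= acceptableMaxPauseIntervalMs; // Full GC is not too frequent
    }

    public static void main(String[] args) {
        // Made-up measurements and targets, all in milliseconds.
        boolean ok = meetsRequirements(40, 50, 213, 200, 140, 200, 5800, 5000);
        System.out.println("requirements met: " + ok);
    }
}
```

If the check fails, the violated condition indicates which generation to resize first: the young generation for Minor GC problems, the old generation for Full GC problems.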

Optimize the Young Generation Size

java -Xms359m -Xmx359m -Xmn126m -XX:PermSize=5m -XX:MaxPermSize=5m

The size of the young generation is changed from 140 MB to 126 MB, and the heap size is changed accordingly; the old generation is unchanged at this point.

Optimize the Size of the Old Generation

Like the previous step, we also need to obtain some data from the GC log before the optimization. In this step, we focus on the FullGC duration and frequency.

The average Full GC frequency is one Full GC every 5.8 s. The average Full GC duration is 0.14 s. (This is only a test. Full GC lasts longer in real projects.)

Object Promotion Rate

Can we perform evaluation if we do not have a FullGC log? We can use the promotion rate for evaluation.

Usage of the old generation space after each Minor GC:

  • After the first Minor GC: 8 KB (13740 KB - 13732 KB)
  • After the second Minor GC: 4489 KB (22394 KB - 17905 KB)
  • After the third Minor GC: 16822 KB (34739 KB - 17917 KB)
  • After the fourth Minor GC: 30230 KB (48143 KB - 17913 KB)
  • After the fifth Minor GC: 44195 KB (62112 KB - 17917 KB)

Amount promoted between consecutive Minor GCs:

  • Between the first and the second Minor GCs: 4481 KB
  • Between the second and the third Minor GCs: 12333 KB
  • Between the third and the fourth Minor GCs: 13408 KB
  • Between the fourth and the fifth Minor GCs: 13965 KB

The average promotion for each Minor GC is 12211 KB (about 12 MB).
In the preceding figure, a Minor GC happens once every 213 ms on average.
Promotion rate = 12211 KB / 213 ms ≈ 57 KB/ms
It takes about 4.185 s (233 × 1024 / 57 ≈ 4185 ms) to fully occupy the 233 MB of old generation space.
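The promotion-rate arithmetic above can be sketched in a few lines. The class name `PromotionRate` is hypothetical, and reading each pair of figures as "total heap after GC minus young generation after GC" is an assumption about what the two numbers in each subtraction represent. The inputs in `main` are the values quoted in the text. Because the sketch does not round the rate to 57 KB/ms first, its estimate comes out at roughly 4.16 s, slightly below the figure in the text.

```java
import java.util.Arrays;

// Hypothetical helper: estimates how quickly promotions fill the old generation.
class PromotionRate {

    // Old-generation usage after each Minor GC (assumed: total heap minus young generation).
    static long[] oldGenUsageKb(long[] totalKb, long[] youngKb) {
        long[] old = new long[totalKb.length];
        for (int i = 0; i < totalKb.length; i++) {
            old[i] = totalKb[i] - youngKb[i];
        }
        return old;
    }

    // Growth of the old generation between consecutive Minor GCs.
    static long[] promotedPerGcKb(long[] oldKb) {
        long[] deltas = new long[oldKb.length - 1];
        for (int i = 1; i < oldKb.length; i++) {
            deltas[i - 1] = oldKb[i] - oldKb[i - 1];
        }
        return deltas;
    }

    // Time for promotions alone to fill an old generation of the given size.
    static double msToFillOldGen(double oldGenMb, double avgPromotedKb, double minorGcIntervalMs) {
        double rateKbPerMs = avgPromotedKb / minorGcIntervalMs;
        return oldGenMb * 1024 / rateKbPerMs;
    }

    public static void main(String[] args) {
        long[] total = {13740, 22394, 34739, 48143, 62112};
        long[] young = {13732, 17905, 17917, 17913, 17917};
        System.out.println(Arrays.toString(promotedPerGcKb(oldGenUsageKb(total, young))));
        System.out.println(msToFillOldGen(233, 12211, 213) + " ms");
    }
}
```

The estimated fill time is a rough stand-in for the Full GC interval when no Full GC entries are available in the log.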

Throughput Tuning

After the preceding tuning steps, we come to the last step: we run a throughput test on the result so far and make some fine adjustments.

Conclusion

Plumbr conducted a survey on the usage of specific garbage collectors based on 84,936 cases. Among the 13% of cases where a garbage collector is explicitly specified, the concurrent-mark-sweep (CMS) collector is the most frequently used. In the majority of cases, around 87%, no optimal garbage collector is selected at all.

Alibaba Cloud
