How to Properly Plan JVM Performance Tuning

JVM performance tuning involves trade-offs among many factors, and a single factor can greatly influence overall performance, so every possible influence must be considered. Understanding and following some basic principles and theories makes performance tuning a lot easier. To get the most out of this article, you should meet the following prerequisites:

  • Understand the JVM garbage collector
  • Be familiar with the common tools for JVM performance monitoring
  • Have the ability to read GC logs
  • Perform tuning only when necessary and practical (JVM performance tuning cannot solve all performance problems)

If you are not familiar with the preceding topics, we recommend reading up on them before proceeding with this article.

This article explains JVM performance tuning and shows how to tune an application by using JVM parameters. It covers the following topics:

  • General procedures of JVM tuning
  • Key performance metrics of JVM tuning
  • Important JVM tuning principles
  • Tuning policies and examples

Performance Tuning Layers

[Figure: performance tuning layers]

As shown in the figure, many layers besides the JVM need to be optimized. System tuning is not limited to JVM tuning; the system as a whole must be tuned to improve performance. This article describes only JVM tuning. Other tuning aspects will be discussed later.

Before JVM tuning begins, we assume that the architecture and code of the project have already been tuned or are already optimal for the current project. These two assumptions are the basis of JVM tuning, and architecture tuning has the most significant impact on system performance. We cannot expect JVM tuning alone to produce a qualitative leap in an application that has a defective architecture or code that still needs extensive optimization.

In addition, before tuning begins, we need clear performance optimization goals and must know the current performance bottlenecks. To optimize bottlenecks, we need to run stress and benchmark tests on the application and use a variety of monitoring and statistics tools to confirm whether the optimized application meets the desired goals.

JVM Tuning Procedure

Performance Metrics

  • Throughput: One of the most important metrics. Throughput refers to the highest performance that the garbage collector allows the application to achieve, without considering the pause time or memory consumption caused by garbage collection.
  • Latency: Latency measures how far the pause time caused by garbage collection is reduced, so that the application avoids jitter while running.
  • Memory usage: The amount of memory that the garbage collector requires to run smoothly.

A gain in any one of the three attributes almost always comes at the cost of one or both of the others. The business requirements of the application determine which one or two attributes matter most.

Performance Tuning Principles

  • Minor GC collection principle: Each Minor GC should collect as many garbage objects as possible to reduce the frequency of Full GC for the application.
  • GC memory maximization principle: When solving throughput and latency problems, the larger the memory used by the garbage collector, the more efficient the garbage collection and the smoother the application.
  • GC tuning “two out of three” principle: We should only tune two of the three performance attributes instead of all the three attributes: throughput, latency, and memory usage.

Performance Tuning Procedure

[Figure: JVM tuning procedure]

The preceding figure shows the basic JVM tuning procedure for an application. JVM tuning involves continuous configuration changes and multiple iterations based on performance test results. Until every desired system metric is met, each of the preceding steps may be repeated several times. In some cases, meeting a specific metric requires retuning earlier parameters, and then all the preceding steps must be tested again.

In addition, tuning generally starts with meeting the memory usage requirement of the application, then latency, and finally throughput. This sequence cannot be inverted. The following sections use an example to elaborate on each tuning step.

We run the JVM in Server mode, which is the officially recommended mode after JDK 1.6.

We use the parallel collector, the default in JDK 1.6–1.8, as the garbage collector (-XX:+UseParallelGC for the young generation and -XX:+UseParallelOldGC for the old generation).
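To confirm which collectors a running JVM is actually using, the standard java.lang.management API can be queried. The following is a minimal sketch; the class name ActiveCollectors is only illustrative. With the parallel collector, HotSpot typically reports "PS Scavenge" for the young generation and "PS MarkSweep" for the old generation, but the names depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (not from the article): lists the collectors of the running JVM.
class ActiveCollectors {
    /** Returns the names of the garbage collectors that the running JVM reports. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The printed names vary by JVM version and by the selected collector.
        for (String name : names()) {
            System.out.println(name);
        }
    }
}
```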

Determine Memory Usage

  1. Application operation phases
  2. JVM memory allocation

Operation Phase

  • Initialization: A JVM loads an application and initializes the main modules and data of the application.
  • Stability: The application has been running for a long time and has received a stress test. Each performance parameter is in the stable state. The core functions have been executed and warmed up by using JIT compilation.
  • Summary: In the final summary phase, some benchmark tests are conducted to generate corresponding reports. We do not have to pay attention to this phase.

Memory usage and the size of the active data should be determined in the application stability phase instead of during the project start-up stage. Before explaining how to determine the memory usage, let’s look at JVM memory allocation first.

JVM Memory Allocation and Parameters

[Figure: JVM memory layout]

The main JVM memory spaces are the young generation, the old generation, and the permanent generation. The young generation and the old generation together make up the Java heap, and the permanent generation is sized separately. Specific object promotion methods are not discussed here. Now let’s look at how the following JVM options specify the size of each space. If these options are not specified, the virtual machine automatically selects suitable values, which may be adjusted automatically based on the system overhead.

[Figure: table of JVM heap-size parameters]

If performance overhead is a concern, set the initial size and the maximum size of the permanent generation to the same value whenever possible, because resizing the permanent generation requires a Full GC.
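One way to check the sizes that the JVM actually chose is to print the heap limits and the memory pools at run time. The sketch below uses only the standard Runtime and java.lang.management APIs; the class name HeapLayout is an illustrative choice. Pool names differ between collectors, and on JDK 8 and later the permanent generation is replaced by a Metaspace pool, so the output should be read as a rough view rather than an exact match for the layout described here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Illustrative helper (not from the article): prints heap limits and memory pools.
class HeapLayout {
    /** Converts bytes to whole mebibytes. */
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("heap committed: " + toMb(rt.totalMemory()) + " MB");
        System.out.println("heap maximum:   " + toMb(rt.maxMemory()) + " MB");

        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the pool has no defined maximum.
            String max = usage.getMax() < 0 ? "undefined" : toMb(usage.getMax()) + " MB";
            System.out.println(pool.getName() + " (" + pool.getType() + "): committed "
                    + toMb(usage.getCommitted()) + " MB, max " + max);
        }
    }
}
```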

Calculate the Size of the Active Data


As previously mentioned, the active data size is the amount of Java heap space occupied by long-lived data once the application has entered the stability phase.

Be sure to meet the following requirements when calculating the active data size:

  • Instead of setting start-up parameters manually, use the default JVM parameters when performing the test.
  • Make sure that the application is in the stable state when Full GC occurs.

The default JVM start-up parameters are used so that we can observe how much memory the application requires in the stable phase.

When Is an Application in the Stable Phase?

After determining that an application is in the stable phase, pay attention to the GC log of the application, especially the Full GC log.

GC log directive: -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:<filename>

GC logs are the best way to collect the information required for optimization. Even in the production environment, we can enable GC logs to locate problems. Enabling GC logs has minimal impact on performance while providing rich data.

A Full GC log is required. If no Full GC log is available, use a monitoring tool to force a Full GC, or trigger one with the following command:

jmap -histo:live pid

We can obtain the following information when Full GC is triggered in the stable phase:

[Figure: Full GC log entry captured in the stable phase]

From the preceding GC log, we can roughly estimate the heap usage and GC time of the entire application during Full GC. For a more accurate estimate, collect the information several times and take the average, or use the longest Full GC.

In the preceding figure, after the Full GC, 93168 KB (around 93 MB) of the old generation space is occupied. This volume of data is considered the active data in the old generation space.
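As a rough cross-check of the figure read from the GC log, the old generation occupancy after the last collection can also be read from inside the JVM. The sketch below is an assumption-laden illustration: the class name LiveDataProbe is made up, the old generation pool is located by matching its name against "Old Gen" or "Tenured" (which holds for common HotSpot collectors but is not guaranteed), and System.gc() is only a request that the JVM may ignore.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Illustrative helper (not from the article): reads old generation usage after GC.
class LiveDataProbe {
    /**
     * Returns the old generation occupancy in bytes as recorded after the most
     * recent collection of that pool, or -1 if no such figure is available.
     */
    static long oldGenAfterGcBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: the old generation pool is named "... Old Gen" or "Tenured Gen".
            boolean oldGen = name.contains("Old Gen") || name.contains("Tenured");
            if (pool.getType() == MemoryType.HEAP && oldGen) {
                MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
                if (afterGc != null) {
                    return afterGc.getUsed();
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.gc(); // a request only; the JVM may ignore it
        long bytes = oldGenAfterGcBytes();
        System.out.println(bytes < 0
                ? "old generation figure not available"
                : "old generation after GC: " + bytes / 1024 + " KB");
    }
}
```

A single reading taken this way is only as reliable as a single log entry, so the advice to average several samples applies here as well.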

Other heap spaces are allocated by using the following rules.

Based on the preceding rules and the FullGC information in the preceding figure, the heap spaces of the application can be planned as follows:

Java heap space: 373 MB = 93168 KB (old generation space) × 4

Young generation space: 140 MB = 93168 KB (old generation space) × 1.5

Permanent generation space: 5 MB = 3135 KB (permanent generation space) × 1.5

Old generation space: 233 MB = 373 MB (heap space) - 140 MB (young generation space)

The corresponding application startup parameter should be:

java -Xms373m -Xmx373m -Xmn140m -XX:PermSize=5m -XX:MaxPermSize=5m
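The arithmetic behind these parameters can be sketched in a few lines of Java. The multipliers (4, 1.5, and 1.5) and the two input figures are the ones used in the calculation above; the class and method names are illustrative, and the conversion assumes 1 MB = 1000 KB because that is what reproduces the rounded figures in the text.

```java
// Illustrative helper (not from the article): derives start-up flags from Full GC figures.
class HeapPlan {
    // Multipliers taken from the worked calculation in the text.
    static final double HEAP_FACTOR = 4.0;
    static final double YOUNG_FACTOR = 1.5;
    static final double PERM_FACTOR = 1.5;

    /** Scales a KB figure and rounds to whole MB, assuming 1 MB = 1000 KB. */
    static long scaledMb(long kb, double factor) {
        return Math.round(kb * factor / 1000.0);
    }

    /** Builds the sizing flags from the old and permanent generation usage after a Full GC. */
    static String startupFlags(long oldLiveKb, long permLiveKb) {
        long heap = scaledMb(oldLiveKb, HEAP_FACTOR);
        long young = scaledMb(oldLiveKb, YOUNG_FACTOR);
        long perm = scaledMb(permLiveKb, PERM_FACTOR);
        return String.format("-Xms%dm -Xmx%dm -Xmn%dm -XX:PermSize=%dm -XX:MaxPermSize=%dm",
                heap, heap, young, perm, perm);
    }

    public static void main(String[] args) {
        long heap = scaledMb(93168, HEAP_FACTOR);
        long young = scaledMb(93168, YOUNG_FACTOR);
        System.out.println("old generation: " + (heap - young) + " MB");
        System.out.println("java " + startupFlags(93168, 3135));
    }
}
```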

Latency Tuning

In this phase, we may need to optimize the heap size configuration again, evaluate GC duration and frequency, and decide whether it is necessary to switch to a different garbage collector.

System Latency Requirements

  • Acceptable average downtime of an application: This time will be compared with the measured Minor GC duration.
  • Acceptable Minor GC frequency: The Minor GC frequency will be compared with the tolerable value.
  • Acceptable maximum pause time: The maximum pause time will be compared with the FullGC duration in the worst case.
  • Acceptable occurrence frequency of maximum pause: This is basically the frequency of FullGC.

Among the preceding metrics, pay special attention to the average downtime and the maximum pause time. The two metrics are of great importance to the user experience.

Based on the aforementioned requirements, we need to obtain the following data:

  • MinorGC duration
  • Number of MinorGCs
  • The longest duration of FullGC
  • FullGC frequency in the worst case

Optimize the Young Generation Size

[Figure: Minor GC log excerpt]

For example, in the preceding GC log, the average duration of Minor GC is 0.069 seconds and MinorGC happens once every 0.389 seconds.

If the acceptable average downtime is 50 ms, the current duration (69 ms) is clearly too long and requires adjustment.
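The same comparison can be made against figures reported by the running JVM instead of the log. The sketch below divides each collector's accumulated collection time by its collection count and compares the result with a target. The 50 ms target comes from the example above; the class and method names are illustrative. Note that getCollectionTime() is an approximate accumulated time, which corresponds to pause time for stop-the-world collectors such as the parallel collector but not necessarily for concurrent ones.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper (not from the article): compares average GC time with a target.
class PauseCheck {
    static final double TARGET_MS = 50.0; // acceptable average downtime from the example

    /** Average collection time in ms; 0 when nothing was collected or the value is undefined. */
    static double averagePauseMs(long count, long totalMs) {
        if (count <= 0 || totalMs < 0) {
            return 0.0;
        }
        return (double) totalMs / count;
    }

    static boolean exceeds(double averageMs, double targetMs) {
        return averageMs > targetMs;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            double avg = averagePauseMs(gc.getCollectionCount(), gc.getCollectionTime());
            System.out.printf("%s: %d collections, average %.1f ms%s%n",
                    gc.getName(), gc.getCollectionCount(), avg,
                    exceeds(avg, TARGET_MS) ? " (over target)" : "");
        }
    }
}
```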

We know that the larger the young generation space, the longer the Minor GC duration and the lower the frequency.

To shorten the duration, we need to reduce the size of the young generation space.

To reduce the frequency, we need to increase the size of the young generation space.

To minimize the impact of a young generation change on the other spaces, keep the original size of the old generation space if possible when you change the size of the young generation space.

For example, if the size of the young generation space is reduced by 10%, the size of the old generation space and the permanent generation space should not be changed. The following are the parameters after the optimization in this step:

java -Xms359m -Xmx359m -Xmn126m -XX:PermSize=5m -XX:MaxPermSize=5m

The size of the young generation is changed from 140 MB to 126 MB; the heap size is changed accordingly; the old generation has no changes at this point.
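The adjustment is simple arithmetic: shrink the young generation by the chosen percentage, keep the old generation fixed, and recompute the heap as their sum. A minimal sketch, with illustrative class and method names and the figures from the example:

```java
// Illustrative helper (not from the article): resizes the young generation only.
class YoungGenResize {
    /** Young generation size in MB after shrinking it by the given percentage. */
    static long shrink(long youngMb, int percent) {
        return Math.round(youngMb * (100 - percent) / 100.0);
    }

    /** Heap size in MB when the old generation is kept unchanged. */
    static long heapFor(long oldMb, long youngMb) {
        return oldMb + youngMb;
    }

    public static void main(String[] args) {
        long oldMb = 233;                 // unchanged old generation
        long youngMb = shrink(140, 10);   // 140 MB reduced by 10%
        long heapMb = heapFor(oldMb, youngMb);
        System.out.println("java -Xms" + heapMb + "m -Xmx" + heapMb + "m -Xmn" + youngMb + "m");
    }
}
```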

Optimize the Size of the Old Generation

[Figure: Full GC log excerpt]

We can obtain the following information from the preceding figure:

  • The average FullGC frequency is one FullGC every 5.8 s.
  • The average FullGC duration is 0.14 s. (This is only a test. FullGC lasts longer in real projects.)

Object Promotion Rate

For example, in the preceding startup parameter, the size of the old generation is 233 MB.

How long it takes to occupy the available 233 MB space depends on the promotion rate from the young generation to the old generation.

Old generation usage after each MinorGC = Java heap usage after the MinorGC - young generation usage after the MinorGC

Promoted volume per MinorGC = increase in old generation usage between two consecutive MinorGCs

Object promotion rate = average promoted volume per MinorGC / average interval between MinorGCs

With the object promotion rate, we can calculate the number of minorGCs required to fill the old generation and the rough interval between two fullGCs.


[Figure: Minor GC log excerpt used to calculate the promotion rate]

The preceding figure shows the following information:

  • After the first minor GC, the usage of the old generation space is 8 KB (13740 KB - 13732 KB).
  • After the second minor GC, the usage of the old generation space is 4489 KB (22394 KB - 17905 KB).
  • After the third minor GC, the usage of the old generation space is 16822 KB (34739 KB - 17917 KB).
  • After the fourth minor GC, the usage of the old generation space is 30230 KB (48143 KB - 17913 KB).
  • After the fifth minor GC, the usage of the old generation space is 44195 KB (62112 KB - 17917 KB).

The promoted usage of the old generation after each minor GC:

  • Between the second and the first minorGCs: 4481 KB
  • Between the third and the second minorGCs: 12333 KB
  • Between the fourth and the third minorGCs: 13408 KB
  • Between the fifth and the fourth minorGCs: 13965 KB

After the calculation, we can obtain the following information:

  • The average usage promotion for each minorGC is 12211 KB (about 12 MB).
  • In the preceding figure, the minorGC happens once every 213ms on average.
  • Promotion rate = 12211 KB/213ms = 57 KB/ms
  • It takes about 4.185s (233*1024/57 = 4185ms) to fully occupy 233 MB of the old generation space.
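The calculation can be sketched as follows. The five sample pairs are the heap and young generation figures quoted above. The average promotion (12211 KB) and the minor GC interval (213 ms) are taken as given from the text rather than recomputed from the five samples. The class and method names are illustrative, and integer division is used to match the rounded figures in the text.

```java
// Illustrative helper (not from the article): estimates how fast the old generation fills.
class PromotionRate {
    // {heap usage, young generation usage} in KB after each minor GC, as quoted in the text.
    static final long[][] SAMPLES = {
        {13740, 13732}, {22394, 17905}, {34739, 17917}, {48143, 17913}, {62112, 17917}
    };

    /** Old generation usage after each minor GC = heap usage - young generation usage. */
    static long[] oldGenUsageKb(long[][] samples) {
        long[] usage = new long[samples.length];
        for (int i = 0; i < samples.length; i++) {
            usage[i] = samples[i][0] - samples[i][1];
        }
        return usage;
    }

    /** Volume promoted between consecutive minor GCs. */
    static long[] promotedKb(long[] oldGenUsage) {
        long[] promoted = new long[oldGenUsage.length - 1];
        for (int i = 1; i < oldGenUsage.length; i++) {
            promoted[i - 1] = oldGenUsage[i] - oldGenUsage[i - 1];
        }
        return promoted;
    }

    /** Time in ms needed to fill the old generation at the given promotion rate. */
    static long fillTimeMs(long oldGenMb, long rateKbPerMs) {
        return oldGenMb * 1024 / rateKbPerMs;
    }

    public static void main(String[] args) {
        for (long kb : promotedKb(oldGenUsageKb(SAMPLES))) {
            System.out.println("promoted: " + kb + " KB");
        }
        long rate = 12211 / 213; // KB per ms, inputs taken from the text
        System.out.println("rate: " + rate + " KB/ms");
        System.out.println("time to fill 233 MB: " + fillTimeMs(233, rate) + " ms");
    }
}
```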

The preceding calculations can be used to estimate the worst-case Full GC frequency. We can adjust the Full GC frequency by changing the size of the old generation. If a Full GC lasts too long and cannot meet the minimum latency requirement of the application, we need to switch to a different garbage collector. The next article will elaborate on how to do so (for example, switching to Concurrent Mark Sweep, CMS). Tuning CMS is slightly different.

Throughput Tuning

Throughput tuning is mainly based on the throughput requirement of an application. An application should have a comprehensive throughput metric, which is derived from the overall application requirements and tests. When the application throughput reaches or exceeds the expected throughput goal, we can end the tuning.

If the throughput goal still cannot be reached after optimization, we need to review the throughput requirement and assess how large the gap between the current throughput and the goal is. If the gap is around 20%, we can modify the parameters, increase the memory, and test the application again. If the gap is too large, we need to consider, from the perspective of the entire application, whether the design is consistent with the throughput goal, and then reassess the goal.

For a garbage collector, the goal of throughput tuning is to reduce or avoid Full GC or stop-the-world CMS collections, both of which reduce application throughput. Try to collect as many objects as possible in the MinorGC phase to prevent objects from being promoted to the old generation too quickly.
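A commonly used proxy for the garbage collector's effect on throughput is the share of JVM uptime that is not spent in garbage collection. This is an assumption of the sketch below, not a metric defined in this article; an application-level throughput goal (for example, requests per second) still has to be measured with load tests. The class and method names are illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper (not from the article): share of uptime not spent in GC.
class GcThroughput {
    /** Percentage of the uptime that was not spent in garbage collection. */
    static double throughputPercent(long uptimeMs, long gcMs) {
        if (uptimeMs <= 0) {
            return 100.0;
        }
        return 100.0 * (uptimeMs - gcMs) / uptimeMs;
    }

    /** Sums the approximate collection time of all collectors, skipping undefined values. */
    static long totalGcMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long ms = gc.getCollectionTime(); // -1 if undefined for this collector
            if (ms > 0) {
                total += ms;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC-free share of uptime: %.2f%%%n",
                throughputPercent(uptimeMs, totalGcMs()));
    }
}
```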



JVM tuning is a systematic and complex task. The JVM's automatic adjustment is now very good, and basic initial parameters are enough for common applications to run stably. For some teams, application performance may not be a high priority, and the default garbage collector is usually adequate. Tuning should be based on your own situation.

