esrally: Official Stress Testing Tool for Elasticsearch


Released by ELK Geek

Elasticsearch is increasingly used as an enterprise business solution due to its simplicity and excellent performance in big data processing. However, any new solution must undergo a series of investigations and tests before it is adopted. In this spirit, this article introduces esrally, an official stress testing tool for Elasticsearch.

Importance of Stress Testing

Stress testing is a test method used to establish system stability. It usually forces a system to go beyond its normal operating conditions so as to identify its functional limits and hidden risks.

According to this definition, the purpose of stress testing is to measure the limits of a system and discover hidden risks in advance so that you can design precautions. In my opinion, stress testing for Elasticsearch generally serves the following purposes:

  1. Verify the performance of Elasticsearch: Although Elasticsearch is widely respected for its performance, you still have to verify it on your own before you can trust it.
  2. Experimentally test Elasticsearch configurations: For example, you can disable the _all feature of indexes and check how much the write performance improves.
  3. Compare the performance differences between different versions of Elasticsearch: As we all know, Elasticsearch is evolving quickly. Even before you decide whether to upgrade from v2.x to v7.x, v8.x is already around the corner. In this case, you still need to upgrade Elasticsearch. However, how do you convince your boss that this is necessary? It is very simple: Stress testing allows you to compare the new version with the earlier version in terms of write and read performance. Then, you can show the difference in tables and charts.
  4. Design capacity for Elasticsearch clusters: Difficulties will arise if you do not make plans. Capacity planning is a long-term activity. In simple terms, you need to know how many nodes your Elasticsearch cluster needs, the configuration of each node, and the write and read performance limit of the cluster. If you do not have this information, you have never done capacity planning and you have just been lucky not to have had any problems. However, problems may occur at any time. This is upsetting but it’s true. This issue is discussed in detail at the end of this article.

How to Perform Stress Testing

  1. Write test code on your own. You can write your own code for a stress test. However, it is up to you to ensure the code is correct. Here are some open-source projects for you to explore: esperf and elasticsearch-stress-test.
  2. Use an HTTP stress testing tool. Since Elasticsearch exposes Restful APIs, any HTTP-based stress testing tool, such as JMeter and httpload, can be used to test Elasticsearch.
  3. Use Rally (or esrally), an official Elasticsearch tool.

Each stress testing solution has its own advantages and disadvantages. You should select the most appropriate approach based on your needs and familiarity with the tools. Next, we will give a detailed description of esrally.

Introduction

For more information, see this blog. esrally provides the following features:

  1. Automatically create Elasticsearch clusters, stress test them, and delete them.
  2. Manage stress testing data and solutions by Elasticsearch version.
  3. Present stress testing data in a comprehensive way, allowing you to compare and analyze the data of different stress tests and store the data on a particular Elasticsearch instance for secondary analysis.
  4. Collect Java Virtual Machine (JVM) details, such as memory and garbage collection (GC) data, to locate performance problems.

Elastic also uses esrally to test Elasticsearch's performance and publishes the results on https://elasticsearch-benchmarks.elastic.co/ in near real time. During these stress tests, the Elasticsearch team runs esrally and Elasticsearch on one dedicated server each.

The configurations of the servers are as follows:

CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
RAM: 32 GB
SSD: Crucial MX200
OS: Linux Kernel version 4.8.0-53
JVM: Oracle JDK 1.8.0_131-b11

The options, such as Geonames, Geopoint, and Percolator, in the top navigation bar of the website represent stress tests for different data sets. For example, the following figures show the stress testing results for the logging data set.

(Figure: write performance)
(Figure: read performance)
(Figure: other system metrics)

Quick Start

Install the following software for esrally:

  • Python 3.4 or later and pip3
  • JDK 8
  • Git 1.9 or later

Run the following command to install esrally:

pip3 install esrally

Tips:

You can use pip sources in China, such as those from Douban or Alibaba, to facilitate the installation.

After the installation, run the following configuration command to confirm data storage paths:

esrally configure

Now you are ready to run your first test. For example, run the following command to perform a stress test for Elasticsearch 5.0.0.

esrally --distribution-version=5.0.0

After the test, you will get a result like the one below.

(Figure: stress testing result)

The data may seem confusing to you. Let’s explain it step by step.

Tips:

esrally test data is stored on AWS servers outside China. Therefore, downloading esrally test data can be very slow or even fail due to timeout, making stress testing difficult. To address this issue, the test data is compressed and uploaded to a server in China so that you can download it and put it in your esrally data folder to ensure normal stress testing. In addition, due to the large data volume, a stress test usually takes about one hour. Therefore, you need to be patient.

To quickly try out esrally, add the --test-mode parameter so that only 1,000 documents are downloaded for your test.

Terms

Track

A track describes a stress testing scenario, including the test data and testing policies. The official track repository contains a lot of test data, such as the geonames, geopoint, logging, and nested folders. Each folder contains a README.md file to provide details about the data and a track.json file to define stress testing policies.

Let’s take a look at the logging/track.json file.

{% import "rally.helpers" as rally with context %}
{
  "short-description": "Logging benchmark",
  "description": "This benchmark indexes HTTP server log data from the 1998 world cup.",
  "data-url": "https://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/logging",
  "indices": [
    {
      "name": "logs-181998",
      "types": [
        {
          "name": "type",
          "mapping": "mappings.json",
          "documents": "documents-181998.json.bz2",
          "document-count": 2708746,
          "compressed-bytes": 13815456,
          "uncompressed-bytes": 363512754
        }
      ]
    },
    {
      "name": "logs-191998",
      "types": [
        {
          "name": "type",
          "mapping": "mappings.json",
          "documents": "documents-191998.json.bz2",
          "document-count": 9697882,
          "compressed-bytes": 49439633,
          "uncompressed-bytes": 1301732149
        }
      ]
    }
  ],
  "operations": [
    {{ rally.collect(parts="operations/*.json") }}
  ],
  "challenges": [
    {{ rally.collect(parts="challenges/*.json") }}
  ]
}

The track.json file consists of the following sections:

  • short-description and description: descriptions of the track.
  • data-url: a URL that indicates the root path for downloading test data. The full download address of a data file is obtained by combining the data-url field with the documents field in indices.
  • indices: the indexes used in the stress test, including their names, types, data files, and document counts. For more information, go here.
  • operations: the specific operations, such as indexing, segment force-merge, and search operations. See the following specific example. For more information, go here.
  • challenges: combinations of operations that define a series of tasks, which together make up a stress testing process. See the following specific example. For more information, go here.
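For example, the full download address implied by the data-url and documents fields can be derived as follows (a sketch of the convention described above, not esrally code):

```python
# Sketch: derive the full download URL for a track data file by joining
# the track-level "data-url" with a type's "documents" file name.
def document_url(data_url: str, documents: str) -> str:
    """Join the root data URL and the compressed documents file name."""
    return data_url.rstrip("/") + "/" + documents

url = document_url(
    "https://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/logging",
    "documents-181998.json.bz2",
)
print(url)
```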

Here is a definition in operations/default.json:

{
  "name": "index-append",
  "operation-type": "index",
  "bulk-size": 5000
}

The operation-type values include index, force-merge, index-stats, node-stats, and search. Each value has its own custom parameters. For example, you can specify the bulk-size parameter to determine the number of documents to be written in bulk into the index.
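To see what bulk-size means in practice, consider the logs-181998 corpus from the track above: its 2,708,746 documents, written with a bulk-size of 5000, translate into roughly 542 bulk requests (a back-of-the-envelope sketch, not esrally code):

```python
import math

def bulk_requests(document_count: int, bulk_size: int) -> int:
    """Number of bulk requests needed to index a corpus at a given bulk-size."""
    return math.ceil(document_count / bulk_size)

# logs-181998 corpus with the bulk-size from operations/default.json
print(bulk_requests(2_708_746, 5000))  # 542
```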

Here is a definition in challenges/default.json:

{
  "name": "append-no-conflicts",
  "description": "",
  "default": true,
  "index-settings": {
    "index.number_of_replicas": 0
  },
  "schedule": [
    {
      "operation": "index-append",
      "warmup-time-period": 240,
      "clients": 8
    },
    {
      "operation": "force-merge",
      "clients": 1
    },
    {
      "operation": "index-stats",
      "clients": 1,
      "warmup-iterations": 100,
      "iterations": 100,
      "target-throughput": 50
    },
    {
      "operation": "node-stats",
      "clients": 1,
      "warmup-iterations": 100,
      "iterations": 100,
      "target-throughput": 50
    },
    {
      "operation": "default",
      "clients": 1,
      "warmup-iterations": 100,
      "iterations": 500,
      "target-throughput": 10
    },
    {
      "operation": "term",
      "clients": 1,
      "warmup-iterations": 100,
      "iterations": 500,
      "target-throughput": 60
    },
    {
      "operation": "range",
      "clients": 1,
      "warmup-iterations": 100,
      "iterations": 200,
      "target-throughput": 2
    },
    {
      "operation": "hourly_agg",
      "clients": 1,
      "warmup-iterations": 100,
      "iterations": 100,
      "target-throughput": 0.2
    },
    {
      "operation": "scroll",
      "clients": 1,
      "warmup-iterations": 100,
      "iterations": 200,
      "target-throughput": 10
    }
  ]
}

In this case, a challenge named append-no-conflicts is defined. Each stress test runs only one challenge. Therefore, the default parameter here indicates the challenge that is run by default when no challenge is specified for the stress test. The schedule element contains the following nine tasks to be executed in sequence for this challenge: index-append, force-merge, index-stats, node-stats, default, term, range, hourly_agg, and scroll. In this example, each task contains an operation. You can specify additional properties such as clients (the number of clients that execute a task concurrently), warmup-iterations (the number of iterations that each client executes for warmup), and iterations (the number of operation iterations that each client executes). For more information, go here.
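As a rough illustration of how these properties interact: the measured phase of a task issues iterations requests per client at target-throughput requests per second, so its minimum duration can be estimated (a simplified sketch that ignores warmup time and actual service time):

```python
def measured_phase_seconds(iterations: int, target_throughput: float) -> float:
    """Lower bound on the duration of a task's measured phase, per client."""
    return iterations / target_throughput

# index-stats: 100 iterations at 50 ops/s -> at least 2 seconds
print(measured_phase_seconds(100, 50))
# hourly_agg: 100 iterations at 0.2 ops/s -> at least 500 seconds
print(measured_phase_seconds(100, 0.2))
```

This is why low-throughput aggregation tasks such as hourly_agg dominate the total running time of a race.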

You can run the following command to view tracks currently available for your esrally.

esrally list tracks

esrally track repositories are located in the benchmarks/tracks/ directory under the rally data directory (~/.rally by default).

Car

A car is a specific configuration of Elasticsearch to be benchmarked, such as a particular heap size. Run the following command to list the available cars:

esrally list cars
Name
----------
16gheap
1gheap
2gheap
4gheap
8gheap
defaults
ea
verbose_iw

Configurations of cars are located in the benchmarks/teams/default/cars/ directory under the rally data directory (~/.rally by default). For more information, see the car documentation. You can modify all Elasticsearch configurations except for heap configurations, which are defined by the car itself.

Race

A race is a single run of a stress test. For example:

esrally race --track=logging --challenge=append-no-conflicts --car="4gheap"

According to the preceding command, the stress test uses the track named logging, runs the challenge named append-no-conflicts in the track, and uses the 4gheap car, that is, an Elasticsearch instance with a 4 GB heap. For more information, see the race documentation.

Tournament

A tournament compares the results of multiple races. First, list the recent races:

esrally list races

Recent races:

Race Timestamp    Track  Challenge            Car       User Tag
----------------  -----  -------------------  --------  ------------------------------
20160518T122341Z  pmc    append-no-conflicts  defaults  intention:reduce_alloc_1234
20160518T112057Z  pmc    append-no-conflicts  defaults  intention:baseline_github_1234
20160518T101957Z  pmc    append-no-conflicts  defaults

You can run the following command to compare data between different races:

esrally compare --baseline=20160518T112057Z --contender=20160518T122341Z
(Figure: data comparison of two races)

For more information, see the tournament documentation.

Pipeline

A pipeline defines the steps of a stress test. List the available pipelines:

esrally list pipelines

Name                     Description
-----------------------  ---------------------------------------------------------------------------------------------
from-sources-complete    Builds and provisions Elasticsearch, runs a benchmark and reports results.
from-sources-skip-build  Provisions Elasticsearch (skips the build), runs a benchmark and reports results.
from-distribution        Downloads an Elasticsearch distribution, provisions it, runs a benchmark and reports results.
benchmark-only           Assumes an already running Elasticsearch instance, runs a benchmark and reports results.
  1. In the from-sources-complete pipeline, a stress test is run after an Elasticsearch instance is compiled from source code. You can use the --revision parameter to specify the commit hash to be compiled so that you can run the test on a particular commit.
  2. If compilation has already been done, you can use the from-sources-skip-build pipeline to skip the compilation step, saving test time.
  3. The from-distribution pipeline allows you to specify an Elasticsearch version by using the --distribution-version parameter. esrally downloads the distribution of that version from the official website to run the test.
  4. In the benchmark-only pipeline, you manage the Elasticsearch cluster yourself, and esrally only performs the stress testing. Use this pipeline if you want to test an existing cluster.

For more information, see the pipeline documentation.

Stress Testing Process

  1. Compile or download executable Elasticsearch instances based on parameter settings, and then create and start an Elasticsearch cluster as specified by the car. If the benchmark-only pipeline is used, skip this step.
  2. Download data based on the specified track, and then perform operations based on the specified challenge.
  3. Record and output the stress test result.

Result Analysis

(Figure: stress test result)

A lot of metric data is listed in the Metric column. For more information, see the relevant documentation. You need to check the following metrics:

  1. Throughput: the throughput of each operation, such as index or search
  2. Latency: the response time of each operation
  3. Heap used for x: the heap memory usage

You can use the metrics that are most appropriate for your situation.

Each stress test is named after its time. For example, the name logs/rally_out_20170822T082858Z.log indicates that the stress test was started at 08:28:58 on August 22, 2017, and the final result and Elasticsearch operation logs of this stress test are recorded in benchmarks/races/2017-08-22-08-28-58.
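The timestamp embedded in these names follows the %Y%m%dT%H%M%SZ pattern, so it can be parsed back into a date, for example:

```python
from datetime import datetime

def race_start(filename: str) -> datetime:
    """Parse the UTC timestamp embedded in an esrally log file name,
    e.g. "rally_out_20170822T082858Z.log"."""
    stamp = filename.split("rally_out_")[1].split(".log")[0]
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ")

started = race_start("rally_out_20170822T082858Z.log")
print(started)  # 2017-08-22 08:28:58
```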

In addition, for tests in the benchmark-only pipeline, namely, stress tests on existing clusters, you can install the X-Pack Basic version for monitoring. This allows you to view the relevant metrics during the stress tests.

(Figure: X-Pack monitoring)

esrally can be configured to save all race result data to a specified Elasticsearch instance. The configuration, stored in the rally.ini file in the rally data directory (~/.rally by default), is as follows:

[reporting]
datastore.type = elasticsearch
datastore.host = localhost
datastore.port = 9200
datastore.secure = False
datastore.user =
datastore.password =

esrally stores data in the following three indexes. The asterisk (*) stands for the year and month, which means result data is stored in one index per month.
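Assuming the month-based naming described above, the target index for a metric recorded at a given time can be derived as follows (the exact pattern is an assumption; check the index names on your own cluster):

```python
from datetime import date

def metrics_index(day: date) -> str:
    # Assumed pattern: rally-metrics-<year>-<month>, one index per month.
    return "rally-metrics-{:04d}-{:02d}".format(day.year, day.month)

print(metrics_index(date(2017, 8, 22)))  # rally-metrics-2017-08
```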

1. The rally-metrics-* index records the result of each race by metric. The following figure shows all metrics of a race.

(Figure: metric data)

The Time column lists the time of the stress test, and the @timestamp column lists the time when each metric was collected. The Operation column lists the specific operation performed. A metric without an operation is an aggregate value; for example, the indexing_total_time metric indicates the total indexing time, and the segments_count metric indicates the total number of segments. A metric with an operation records data for that operation. Note that operation data is recorded at sampling time, not as final aggregated data. As shown in the preceding figure, one hourly_agg operation has multiple service_time metrics collected at different times. Based on this data, you can chart a particular metric over the course of a race. For example, you can observe the throughput metric of the index-log task in this race by using the method shown in the following figure.

(Figure: metric data display)

2. The rally-results-* index records the final aggregated result of each race by metric, such as the following data:

{
  "user-tag": "shardSizeTest:size6",
  "distribution-major-version": 5,
  "environment": "local",
  "car": "external",
  "plugins": [
    "x-pack"
  ],
  "track": "logging",
  "active": true,
  "distribution-version": "5.5.2",
  "node-count": 1,
  "value": {
    "50_0": 19.147876358032228,
    "90_0": 21.03116340637207,
    "99_0": 41.644479789733886,
    "100_0": 47.20634460449219
  },
  "operation": "term",
  "challenge": "default-index",
  "trial-timestamp": "20170831T063724Z",
  "name": "latency"
}

In this example, the latency metric of the term operation is recorded as an aggregated value in the form of a percentile. The data allows you to draw a multi-race comparison chart based on a particular metric. The following figure shows the comparison of multiple races based on the latency of hourly_agg (hour-based aggregation), default (match_all), term, and range queries.

(Figure: latency-based comparison of multiple races)
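The 50_0, 90_0, 99_0, and 100_0 keys in the aggregated document above are latency percentiles. A minimal sketch of how such aggregates can be derived from raw latency samples (a nearest-rank illustration, not esrally's actual implementation):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (in ms)."""
    ordered = sorted(samples)
    if pct >= 100:
        return ordered[-1]
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical service-time samples for one operation
latencies = [18.2, 19.1, 19.4, 20.5, 21.0, 21.3, 22.8, 25.1, 41.6, 47.2]
summary = {p: percentile(latencies, p) for p in (50, 90, 99, 100)}
print(summary)  # {50: 21.0, 90: 41.6, 99: 47.2, 100: 47.2}
```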

3. The rally-races-* index records the final results of all races, namely, the output of command line execution.

In addition to Elasticsearch-related metric data, esrally records some test environment information, such as the operating system and JVM, allowing you to view the software and hardware environments involved in the test.

Practice Problems

Practice Problem 1

How can you identify the performance improvement of Elasticsearch 5.5.0 compared with Elasticsearch 2.4.6?

Answer:

Perform stress testing on Elasticsearch 5.5.0 and Elasticsearch 2.4.6 respectively, and then compare the relevant metrics of the two versions. Use the following track and challenge:

  • Track: nyc_taxis
  • Challenge: append-no-conflicts

Procedure:

1. Test the performance of Elasticsearch 2.4.6.

esrally race --distribution-version=2.4.6 --track=nyc_taxis --challenge=append-no-conflicts --user-tag="version:2.4.6"

2. Test the performance of Elasticsearch 5.5.0.

esrally race --distribution-version=5.5.0 --track=nyc_taxis --challenge=append-no-conflicts --user-tag="version:5.5.0"

3. Compare the results of the two races.

esrally list races
esrally compare --baseline=[2.4.6 race] --contender=[5.5.0 race]

Tips:

Use the --user-tag parameter to create tags for the race to facilitate subsequent searches.

To perform a quick test, add the --test-mode parameter to rapidly run a race by using test data.

Practice Problem 2

How can you test the impact of disabling the _all feature on the write performance?

Answer:

Perform two tests on Elasticsearch 5.5.0, one with the _all feature enabled and the other with the feature disabled. Then, compare the results of the two tests. Only perform index operations as you only need to test the write performance. Use the following track and challenge:

  • Track: nyc_taxis
  • Challenge: append-no-conflicts

Procedure:

1. By default, the _all feature is disabled in the mapping settings of the nyc_taxis track. Test the performance when the _all feature is disabled.

esrally race --distribution-version=5.5.0 --track=nyc_taxis --challenge=append-no-conflicts --user-tag="enableAll:false" --include-tasks="type:index"

2. Modify the mapping settings of the nyc_taxis track to enable the _all feature. The mapping file is located in the rally home directory.

In the benchmarks/tracks/default/nyc_taxis/mappings.json file, change the _all.enabled value to true.

esrally race --distribution-version=5.5.0 --track=nyc_taxis --challenge=append-no-conflicts --user-tag="enableAll:true" --include-tasks="type:index"

3. Compare the results of the two races.

esrally list races
esrally compare --baseline=[enableAll race] --contender=[disableAll race]

The following figure shows the comparison result of the two races when the --test-mode parameter is used. As you can see, disabling the _all feature improves the write performance.

(Figure: test result)

Tips:

You can use the --include-tasks parameter to run only certain tasks in the challenge.

Practice Problem 3

How do you test the performance of an existing cluster?

Answer:

Use the benchmark-only pipeline. Use the following track and challenge:

  • Track: nyc_taxis
  • Challenge: append-no-conflicts

Procedure:

1. Run the following command to test an existing cluster:

esrally race --pipeline=benchmark-only --target-hosts=127.0.0.1:9200 --cluster-health=yellow --track=nyc_taxis --challenge=append-no-conflicts

Tips:

By default, esrally checks the cluster status and exits immediately if the status is not green. The --cluster-health=yellow parameter relaxes this check so that the test can run against a cluster in the yellow state.

I hope that the preceding three practice problems can help you quickly learn how to use esrally.

Advanced Practice Problems

As mentioned above, esrally comes with some readily-available configurations. However, you can resort to the following two solutions if you have other needs.

1. Customize your own car.

You can create a car configuration file in the benchmarks/teams/default/cars directory under the rally data directory (~/.rally by default). For more information, see the car documentation.

2. Build your own cluster.

You can build a cluster independent of esrally as needed.

Custom Tracks

esrally comes with many tracks that include a lot of data, as shown in the following figure.

(Figure: tracks included with esrally)

These data files are located in the benchmarks/data directory under the rally data directory (~/.rally by default). Tracks are designed for different testing purposes. For more information, see the corresponding repositories on GitHub.

You can customize tracks to run targeted stress tests on your own data. The customization process is simple. For more information, see the relevant documentation. The procedure is as follows:

  1. Create your own data directory in the benchmarks/data directory.
  2. Prepare the data files to be used in your stress test. esrally uses .json files in which each line is a separate JSON document.
  3. Compress the prepared data files into BZIP2 packages and copy them to the directory you created in step 1.
  4. Create a custom track. You can copy the geonames directory, modify the relevant configuration files, and bind your test data to the track.
  5. Run the esrally list tracks command to view the custom track.
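Steps 2 and 3 above can be sketched in a few lines: write one JSON document per line, then compress the file with bzip2 (the file name and documents here are hypothetical):

```python
import bz2
import json

# Step 2: one JSON document per line (the format esrally expects).
docs = [{"message": "GET /index.html", "status": 200},
        {"message": "GET /missing", "status": 404}]
raw = "\n".join(json.dumps(d) for d in docs) + "\n"

# Step 3: compress the data file with bzip2.
with bz2.open("documents.json.bz2", "wt", encoding="utf-8") as f:
    f.write(raw)

# Sanity check: the compressed file round-trips to the same content.
with bz2.open("documents.json.bz2", "rt", encoding="utf-8") as f:
    print(f.read() == raw)  # True
```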

Distributed Stress Testing

esrally also supports distributed stress testing: if a single machine cannot generate the request volume or concurrency you need, you can run esrally across multiple machines. This mode uses the esrally daemon, esrallyd. In short, you combine multiple machines into a load-generation cluster by using the esrallyd command and then distribute test tasks across them by setting the --load-driver-hosts parameter. For more information, see the relevant documentation.

Practice Problem 4

How do you determine the number of shards of an index?

Answer:

In fact, this question gives rise to two more questions:

  1. Will setting too few shards create problems? What happens if you always use the default value of 5 shards?
  2. Can too many shards be a problem? What happens if you set 100 shards?

To answer these two questions, you need to understand what shards do. Shards are the foundation of Elasticsearch's distributed capabilities. When documents are indexed into Elasticsearch, Elasticsearch allocates each document to a shard according to a routing algorithm, and each shard corresponds to one Lucene index. Is there an upper limit on the number of documents that a shard can store? Yes: each shard can store up to 2³¹ documents, that is, approximately 2 billion. This is a hard limit in the Lucene design. Does this mean that one or a few shards are enough as long as you have fewer than 2 billion documents? Not exactly. A larger shard means slower queries and higher costs for data migration and recovery. Therefore, we recommend that you keep the size of each shard below 50 GB. For more information, see discussion 1 and discussion 2.

Here are the answers to the preceding two questions.

A small shard quantity is not necessarily good. If shards contain too much data, query performance is affected.

However, a large shard quantity is not necessarily good either. Query performance drops when there are too many shards because each Elasticsearch query is distributed to all shards and then the results are aggregated. Therefore, you should determine an appropriate number of shards based on your actual situation.

You can find an article on capacity planning from the Elasticsearch website here. The following procedure is proposed in the article:

  1. Use the hardware configuration in the production environment to create a single-node cluster.
  2. Create an index that has only one primary shard and no replicas and set relevant mapping information.
  3. Import real documents to the index you created in step 2.
  4. Run your test with real query statements.

During the test, monitor relevant metrics such as indexing and query performance. When a metric exceeds your expected threshold, the amount of data in the shard at that point is your maximum single-shard size. You can then roughly determine the number of shards to configure for an index by using the following formula:

Number of shards = Total data volume of the index / Maximum size of a single shard

For example, if the maximum size of a single shard is 20 GB, and you estimate that the maximum data volume of the index will not exceed 200 GB within one or two years, you can set the number of shards to 10.
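The example above can be written out directly (a sketch of the rule of thumb, rounding up so no shard exceeds the limit):

```python
import math

def shard_count(total_index_gb: float, max_shard_gb: float) -> int:
    """Number of primary shards from estimated index size and per-shard limit."""
    return math.ceil(total_index_gb / max_shard_gb)

# 200 GB estimated index size, 20 GB maximum per shard -> 10 shards
print(shard_count(200, 20))  # 10
```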

Next, you need to use esrally to complete the preceding stress testing steps.

1. Manually maintain the creation and running of Elasticsearch nodes and use the benchmark-only pipeline to run esrally.

2. Customize your track, paying attention to the following two points:

  • Generate realistic data. If you do not have enough data, you can set the iterations parameter in the schedule element of your track to run an operation in a loop. This also applies to write performance tests that require large data volumes.
  • Define your query task. You can define your query statements in the operations of your track, such as the following one:
{
  "name": "hourly_agg",
  "operation-type": "search",
  "index": "logs-*",
  "type": "type",
  "body": {
    "size": 0,
    "aggs": {
      "by_hour": {
        "date_histogram": {
          "field": "@timestamp",
          "interval": "hour"
        }
      }
    }
  }
}

The body element is a custom query statement. You can set query statements as needed.

3. Set the mapping of the index to be consistent with that of the online settings, such as whether the _all feature is enabled.

4. Perform stress tests based on your custom track. Run esrally and Elasticsearch on machines that are independent of each other, preventing interference with Elasticsearch performance.

Tips:

By default, esrally deletes existing indexes and then creates indexes during each stress test. To avoid this, you can set the auto-managed parameter to false in the configuration of each index. For more information, go here.

By using this parameter, you can perform stress tests on query performance separately, instead of importing data first.

Summary

esrally is great. Try it out and then ask me any questions you have about it.

About the Author

Declaration: This article is reproduced with authorization from Wei Bin, the author of the original article esrally as the Stress Test Solution for Elasticsearch. The author reserves the right to hold users legally liable in the case of unauthorized use.

References

  1. Using Rally to Benchmark Elasticsearch Queries
  2. Video of speech by the esrally author
  3. Benchmarking Elasticsearch for Your Use Case with Rally
