Analyzing Elasticsearch Performance with Lucene

Lucene Query Principle

Data Structure and Query Principle of Lucene

Merge Results for Combined Conditions

1. Find the Intersection of N Postings Lists

2. Find the Union of N Postings Lists

3. How BKD-Tree Results Are Combined with Other Results

Query Order Optimization

Result Sorting

Performance Analysis for Various Query Scenarios

Single-Term Query

Create an index in ES with a single shard and no replicas. Prepare 10 million rows of data, each containing only a few tags and a unique ID, and write them all into the index. Tag1 has only two values: a and b. Now find the entries with Tag1=a (about 5 million of the 10 million rows). How long does the following query take?
Request:
{
"query": {
"constant_score": {
"filter": {
"term": {
"Tag1": "a"
}
}
}
},
"size": 1
}
Response:
{"took":233,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":5184867,"max_score":1.0,"hits":...}
{"took":3,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":10478,"max_score":1.0,"hits":...}

Term Combination Query

Consider a term combination query over two postings lists, one with 10,000 entries and one with 5,000,000, that leaves about 5,000 matching entries after the merge. How does the query perform?
Request:
{
"size": 1,
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"Tag1": "a" // postings list length: 5,000,000
}
},
{
"term": {
"Tag2": "0" // postings list length: 10,000
}
}
]
}
}
}
}
}
Response:
{"took":21,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":5266,"max_score":2.0,"hits":...}
{"took":393,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":5190079,"max_score":1.0,"hits":...}
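The 21 ms result for the two-term intersection is consistent with a merge that iterates over the short postings list and skips forward through the long one, rather than scanning all 5,000,000 entries. The following is a minimal sketch of that idea, assuming sorted in-memory doc-ID lists and using a binary search as a stand-in for Lucene's skip data. The function and variable names are illustrative and do not come from Lucene.

```python
from bisect import bisect_left


def intersect(short, long):
    """Intersect two sorted doc-ID lists, driving from the shorter one."""
    result = []
    pos = 0
    for doc_id in short:
        # Jump forward in the long list to the first entry >= doc_id.
        pos = bisect_left(long, doc_id, pos)
        if pos == len(long):
            break  # the long list is exhausted, no further matches possible
        if long[pos] == doc_id:
            result.append(doc_id)
    return result


# Toy stand-ins for the two postings lists (assumed data, not from the test set).
tag1_a = list(range(0, 20, 2))   # "long" list: even doc IDs 0..18
tag2_0 = [3, 4, 10, 15, 18, 25]  # "short" list
print(intersect(tag2_0, tag1_a))  # [4, 10, 18]
```

In this model the amount of work grows with the length of the short list (times a logarithmic skip cost), which is why a 10,000-entry list combined with a 5,000,000-entry list can still be cheap.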

String Range Query

Consider 10 million data entries. Each doc has a unique UUID stored as its RecordID. Find the UUIDs whose first character is 0–7, which should return over 5 million results. Let's look at the query performance in this scenario.
Request:
{
"query": {
"constant_score": {
"filter": {
"range": {
"RecordID": {
"gte": "0",
"lte": "8"
}
}
}
}
},
"size": 1
}
Response:
{"took":3001,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":5185663,"max_score":1.0,"hits":...}
Now suppose we query the UUIDs beginning with "a", which should return around 600,000 results. How does the query perform?
Request:
{
"query": {
"constant_score": {
"filter": {
"range": {
"RecordID": {
"gte": "a",
"lte": "b"
}
}
}
}
},
"size": 1
}
Response:
{"took":379,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":648556,"max_score":1.0,"hits":...}
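To see why the bounds "0".."8" select UUIDs starting with 0–7 and "a".."b" select only those starting with "a", it helps to check how inclusive lexicographic comparison treats a long string against a one-character bound. The sketch below assumes RecordID holds lowercase hexadecimal UUID strings compared in plain character order; the helper names and the sample UUID tail are made up for illustration.

```python
HEX_DIGITS = "0123456789abcdef"


def in_string_range(value, gte, lte):
    """Inclusive lexicographic range check using Python string ordering."""
    return gte <= value <= lte


def matching_first_digits(gte, lte):
    """Return the leading hex digits whose UUID-like strings fall in the range."""
    # Hypothetical tail; together with the leading digit it forms a UUID-shaped string.
    sample_tail = "f9c2d1e-0000-4000-8000-000000000000"
    return [d for d in HEX_DIGITS if in_string_range(d + sample_tail, gte, lte)]


first = matching_first_digits("0", "8")
second = matching_first_digits("a", "b")
print(first, len(first) / 16)    # leading digits 0-7, fraction 0.5
print(second, len(second) / 16)  # leading digit a only, fraction 0.0625
```

A UUID that starts with "8" is longer than the bound "8" and therefore sorts after it, so it falls outside the first range. With uniformly distributed leading digits, these fractions predict about 5,000,000 and 625,000 matches out of 10 million, in line with the 5,185,663 and 648,556 hits reported above.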

String Range Query Plus Term Query

Consider a string range query (5 million matching entries) combined with two term queries (5,000 matching entries). About 2,600 entries meet all the conditions. Let's test the performance.
Request:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"range": {
"RecordID": {
"gte": "0",
"lte": "8"
}
}
},
{
"term": {
"Tag1": "a"
}
},
{
"term": {
"Tag2": "0"
}
}
]
}
}
}
},
"size": 1
}
Response:
{"took":2849,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":2638,"max_score":1.0,"hits":...}

Numeric Range Query

For the numeric type, we again search 10 million data entries for about 5 million targets and see how the query performs.
Request:
{
"query": {
"constant_score": {
"filter": {
"range": {
"Number": {
"gte": 100000000,
"lte": 150000000
}
}
}
}
},
"size": 1
}
Response:
{"took":567,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":5183183,"max_score":1.0,"hits":...}

Numeric Range Query Plus Term Query

Now consider a more complex scenario: the numeric range matches 5 million data entries, two term conditions are added to the query, and just over 2,600 entries match all the conditions. Let's evaluate the performance for this scenario.
Request:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"range": {
"Number": {
"gte": 100000000,
"lte": 150000000
}
}
},
{
"term": {
"Tag1": "a"
}
},
{
"term": {
"Tag2": "0"
}
}
]
}
}
}
},
"size": 1
}
Response:
{"took":27,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":2638,"max_score":1.0,"hits":...}
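The numeric range alone took 567 ms, yet the same range combined with two term filters finished in 27 ms, so the 5 million range matches were evidently not all collected first. One strategy that would produce this pattern is to let the selective term filters supply a small candidate set and then check each candidate's numeric value against the range. The sketch below illustrates only that strategy. It is an assumption about the mechanism rather than something the measurements prove, and every name in it is hypothetical.

```python
def filter_by_verification(candidates, values, low, high):
    """Keep candidate doc IDs whose numeric value lies in [low, high].

    candidates: small sorted list of doc IDs produced by the selective filters.
    values: per-document numeric column (index = doc ID), standing in for a
            column-style per-document value lookup.
    """
    return [doc for doc in candidates if low <= values[doc] <= high]


# Toy data (assumed): the value of the numeric field for doc IDs 0..7.
values = [50, 120, 130, 90, 145, 200, 110, 10]
candidates = [1, 3, 4, 5, 6]  # doc IDs that passed the term filters
print(filter_by_verification(candidates, values, 100, 150))  # [1, 4, 6]
```

Under this model the cost depends on the number of candidates (a few thousand) rather than on the size of the range result (5 million), which matches the gap between this scenario and the string range case.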

Conclusion
