Elasticsearch Distributed Consistency Principles Analysis (2) — Meta

  1. How does Master manage the cluster
  2. Meta composition, storage, and recovery
  3. ClusterState update process
  4. Resolving current consistency issues
  5. Summary

How Does Master Manage the Cluster

In the previous article, we introduced the composition of an ES cluster, how nodes are discovered, and how the Master is elected. So how does the Master manage the cluster once it has been elected? Several questions need to be addressed, such as:

  1. How does the Master handle Index creation or deletion?
  2. How does the Master reschedule Shard for load balancing?

Meta Composition, Storage, and Recovery

Before introducing the Meta update process, we cover Meta composition, storage, and recovery. If you are already familiar with this topic, you can skip directly to the next section.

1. Meta: ClusterState, MetaData, IndexMetaData

Meta is data that describes other data. In ES, an Index's mapping structure, configuration, and persistence state are all meta data, and so is some of the cluster's configuration information. This meta data is critical: if the meta data that records an index is lost, the cluster considers that the index no longer exists. In ES, meta data can only be updated by the master, so the master is essentially the brain of the cluster.

ClusterState contains the following fields:

long version: current version number, which increments by 1 for every update.
String stateUUID: the unique id corresponding to the state.
RoutingTable routingTable: routing table for all indexes.
DiscoveryNodes nodes: current cluster nodes.
MetaData metaData: meta data of the cluster.
ClusterBlocks blocks: used to block some operations.
ImmutableOpenMap<String, Custom> customs: custom configuration.
ClusterName clusterName: cluster name.

MetaData contains the following fields:

String clusterUUID: the unique id of the cluster.
long version: current version number, which increments by 1 for every update.
Settings persistentSettings: persistent cluster settings.
ImmutableOpenMap<String, IndexMetaData> indices: Meta of all Indexes.
ImmutableOpenMap<String, IndexTemplateMetaData> templates: Meta of all templates.
ImmutableOpenMap<String, Custom> customs: custom configuration.

IndexMetaData contains the following fields:

long version: current version number, which increments by 1 for every update.
int routingNumShards: the shard count used for routing; it must be a multiple of this Index's numberOfShards and is used for split.
State state: Index status, an enum with the values OPEN or CLOSE.
Settings settings: configurations, such as numberOfShards and numberOfReplicas.
ImmutableOpenMap<String, MappingMetaData> mappings: mapping of the Index.
ImmutableOpenMap<String, Custom> customs: custom configuration.
ImmutableOpenMap<String, AliasMetaData> aliases: aliases of the Index.
long[] primaryTerms: the primaryTerm of each Shard, which increments by 1 whenever the Shard switches its Primary; it is used to maintain ordering.
ImmutableOpenIntMap<Set<String>> inSyncAllocationIds: the AllocationIds that are in the InSync state; it is used to ensure data consistency and is described in later articles.
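The nesting of these three levels can be sketched as follows. This is a simplified, hypothetical model rather than the real ES classes: the class names `MetaModelSketch`, `State`, `Meta`, and `IndexMeta` and the helper `addIndex` are assumptions made for illustration, and only a handful of the fields listed above are kept. It shows one update producing a new immutable state in which each enclosing level carries a version one higher than before.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the three meta levels described above.
// The real ES classes carry many more fields; only nesting and versions are shown.
final class MetaModelSketch {

    // Index-level meta: its own version, an OPEN/CLOSE state, and settings.
    static final class IndexMeta {
        final long version;
        final String state;
        final Map<String, String> settings;

        IndexMeta(long version, String state, Map<String, String> settings) {
            this.version = version;
            this.state = state;
            this.settings = Map.copyOf(settings);
        }
    }

    // Cluster-level meta: holds the meta of every index, keyed by index name.
    static final class Meta {
        final String clusterUUID;
        final long version;
        final Map<String, IndexMeta> indices;

        Meta(String clusterUUID, long version, Map<String, IndexMeta> indices) {
            this.clusterUUID = clusterUUID;
            this.version = version;
            this.indices = Map.copyOf(indices);
        }
    }

    // Top-level state: holds the cluster-level meta.
    static final class State {
        final long version;
        final String stateUUID;
        final Meta meta;

        State(long version, String stateUUID, Meta meta) {
            this.version = version;
            this.stateUUID = stateUUID;
            this.meta = meta;
        }
    }

    // Builds a new State with one more index. The old State is left untouched,
    // and both the State version and the Meta version increase by 1.
    static State addIndex(State current, String indexName,
                          Map<String, String> settings, String newStateUUID) {
        Map<String, IndexMeta> indices = new HashMap<>(current.meta.indices);
        indices.put(indexName, new IndexMeta(1, "OPEN", settings));
        Meta newMeta = new Meta(current.meta.clusterUUID, current.meta.version + 1, indices);
        return new State(current.version + 1, newStateUUID, newMeta);
    }
}
```

Calling `addIndex` on an empty state with version 0 yields a state with version 1 whose `meta.indices` contains the new index, while the original state object still reports no indices.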

2. Meta Storage

When an ES node starts, it is configured with a data directory whose layout is similar to the following. The node in this example holds one Index with a single Shard.

`-- nodes
    `-- 0
        |-- _state
        |   |-- global-1.st
        |   `-- node-0.st
        |-- indices
        |   `-- 2Scrm6nuQOOxUN2ewtrNJw
        |       |-- 0
        |       |   |-- _state
        |       |   |   `-- state-0.st
        |       |   |-- index
        |       |   |   |-- segments_1
        |       |   |   `-- write.lock
        |       |   `-- translog
        |       |       |-- translog-1.tlog
        |       |       `-- translog.ckp
        |       `-- _state
        |           `-- state-2.st
        `-- node.lock
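In this layout, `_state` directories appear at three depths: directly under the node directory, under an index directory, and under a shard directory. The sketch below classifies a `.st` file purely by that position. The class name `StateFileLevelSketch`, the method `levelOf`, and the level labels are assumptions made for illustration and are not ES terminology; the sketch does not read the files or claim anything about their contents.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Classifies a *.st file by where it sits in the directory layout shown above.
// The level names are labels chosen for this sketch.
final class StateFileLevelSketch {

    // relativePath is relative to the data directory,
    // for example "nodes/0/_state/global-1.st".
    static String levelOf(String relativePath) {
        Path p = Paths.get(relativePath);
        int n = p.getNameCount();

        // A state file must look like nodes/.../_state/<name>.st
        boolean looksLikeStateFile = n >= 4
                && p.getName(0).toString().equals("nodes")
                && p.getName(n - 2).toString().equals("_state")
                && p.getFileName().toString().endsWith(".st");
        if (!looksLikeStateFile) {
            return "not-a-state-file";
        }

        // nodes/<ordinal>/_state/<file>
        if (n == 4) {
            return "node-level";
        }
        boolean underIndices = p.getName(2).toString().equals("indices");
        // nodes/<ordinal>/indices/<indexUUID>/_state/<file>
        if (n == 6 && underIndices) {
            return "index-level";
        }
        // nodes/<ordinal>/indices/<indexUUID>/<shardId>/_state/<file>
        if (n == 7 && underIndices) {
            return "shard-level";
        }
        return "not-a-state-file";
    }
}
```

For the tree above, `nodes/0/_state/global-1.st` is classified as node-level, `nodes/0/indices/2Scrm6nuQOOxUN2ewtrNJw/_state/state-2.st` as index-level, and `nodes/0/indices/2Scrm6nuQOOxUN2ewtrNJw/0/_state/state-0.st` as shard-level.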

3. Meta Recovery

If an ES cluster reboots, some role must recover the Meta, because every process has lost its previous Meta information; that role is the Master. Master election must therefore take place in the ES cluster first, and only then can failure recovery begin.

ClusterState Update Process

Next, we look at the ClusterState update process to see how ES guarantees consistency during ClusterState updates.

1. Atomicity guarantee for ClusterState changes made by different threads within the master process

First, atomicity must be guaranteed when different threads change the ClusterState within the master process. Imagine that two threads are modifying the ClusterState at the same time, each making its own changes. Without concurrency protection, the modifications committed by the later thread can overwrite those committed by the earlier thread, or the combination can produce an invalid state.
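One common way to obtain this atomicity is to funnel every change through a single update thread, so that each change is applied to the result of the previous one. The sketch below illustrates that idea with a standard single-threaded executor. It is a minimal illustration, not the ES implementation: the class `SerializedUpdatesSketch`, its `State` stand-in, and the method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

// Minimal sketch: all state changes run one at a time on a single thread.
final class SerializedUpdatesSketch {

    // Immutable stand-in for ClusterState: a version plus a counter as "content".
    static final class State {
        final long version;
        final int indexCount;

        State(long version, int indexCount) {
            this.version = version;
            this.indexCount = indexCount;
        }
    }

    private final ExecutorService updateThread = Executors.newSingleThreadExecutor();
    private volatile State current = new State(0, 0);

    // Any thread may submit a change; it is queued and applied on the update thread,
    // always against the latest state, and the version is bumped by 1.
    void submitUpdate(UnaryOperator<State> change) {
        updateThread.execute(() -> {
            State proposed = change.apply(current);
            current = new State(current.version + 1, proposed.indexCount);
        });
    }

    // Starts several submitter threads, waits for all queued updates, returns the result.
    static State runConcurrentUpdates(int threads, int updatesPerThread) {
        SerializedUpdatesSketch service = new SerializedUpdatesSketch();
        List<Thread> submitters = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < updatesPerThread; j++) {
                    service.submitUpdate(s -> new State(s.version, s.indexCount + 1));
                }
            });
            submitters.add(t);
            t.start();
        }
        try {
            for (Thread t : submitters) {
                t.join();
            }
            // shutdown() still runs the tasks that were already queued.
            service.updateThread.shutdown();
            if (!service.updateThread.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("updates did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return service.current;
    }
}
```

Because every change is applied on the same thread, no update is lost: with 4 submitter threads issuing 250 updates each, the final state has both a version and an index count of 1000.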

2. Ensuring subsequent changes are all committed accordingly and not rolled back once the ClusterState change is committed

Once the new Meta is committed on a node, the node performs the corresponding actions, such as deleting a Shard, and these actions cannot be rolled back. If the Master node crashes while the change is being applied, the newly elected Master must therefore continue from the new Meta and must not roll it back. Otherwise, the Meta would be rolled back while the actions already taken could not be, and the Meta would no longer be consistent with the actual state of the nodes.

  1. NodeA was originally the Master node, but for some reason, NodeB became the new Master node. NodeA did not discover the change due to a heartbeat or detection issue.
  2. So, NodeA still considers itself the Master and publishes the new ClusterState as usual.
  3. However, NodeB is now the Master, which means that more than half of the master-eligible nodes consider NodeB to be the new Master, so they do not return an ACK to NodeA.
  4. Because NodeA cannot receive enough ACKs, the publish fails, and NodeA loses master status.
  5. And, because the new ClusterState is not committed on any node, there is no inconsistency.
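The scenario above can be sketched as a small two-phase publish. This is a simplified model under stated assumptions, not the ES code: the class `TwoPhasePublishSketch`, the `Node` type, and the methods `onSend`, `onCommit`, and `publish` are hypothetical names, and a node is assumed to ACK only a sender it currently follows as Master.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified two-phase publish: send to all master-eligible nodes,
// then commit only if more than half of them ACKed.
final class TwoPhasePublishSketch {

    static final class Node {
        final String name;
        String knownMaster;        // the node this one currently follows as Master
        long pendingVersion = -1;  // received in phase 1 but not yet committed
        long committedVersion = 0;

        Node(String name, String knownMaster) {
            this.name = name;
            this.knownMaster = knownMaster;
        }

        // Phase 1: accept the new state only from the Master this node follows.
        boolean onSend(String sender, long version) {
            if (!sender.equals(knownMaster)) {
                return false; // no ACK for a stale Master
            }
            pendingVersion = version;
            return true;
        }

        // Phase 2: apply the state that was received in phase 1.
        void onCommit(long version) {
            if (pendingVersion == version) {
                committedVersion = version;
                pendingVersion = -1;
            }
        }
    }

    // Returns true if the state was committed, false if the publish failed.
    static boolean publish(String sender, long version, List<Node> masterEligible) {
        List<Node> acked = new ArrayList<>();
        for (Node node : masterEligible) {
            if (node.onSend(sender, version)) {
                acked.add(node);
            }
        }
        boolean quorum = acked.size() > masterEligible.size() / 2;
        if (!quorum) {
            return false; // nothing is committed anywhere
        }
        for (Node node : acked) {
            node.onCommit(version);
        }
        return true;
    }
}
```

With three master-eligible nodes where NodeA still follows itself and the other two follow NodeB, a publish from NodeA collects only one ACK and fails with no node committing anything, whereas a publish from NodeB collects two ACKs and is committed on those two nodes.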

3. Consistency issue analysis

The principle in ES is that the Master sends a commit request once more than half of the MasterNodes (master-eligible nodes) have received the new ClusterState. ES assumes that if more than half of these nodes have received the new ClusterState, the ClusterState can definitely be committed and will not be rolled back in any situation. The following known issue from the Elasticsearch resiliency status page shows that this assumption does not always hold.

https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html

Repeated network partitions can cause cluster state updates to be lost (STATUS: ONGOING)
... This problem is mostly fixed by #20384 (v5.0.0), which takes committed cluster state updates into account during master election. This considerably reduces the chance of this rare problem occurring but does not fully mitigate it. If the second partition happens concurrently with a cluster state update and blocks the cluster state commit message from reaching a majority of nodes, it may be that the in flight update will be lost. If the now-isolated master can still acknowledge the cluster state update to the client, this will amount to the loss of an acknowledged change. Fixing that last scenario needs considerable work. We are currently working on it but have no ETA yet.

Solving Existing Consistency Problems

Since ES has some consistency problems with meta updates, here are some ways to resolve these problems.

1. Implement a standardized consistency algorithm, such as raft

The first approach is to implement a standardized consistency algorithm, such as raft. In the previous article, we explained the similarities and differences between the ES election algorithm and the raft algorithm. Here, we compare the ES meta update process with the raft log replication process.

  1. In the raft algorithm, a follower persists the received logs to disk before responding. In ES, a node that receives the ClusterState puts it in an in-memory queue and returns immediately; the ClusterState is not persisted at that point.
  2. The raft algorithm ensures that a log can be committed after more than half of the nodes respond. This isn’t guaranteed in ES, so some consistency problems may occur.

2. Ensure meta consistency by using additional components

For example, use ZooKeeper to save the meta and ensure meta consistency. This does solve the consistency problems, but performance still needs to be considered. For example, evaluate whether the meta can still be saved in ZooKeeper when the meta volume is very large, and whether the full meta or only diff data needs to be requested each time.
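The trade-off between requesting the full meta and requesting only a diff can be illustrated with a small sketch. It is not tied to ZooKeeper or to any ES API: the class `MetaDiffSketch`, its `Diff` type, and the representation of the meta as a map from index name to version are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative diff between two meta snapshots, each modeled as index name -> version.
final class MetaDiffSketch {

    static final class Diff {
        final Map<String, Long> upserts = new HashMap<>(); // new or changed entries
        final Set<String> deletes = new HashSet<>();       // entries that disappeared
    }

    // Computes what changed between two snapshots.
    static Diff diff(Map<String, Long> before, Map<String, Long> after) {
        Diff d = new Diff();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            if (!e.getValue().equals(before.get(e.getKey()))) {
                d.upserts.put(e.getKey(), e.getValue());
            }
        }
        for (String name : before.keySet()) {
            if (!after.containsKey(name)) {
                d.deletes.add(name);
            }
        }
        return d;
    }

    // Rebuilds the newer snapshot from the older one plus the diff.
    static Map<String, Long> apply(Map<String, Long> base, Diff d) {
        Map<String, Long> result = new HashMap<>(base);
        result.keySet().removeAll(d.deletes);
        result.putAll(d.upserts);
        return result;
    }
}
```

A reader that already holds the older snapshot only needs the upserts and deletes, which stay small when few indexes change. A reader without a matching base snapshot cannot apply the diff and has to fetch the full meta.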

3. Use shared storage to save the meta

First, ensure that no split-brain will occur, then save the meta in shared storage to avoid any consistency problems. This approach requires a shared storage system that must be highly reliable and available.


Summary

As the second article in this series about Elasticsearch distributed consistency principles, this article mainly shows how the master node in an ES cluster publishes meta updates and analyzes the potential consistency problems. It also describes what the meta data consists of and how it is stored. The next article explains the methods used to ensure data consistency in ES, as well as the data-writing process and algorithm models.


Elasticsearch Resiliency Status



Alibaba Cloud
