How to Migrate Data from an Amazon ES Domain to an Alibaba Cloud Elasticsearch Cluster
Released by ELK Geek
In China’s cloud service market, Alibaba Cloud has become popular among developers due to its convenience and stability. This article is intended for customers who want to migrate data from an Amazon Elasticsearch Service (Amazon ES) domain to an Alibaba Cloud Elasticsearch cluster. The following figure shows the reference architecture for the migration.
Introduction to Migration
Elasticsearch: This is a distributed RESTful search and analysis engine designed for a wide range of scenarios. As the core of Elastic Stack, Elasticsearch stores data in a centralized manner and helps to search for expected and unexpected data.
Kibana: This visualizes Elasticsearch data and provides a user interface for managing Elastic Stack.
Amazon ES: This is a fully managed service that offers easy-to-use Elasticsearch API operations and real-time analytics capabilities. This service also provides the availability, scalability, and security required for production workloads. You may use Amazon ES to deploy, protect, manage, and scale Elasticsearch clusters for scenarios such as log analysis, full-text search, and application monitoring.
Alibaba Cloud Elasticsearch: This service is not yet available on the international site, so this article discusses the service provided on the China site.
Snapshot and Restore: Store snapshots of individual indexes or an entire cluster in a remote repository, such as a shared file system, Amazon Simple Storage Service (Amazon S3), or Hadoop Distributed File System (HDFS). Snapshots can be used to restore data quickly; however, a snapshot can be restored only to Elasticsearch clusters of specific versions:
1) Data in a snapshot created in an Elasticsearch 5.x cluster can be restored to an Elasticsearch 6.x cluster.
2) Data in a snapshot created in an Elasticsearch 2.x cluster can be restored to an Elasticsearch 5.x cluster.
3) Data in a snapshot created in an Elasticsearch 1.x cluster can be restored to an Elasticsearch 2.x cluster.
Note: Data in a snapshot created in an Elasticsearch 1.x cluster cannot be restored to an Elasticsearch 5.x or 6.x cluster. Data in a snapshot created in an Elasticsearch 2.x cluster cannot be restored to an Elasticsearch 6.x cluster. Snapshots are incremental and can contain indexes that were created in different versions of Elasticsearch. If any index in a snapshot was created in an incompatible Elasticsearch version, the snapshot cannot be restored.
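The compatibility rules above can be expressed as a small lookup. The following is a minimal sketch that encodes only the three restore paths listed in this article, plus the assumption that a snapshot can always be restored to a cluster of the same major version. The function names are illustrative and are not part of any Elasticsearch API.

```python
# Restore paths stated in the list above:
# snapshot major version -> newer major version that can restore it.
UPGRADE_RESTORE_TARGET = {5: 6, 2: 5, 1: 2}


def can_restore(snapshot_major: int, cluster_major: int) -> bool:
    """Return True if an index created in snapshot_major can be restored to cluster_major."""
    # Assumption: restoring to the same major version is always allowed.
    if snapshot_major == cluster_major:
        return True
    return UPGRADE_RESTORE_TARGET.get(snapshot_major) == cluster_major


def snapshot_is_restorable(index_majors, cluster_major: int) -> bool:
    """A snapshot is restorable only if every index in it is compatible."""
    return all(can_restore(major, cluster_major) for major in index_majors)


if __name__ == "__main__":
    print(can_restore(5, 6))
    print(snapshot_is_restorable([5, 2], 6))
```

The second function reflects the note above: a single incompatible index makes the whole snapshot unusable on the target cluster.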
The procedure to migrate data to an Alibaba Cloud Elasticsearch cluster is as follows:
1) Create a Baseline Index
   1) Create a snapshot repository.
   2) Create the first full snapshot for the index data to be migrated. This snapshot is automatically stored in the S3 bucket.
   3) Create an Object Storage Service (OSS) bucket in Alibaba Cloud and register it as a snapshot repository of the Alibaba Cloud Elasticsearch cluster.
   4) Use OSSImport to transfer the full snapshot from the S3 bucket to the OSS bucket.
   5) Restore data from the full snapshot to your Alibaba Cloud Elasticsearch cluster.
2) Process Incremental Snapshots on a Regular Basis
   Repeat the preceding steps to restore data from incremental snapshots.
3) Identify the Final Snapshot and Switch the Service
   1) Stop services that may modify index data.
   2) Create the final snapshot for your Amazon ES domain.
   3) Transfer the final snapshot to your OSS bucket. Then, restore data from the snapshot to your Alibaba Cloud Elasticsearch cluster.
   4) Switch the service to the Alibaba Cloud Elasticsearch cluster.
- Create an Amazon ES 5.5.2 domain in the Singapore region.
- Create an Alibaba Cloud Elasticsearch v5.5.3 cluster in the China (Hangzhou) region.
- Use a sample index named “movies”.
Prerequisites for Creating Manual Snapshots in an Amazon ES Domain
Amazon ES automatically creates snapshots for the primary index shards in a domain every day and stores them in a pre-configured S3 bucket. These snapshots are retained for a maximum of 14 days without additional charge. Use these snapshots to restore data to the domain.
However, these cannot be used to migrate data to other domains. Automatic snapshots can only be read from the specified domain. To migrate data, use manual snapshots stored in the S3 bucket. Standard S3 charges apply to manual snapshots.
To create manual snapshots and restore data from the snapshots, use AWS Identity and Access Management (IAM) and S3. Before creating snapshots, perform the operations listed in the following table.
Create an S3 Bucket
An S3 bucket is required to store manual snapshots. Record its Amazon Resource Name (ARN). The ARN is used by the following items:
1) The resource element of the IAM policy that is attached to the specific IAM role
2) The Python client that is used to register a snapshot repository
The following example shows the ARN of an S3 bucket:
Create an IAM Role
An IAM role is required. Its trust relationship must specify Amazon ES (es.amazonaws.com) in the Service element.
View the trust relationship details in the AWS IAM console.
When you create a role in the IAM console, Amazon ES is not included in the “Select role type” drop-down list. Select Amazon EC2 from the drop-down list and create the role as prompted. Then, change ec2.amazonaws.com in the trust relationship of the role to es.amazonaws.com.
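The change to the trust relationship can be illustrated as follows. This is a sketch that assumes the role was created for EC2 with the standard IAM trust policy layout; the helper name trust_service is hypothetical, and the resulting JSON still has to be pasted into the IAM console (or applied with your own tooling).

```python
import copy
import json

# Trust relationship as created for an EC2 role (standard IAM trust policy layout).
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def trust_service(policy: dict, service: str) -> dict:
    """Return a copy of the policy whose statements trust the given service principal."""
    updated = copy.deepcopy(policy)
    for statement in updated["Statement"]:
        statement["Principal"]["Service"] = service
    return updated


# Replace ec2.amazonaws.com with es.amazonaws.com, as described above.
es_trust_policy = trust_service(EC2_TRUST_POLICY, "es.amazonaws.com")
print(json.dumps(es_trust_policy, indent=2))
```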
Create an IAM Policy
Attach an IAM policy to the IAM role. The policy specifies the S3 bucket used to store the manual snapshots of your Amazon ES domain. The following example specifies the ARN of the eric-es-index-backups bucket:
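As an illustration, the sketch below builds a policy document for the eric-es-index-backups bucket. The bucket ARN follows the standard S3 ARN format. The list of actions is an assumption (a typical minimal set for reading and writing snapshot objects) and is not taken from the original article, so compare it with the current AWS documentation before use.

```python
import json


def bucket_arn(bucket: str) -> str:
    """Build the ARN of an S3 bucket from its name."""
    return f"arn:aws:s3:::{bucket}"


def snapshot_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document that grants access to the snapshot bucket."""
    arn = bucket_arn(bucket)
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing applies to the bucket itself.
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [arn]},
            # Object operations apply to the objects inside the bucket.
            # Assumption: these three actions are sufficient for snapshots.
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [arn + "/*"],
            },
        ],
    }


policy = snapshot_bucket_policy("eric-es-index-backups")
print(json.dumps(policy, indent=2))
```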
Copy the policy content to the “Edit Policy” section.
Click “Policy Summary” to check whether the policy is correct.
Attach the IAM policy to the IAM role.
Register a Manual Snapshot Repository
Create manual snapshots only after registering a snapshot repository with Amazon ES. To register the repository, sign the AWS request with the credentials of the user or role specified in the trust relationship of the IAM role.
You cannot run a curl command to register a snapshot repository because this command does not support AWS request signing. Use the sample Python client to register a snapshot repository.
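For orientation, the sketch below builds only the request body that a signed registration request would carry; it does not perform the signing, which the sample Python client handles. The repository name and bucket name come from this article, and ap-southeast-1 is the region code for Singapore. The role ARN, including the account ID and role name, is a hypothetical placeholder.

```python
import json


def s3_repository_body(bucket: str, region: str, role_arn: str) -> str:
    """Build the JSON body used to register an S3 snapshot repository."""
    return json.dumps(
        {
            "type": "s3",
            "settings": {"bucket": bucket, "region": region, "role_arn": role_arn},
        }
    )


# Path of the registration request on the Amazon ES domain endpoint.
REPOSITORY_PATH = "/_snapshot/eric-snapshot-repository"

body = s3_repository_body(
    bucket="eric-es-index-backups",
    region="ap-southeast-1",  # Singapore
    # Hypothetical placeholder: use the ARN of the IAM role created earlier.
    role_arn="arn:aws:iam::123456789012:role/example-es-snapshot-role",
)
print(REPOSITORY_PATH)
print(body)
```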
1) Modify the Sample Python Client File
Download the sample Python client file and change the values highlighted in yellow in the file based on actual conditions. Then, copy the content into a Python file named
The following table describes the variables in the sample Python client file.
2) Install Amazon Web Services Library boto-2.48.0
The sample Python client requires version 2.x of the boto package to be installed on the computer where the snapshot repository is registered.
# wget https://pypi.python.org/packages/66/e7/fe1db6a5ed53831b53b8a6695a8f134a58833cadb5f2740802bc3730ac15/boto-2.48.0.tar.gz#md5=ce4589dd9c1d7f5d347363223ae1b970
# tar zxvf boto-2.48.0.tar.gz
# cd boto-2.48.0
# python setup.py install
3) Run the Python Client to Register the Snapshot Repository
Log on to the Kibana console of the Amazon ES domain. In the left-side navigation pane, click “Dev Tools”. On the “Console” tab, run the following command to view the registration result:
Create the First Snapshot and Restore Data from the Snapshot
1) Create a Snapshot in the Amazon ES Domain
Run the following commands in the Kibana console, or run the equivalent curl commands in the Linux or Mac OS X command-line interface (CLI).
Create a snapshot named snapshot_movies_1 for the movies index in the eric-snapshot-repository snapshot repository.
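The snapshot request can be sketched as follows. The code only builds the HTTP requests and does not send them. The endpoint is a hypothetical placeholder, and the sketch assumes that the domain accepts unsigned requests from your client; if the access policy of the domain requires signed requests, use a signing client instead.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the endpoint of your Amazon ES domain.
ES_ENDPOINT = "https://example-domain.ap-southeast-1.es.amazonaws.com"


def create_snapshot_request(endpoint, repository, snapshot, indices):
    """Build a PUT request that snapshots only the given indices."""
    url = f"{endpoint}/_snapshot/{repository}/{snapshot}"
    body = json.dumps({"indices": indices}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def snapshot_status_request(endpoint, repository, snapshot):
    """Build a GET request that returns information about the snapshot."""
    url = f"{endpoint}/_snapshot/{repository}/{snapshot}"
    return urllib.request.Request(url, method="GET")


request = create_snapshot_request(
    ES_ENDPOINT, "eric-snapshot-repository", "snapshot_movies_1", "movies"
)
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request)
```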
View the snapshot status.
GET _snapshot/eric-snapshot-repository/snapshot_movies_1
In the S3 console, view snapshot objects.
2) Transfer the Created Snapshot From S3 Bucket to OSS Bucket
In this step, pull the snapshot data from the AWS S3 bucket to the Alibaba Cloud OSS bucket. For more information, see Migrate data from Amazon S3 to Alibaba Cloud OSS.
After the snapshot is transferred, view the snapshot in the OSS console.
3) Restore Data From the Snapshot to Alibaba Cloud Elasticsearch Cluster
Create a Snapshot Repository
Log on to the Kibana console of the Elasticsearch cluster. In the left-side navigation pane, click “Dev Tools”. On the “Console” tab, run the following command to create a snapshot repository. The name of the snapshot repository must be the same as that of the snapshot repository registered with Amazon ES. Enter the actual values according to the parameter description.
"access_key_id": "Put your AccessKey id here.",
"secret_access_key": "Put your secret AccessKey here.",
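The overall shape of the repository definition can be sketched as follows. Only the access_key_id and secret_access_key parameters appear in this article; the "oss" type and the endpoint, bucket, and base_path settings are assumptions about the OSS repository plugin and should be checked against the Alibaba Cloud Elasticsearch documentation. The bucket name is a hypothetical placeholder.

```python
import json


def oss_repository_body(endpoint, bucket, access_key_id, secret_access_key, base_path=None):
    """Build the body of a request that registers an OSS snapshot repository."""
    settings = {
        "endpoint": endpoint,  # assumed setting name
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "bucket": bucket,  # assumed setting name
    }
    if base_path:
        # Assumed setting: folder in the bucket that holds the snapshot data.
        settings["base_path"] = base_path
    return {"type": "oss", "settings": settings}


# The repository name must match the one registered with Amazon ES.
REPOSITORY_PATH = "_snapshot/eric-snapshot-repository"

body = oss_repository_body(
    endpoint="http://oss-cn-hangzhou-internal.aliyuncs.com",
    bucket="example-oss-bucket",  # hypothetical bucket name
    access_key_id="Put your AccessKey id here.",
    secret_access_key="Put your secret AccessKey here.",
)
print("PUT", REPOSITORY_PATH)
print(json.dumps(body, indent=2))
```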
View the status of the snapshot named snapshot_movies_1.
Note: Record the start time and end time of the snapshot creation operation. These times are needed when you use OSSImport to migrate data in incremental snapshots.
4) Restore Data From the Snapshot
Log on to the Kibana console of the Elasticsearch cluster. In the left-side navigation pane, click “Dev Tools”.
On the “Console” tab, run the following command to check the availability of the movies index. The movies index contains three documents, which are the same as those in the Amazon ES domain.
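A restore followed by a document count can be sketched as below. The requests are only built, not sent. The cluster endpoint is a hypothetical placeholder, and authentication, which an Alibaba Cloud Elasticsearch cluster normally requires, is omitted.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the endpoint of your Elasticsearch cluster.
ES_ENDPOINT = "http://example-cluster.elasticsearch.aliyuncs.com:9200"


def restore_request(endpoint, repository, snapshot, indices):
    """Build a POST request that restores the given indices from a snapshot."""
    url = f"{endpoint}/_snapshot/{repository}/{snapshot}/_restore"
    body = json.dumps({"indices": indices}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def count_request(endpoint, index):
    """Build a GET request that returns the number of documents in an index."""
    return urllib.request.Request(f"{endpoint}/{index}/_count", method="GET")


restore = restore_request(
    ES_ENDPOINT, "eric-snapshot-repository", "snapshot_movies_1", "movies"
)
count = count_request(ES_ENDPOINT, "movies")
for req in (restore, count):
    print(req.get_method(), req.full_url)
```

After the restore completes, the count request should report the same number of documents as the source index in the Amazon ES domain.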
Create the Final Snapshot and Restore Data from the Snapshot
1) Insert Data to the Movies Index in Amazon ES Domain
The movies index contains three documents. Insert two more documents.
Run the GET movies/_count command to view the number of documents in the index.
2) Create a Snapshot
For more information, see step 1 in the “Create the First Snapshot and Restore Data from the Snapshot” section.
View objects in the S3 bucket.
Also, note the differences in the index folder.
3) Transfer the Snapshot From S3 Bucket to OSS Bucket
Use OSSImport to transfer the snapshot from the S3 bucket to the OSS bucket. The S3 bucket stores two snapshot objects. Change the value of the isSkipExistFile variable in the local_job.cfg file to migrate the incremental snapshot object.
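The configuration change can be scripted. The sketch below assumes that local_job.cfg uses simple key=value lines and that the intended value is true, so that objects already copied to OSS are skipped. The article does not state the value, and the sample file content is illustrative only.

```python
def set_property(cfg_text: str, key: str, value: str) -> str:
    """Return cfg_text with key set to value, appending the key if it is missing."""
    lines = []
    replaced = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        # Leave comment lines untouched; match on the text before the first "=".
        if not stripped.startswith("#") and stripped.split("=", 1)[0].strip() == key:
            lines.append(f"{key}={value}")
            replaced = True
        else:
            lines.append(line)
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Illustrative file content, not a complete OSSImport configuration.
sample_cfg = "# job settings\nsrcType=s3\nisSkipExistFile=false\n"
updated_cfg = set_property(sample_cfg, "isSkipExistFile", "true")
print(updated_cfg)
```

To apply the change to a real file, read its content, pass it through set_property, and write the result back.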
Then, view the incremental snapshot object in the OSS bucket.
Alibaba Cloud OSS bucket:
AWS S3 bucket:
4) Restore Data from the Snapshot
For more information, see step 4 in the “Create the First Snapshot and Restore Data from the Snapshot” section. Before restoring data, close the movies index. After restoration, open the index.
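The close, restore, and open sequence can be written out as an ordered list of console commands. In the sketch below, the snapshot name snapshot_movies_2 is a hypothetical name for the final snapshot, because the article does not name it.

```python
import json


def final_restore_steps(index: str, repository: str, snapshot: str):
    """Return the ordered (method, path, body) steps for restoring over an existing index."""
    return [
        # An existing index must be closed before data is restored to it.
        ("POST", f"{index}/_close", None),
        ("POST", f"_snapshot/{repository}/{snapshot}/_restore", {"indices": index}),
        # Open the index again after the restoration, as described above.
        ("POST", f"{index}/_open", None),
    ]


def to_console(steps) -> str:
    """Format the steps in the style used by the Kibana console."""
    chunks = []
    for method, path, body in steps:
        chunk = f"{method} {path}"
        if body is not None:
            chunk += "\n" + json.dumps(body, indent=2)
        chunks.append(chunk)
    return "\n\n".join(chunks)


steps = final_restore_steps("movies", "eric-snapshot-repository", "snapshot_movies_2")
print(to_console(steps))
```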
After data is restored from the snapshot, there will be five documents in the movies index of the Elasticsearch cluster. This number is the same as that in the index of the Amazon ES domain.
The snapshot and restore feature can be used to migrate data from an Amazon ES domain to an Alibaba Cloud Elasticsearch cluster. Because an existing index must be closed before data is restored to it, the index cannot serve read or write requests during the restoration.
- Migrate data from Amazon S3 to Alibaba Cloud OSS
- Description and configuration