How to Migrate Data from an Amazon ES Domain to an Alibaba Cloud Elasticsearch Cluster


Released by ELK Geek

In China’s cloud service market, Alibaba Cloud has become popular among developers due to its convenience and stability. This article is intended for customers who want to migrate data from an Amazon Elasticsearch Service (Amazon ES) domain to an Alibaba Cloud Elasticsearch cluster. The following figure shows the reference architecture for the migration.

Image for post

Introduction to Migration

Terms

Elasticsearch: This is a distributed RESTful search and analysis engine designed for a wide range of scenarios. As the core of Elastic Stack, Elasticsearch stores data in a centralized manner and helps to search for expected and unexpected data.

Kibana: This visualizes Elasticsearch data and provides a user interface for managing Elastic Stack.

Amazon ES: This is a fully managed service that offers easy-to-use Elasticsearch API operations and real-time analytics capabilities. This service also provides the availability, scalability, and security required for production workloads. You may use Amazon ES to deploy, protect, manage, and scale Elasticsearch clusters for scenarios such as log analysis, full-text search, and application monitoring.

Alibaba Cloud Elasticsearch: This service is not yet available on the international site, so this article discusses the service provided on the China site.

Snapshot and Restore: This feature stores snapshots of individual indexes or an entire cluster in a remote repository, such as a shared file system, Amazon Simple Storage Service (Amazon S3), or Hadoop Distributed File System (HDFS). Snapshots can be used to restore data quickly; however, data can be restored only to Elasticsearch clusters of specific versions:

1) Data in a snapshot created in an Elasticsearch 5.x cluster can be restored to an Elasticsearch 6.x cluster.
2) Data in a snapshot created in an Elasticsearch 2.x cluster can be restored to an Elasticsearch 5.x cluster.
3) Data in a snapshot created in an Elasticsearch 1.x cluster can be restored to an Elasticsearch 2.x cluster.

Note: Data in a snapshot created in an Elasticsearch 1.x cluster cannot be restored to an Elasticsearch 5.x or 6.x cluster. Data in a snapshot created in an Elasticsearch 2.x cluster cannot be restored to an Elasticsearch 6.x cluster. Snapshots are incremental and can contain indexes created in different versions of Elasticsearch. If any index in a snapshot was created in an incompatible Elasticsearch version, the snapshot cannot be restored.
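
The version rules above can be expressed as a small lookup. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that restoring to the same major version is also allowed, which the list above does not state explicitly.

```python
# Upgrade paths listed in this article: snapshot major version -> target major version.
UPGRADE_PATHS = {1: 2, 2: 5, 5: 6}


def can_restore(snapshot_major, target_major):
    """Return True if an index created in snapshot_major can be restored to target_major."""
    if snapshot_major == target_major:
        # Assumption: a snapshot can be restored to a cluster of the same major version.
        return True
    return UPGRADE_PATHS.get(snapshot_major) == target_major


def snapshot_restorable(index_majors, target_major):
    """A snapshot is restorable only if every index in it is compatible with the target."""
    return all(can_restore(major, target_major) for major in index_majors)


print(can_restore(5, 6))               # 5.x snapshot to a 6.x cluster
print(can_restore(1, 5))               # 1.x snapshot to a 5.x cluster
print(snapshot_restorable([5, 2], 6))  # snapshot that mixes 5.x and 2.x indexes
```

The last call illustrates the note above: one incompatible index is enough to make the whole snapshot unrestorable.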

Migration Plan

The procedure to migrate data to an Alibaba Cloud Elasticsearch cluster is as follows:

1) Create a Baseline Index

1) Create a snapshot repository.

2) Create the first full snapshot for the index data to be migrated. This snapshot is automatically stored in the S3 bucket.

3) Create an Object Storage Service (OSS) bucket in Alibaba Cloud and register it as a snapshot repository of the Alibaba Cloud Elasticsearch cluster.

4) Use OSSImport to transfer the full snapshot from the S3 bucket to the OSS bucket.

5) Restore data from the full snapshot to your Alibaba Cloud Elasticsearch cluster.

2) Process Incremental Snapshots on a Regular Basis

Repeat the preceding steps to restore data from incremental snapshots.

3) Identify the Final Snapshot and Switch the Service

1) Stop services that may modify index data.

2) Create the final snapshot for your Amazon ES domain.

3) Transfer the final snapshot to your OSS bucket. Then, restore data from the snapshot to your Alibaba Cloud Elasticsearch cluster.

4) Switch to the cluster.

Prerequisites

Elasticsearch Service

  • Create an Amazon ES 5.5.2 domain in the Singapore region.
  • Create an Alibaba Cloud Elasticsearch v5.5.3 cluster in the China (Hangzhou) region.
  • Use a sample index named “movies” in the Amazon ES domain.

Prerequisites for Creating Manual Snapshots in an Amazon ES Domain

Amazon ES automatically creates snapshots for the primary index shards in a domain every day and stores them in a pre-configured S3 bucket. These snapshots are retained for a maximum of 14 days without additional charge. Use these snapshots to restore data to the domain.

However, these cannot be used to migrate data to other domains. Automatic snapshots can only be read from the specified domain. To migrate data, use manual snapshots stored in the S3 bucket. Standard S3 charges apply to manual snapshots.

To create manual snapshots and restore data from the snapshots, use AWS Identity and Access Management (IAM) and S3. Before creating snapshots, perform the operations listed in the following table.

Image for post

Create an S3 Bucket

An S3 bucket is required to store manual snapshots. Record its Amazon Resource Name (ARN). The ARN is used by the following items:

1) The resource element of the IAM policy that is attached to the specific IAM role

2) The Python client that is used to register a snapshot repository

The following example shows the ARN of an S3 bucket:

arn:aws:s3:::eric-es-index-backups

Create an IAM Role

Create an IAM role whose trust relationship specifies Amazon ES (es.amazonaws.com) in the Service element.

Example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "es.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

View the trust relationship details in the AWS IAM console.

Image for post
Image for post

When you create a role in the IAM console, Amazon ES is not included in the “Select role type” drop-down list. Select Amazon EC2 from the drop-down list instead and create the role as prompted. Then, change ec2.amazonaws.com in the trust relationship of the role to es.amazonaws.com.
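
The edit to the trust relationship can also be made on the JSON document itself. The following Python sketch shows one way to swap the service principal. The function name is hypothetical, and the sketch only handles a Service element that is a single string, as in the example above.

```python
import copy
import json


def retarget_trust_policy(policy, old_service, new_service):
    """Return a copy of the trust policy with old_service replaced by new_service."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    for statement in updated.get("Statement", []):
        principal = statement.get("Principal", {})
        # Only a plain string Service element is handled in this sketch.
        if principal.get("Service") == old_service:
            principal["Service"] = new_service
    return updated


# Trust relationship as generated for a role created with the Amazon EC2 role type.
ec2_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

es_policy = retarget_trust_policy(ec2_policy, "ec2.amazonaws.com", "es.amazonaws.com")
print(json.dumps(es_policy, indent=2))
```

Paste the printed JSON into the trust relationship editor of the role.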

Create an IAM Policy

Attach an IAM policy to the IAM role. The policy specifies the S3 bucket used to store the manual snapshots of your Amazon ES domain. The following example specifies the ARN of the eric-es-index-backups bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::eric-es-index-backups"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::eric-es-index-backups/*"
      ]
    }
  ]
}
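
If you manage several buckets, the same policy can be generated from the bucket name. The following Python sketch reproduces the policy shown above; the function name is hypothetical.

```python
import json


def build_snapshot_policy(bucket_name):
    """Build the IAM policy that lets the role list the bucket and manage its objects."""
    bucket_arn = "arn:aws:s3:::" + bucket_name
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permission: applies to the bucket ARN itself.
                "Action": ["s3:ListBucket"],
                "Effect": "Allow",
                "Resource": [bucket_arn],
            },
            {
                # Object-level permissions: apply to every object in the bucket.
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Effect": "Allow",
                "Resource": [bucket_arn + "/*"],
            },
        ],
    }


policy = build_snapshot_policy("eric-es-index-backups")
print(json.dumps(policy, indent=2))
```

Note that the bucket-level action uses the bucket ARN, whereas the object-level actions use the ARN followed by /*.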

Copy the policy content to the “Edit Policy” section.

Image for post

Click “Policy Summary” to check whether the policy is correct.

Image for post

Attach the IAM policy to the IAM role.

Image for post

Register a Manual Snapshot Repository

Register a snapshot repository with Amazon ES before creating manual snapshots. The registration request must be signed with the AWS credentials of the IAM user or role specified in the trust relationship of the IAM role.

You cannot use a plain curl command to register the snapshot repository because curl does not support AWS request signing. Use the sample Python client instead.

1) Modify the Sample Python Client File

Download the sample Python client file and change the values highlighted in yellow in the file based on actual conditions. Then, copy the content into a Python file named snapshot.py.

The following table describes the variables in the sample Python client file.

Image for post
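
The sample client itself is not reproduced in this article. As a rough guide to what it sends, the following Python sketch builds only the request path and JSON body for registering an S3 repository, without signing or sending anything. The function name and the role ARN are hypothetical, the region assumes the Singapore region (ap-southeast-1), and the settings layout follows the S3 repository form described in the AWS documentation.

```python
import json


def build_repository_registration(repo_name, bucket, region, role_arn):
    """Return the request path and JSON body that register an S3 snapshot repository."""
    path = "/_snapshot/" + repo_name
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,      # bucket name, not the bucket ARN
            "region": region,      # region of the S3 bucket
            "role_arn": role_arn,  # ARN of the IAM role created earlier
        },
    }
    return path, json.dumps(body)


path, body = build_repository_registration(
    repo_name="eric-snapshot-repository",
    bucket="eric-es-index-backups",
    region="ap-southeast-1",
    # Hypothetical role ARN; replace it with the ARN of your own IAM role.
    role_arn="arn:aws:iam::123456789012:role/es-snapshot-role",
)
print(path)
print(body)
```

The sample client is still required to sign this request before sending it to the Amazon ES endpoint.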

2) Install Amazon Web Services Library boto-2.48.0

The sample Python client requires version 2.x of the boto package on the computer where you register the snapshot repository.

# wget https://pypi.python.org/packages/66/e7/fe1db6a5ed53831b53b8a6695a8f134a58833cadb5f2740802bc3730ac15/boto-2.48.0.tar.gz#md5=ce4589dd9c1d7f5d347363223ae1b970 
# tar zxvf boto-2.48.0.tar.gz
# cd boto-2.48.0
# python setup.py install

3) Run the Python Client to Register the Snapshot Repository

# python snapshot.py

Log on to the Kibana console of the Amazon ES domain. In the left-side navigation pane, click “Dev Tools”. On the “Console” tab, run the following command to view the registration result:

GET _snapshot
Image for post

Create the First Snapshot and Restore Data from the Snapshot

1) Create a Snapshot in the Amazon ES Domain

Run the following commands in the Kibana console, or send the equivalent requests with curl from a Linux or Mac OS X command-line interface (CLI).

Create a snapshot named snapshot_movies_1 for the movies index in the eric-snapshot-repository snapshot repository.

PUT _snapshot/eric-snapshot-repository/snapshot_movies_1
{
  "indices": "movies"
}

View the snapshot status.

GET _snapshot/eric-snapshot-repository/snapshot_movies_1
Image for post
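
When you script this check instead of reading the Kibana output, the snapshot state can be read from the JSON response of the GET request. The following Python sketch parses a trimmed, hypothetical response; a real response contains more fields, and the function name is an assumption.

```python
import json


def snapshot_state(response_text, snapshot_name):
    """Return the state of the named snapshot, or None if it is not in the response."""
    response = json.loads(response_text)
    for snapshot in response.get("snapshots", []):
        if snapshot.get("snapshot") == snapshot_name:
            return snapshot.get("state")
    return None


# Trimmed, hypothetical response to GET _snapshot/eric-snapshot-repository/snapshot_movies_1.
sample_response = """
{
  "snapshots": [
    {
      "snapshot": "snapshot_movies_1",
      "indices": ["movies"],
      "state": "SUCCESS"
    }
  ]
}
"""

state = snapshot_state(sample_response, "snapshot_movies_1")
print(state)
```

Transfer the snapshot to OSS only after the state is SUCCESS; a snapshot that is still running reports IN_PROGRESS.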

In the S3 console, view snapshot objects.

Image for post

2) Transfer the Created Snapshot From S3 Bucket to OSS Bucket

In this step, transfer the snapshot data from the Amazon S3 bucket to the Alibaba Cloud OSS bucket. For more information, see Migrate data from Amazon S3 to Alibaba Cloud OSS.

After the snapshot is transferred, view the snapshot in the OSS console.

Image for post

3) Restore Data From the Snapshot to Alibaba Cloud Elasticsearch Cluster

Create a Snapshot Repository

Log on to the Kibana console of the Elasticsearch cluster. In the left-side navigation pane, click “Dev Tools”. On the “Console” tab, run the following command to create a snapshot repository. The name of the snapshot repository must be the same as that of the snapshot repository registered with Amazon ES. Replace the placeholder values with your actual endpoint, AccessKey pair, and bucket name.

PUT _snapshot/eric-snapshot-repository
{
  "type": "oss",
  "settings": {
    "endpoint": "http://oss-cn-hangzhou-internal.aliyuncs.com",
    "access_key_id": "Put your AccessKey id here.",
    "secret_access_key": "Put your secret AccessKey here.",
    "bucket": "eric-oss-aws-es-snapshot-s3",
    "compress": true
  }
}
Image for post
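
To avoid pasting the AccessKey pair into a console by hand, the request body can be built from values supplied at run time. The following Python sketch mirrors the settings shown above. The function name is hypothetical, and the key values in the example call are placeholders, not real credentials.

```python
import json


def build_oss_repository_body(endpoint, access_key_id, secret_access_key, bucket):
    """Build the body of the PUT _snapshot/<repository> request for an OSS repository."""
    return {
        "type": "oss",
        "settings": {
            "endpoint": endpoint,
            "access_key_id": access_key_id,
            "secret_access_key": secret_access_key,
            "bucket": bucket,
            "compress": True,  # same setting as in the console command above
        },
    }


repo_body = build_oss_repository_body(
    endpoint="http://oss-cn-hangzhou-internal.aliyuncs.com",
    access_key_id="<your-access-key-id>",          # placeholder
    secret_access_key="<your-access-key-secret>",  # placeholder
    bucket="eric-oss-aws-es-snapshot-s3",
)
print(json.dumps(repo_body, indent=2))
```

In a real script, consider reading the AccessKey pair from environment variables so that it is not stored in the source file.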

View the status of the snapshot named snapshot_movies_1.

GET _snapshot/eric-snapshot-repository/snapshot_movies_1

Note: Record the start time and end time of the snapshot creation operation. These values are needed when you use OSSImport to migrate incremental snapshots.

Example:

"start_time_in_millis": 1519786844591
"end_time_in_millis": 1519786846236
Image for post
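
The two values are epoch timestamps in milliseconds. The following Python sketch converts the example values above to UTC times and computes how long the snapshot took; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(millis):
    """Convert an epoch timestamp in milliseconds to a timezone-aware UTC datetime."""
    # Adding a timedelta avoids floating-point rounding of the millisecond part.
    return EPOCH + timedelta(milliseconds=millis)


start_time = millis_to_utc(1519786844591)
end_time = millis_to_utc(1519786846236)
duration_ms = 1519786846236 - 1519786844591

print(start_time.isoformat())
print(end_time.isoformat())
print(duration_ms)
```

For the example values, the snapshot started at 03:00:44 UTC on February 28, 2018 and took 1,645 milliseconds.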

4) Restore Data From the Snapshot

Log on to the Kibana console of the Elasticsearch cluster. In the left-side navigation pane, click “Dev Tools”. On the “Console” tab, run the following commands to restore the movies index and check the recovery progress.

POST _snapshot/eric-snapshot-repository/snapshot_movies_1/_restore
{
  "indices": "movies"
}
GET movies/_recovery
Image for post
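
The response to GET movies/_recovery lists one entry per shard with a stage field. The following Python sketch checks whether every shard has reached the DONE stage. The sample response is trimmed and hypothetical, and the function name is an assumption.

```python
def recovery_done(recovery_response, index_name):
    """Return True if every shard of the index reports the DONE recovery stage."""
    shards = recovery_response.get(index_name, {}).get("shards", [])
    if not shards:
        # No shard entries yet: recovery has not started or the index is unknown.
        return False
    return all(shard.get("stage") == "DONE" for shard in shards)


# Trimmed, hypothetical response to GET movies/_recovery.
sample_recovery = {
    "movies": {
        "shards": [
            {"id": 0, "type": "SNAPSHOT", "stage": "DONE"},
            {"id": 1, "type": "SNAPSHOT", "stage": "DONE"},
        ]
    }
}

print(recovery_done(sample_recovery, "movies"))
```

Query the index only after the function returns True for the live response.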

On the “Console” tab, query the movies index to check its availability, as shown in the following figure. The index contains three documents, and the data is the same as that in the Amazon ES domain.

Image for post

Create the Final Snapshot and Restore Data from the Snapshot

1) Insert Data to the Movies Index in Amazon ES Domain

The movies index contains three documents. Insert two more documents.

Image for post

Run the GET movies/_count command to view the number of documents in the index.

Image for post

2) Create a Snapshot

For more information, see step 1 in the “Create the First Snapshot and Restore Data from the Snapshot” section.

Image for post

View objects in the S3 bucket.

Image for post

Also, note the differences in the index folder.

3) Transfer the Snapshot From S3 Bucket to OSS Bucket

Use OSSImport to transfer the snapshot from the S3 bucket to the OSS bucket. The S3 bucket stores two snapshot objects. Change the value of the isSkipExistFile variable in the local_job.cfg file to migrate the incremental snapshot object.

Image for post

Then, view the incremental snapshot object in the OSS bucket.

Alibaba Cloud OSS bucket:

Image for post

AWS S3 bucket:

Image for post

4) Restore Data from the Snapshot

For more information, see step 4 in the “Create the First Snapshot and Restore Data from the Snapshot” section. Before restoring data, close the movies index. After restoration, open the index.

POST /movies/_close
GET movies/_stats
POST _snapshot/eric-snapshot-repository/snapshot_movies_2/_restore
{
  "indices": "movies"
}
POST /movies/_open
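
The order of these requests matters: the index must be closed before the restore request is sent. The following Python sketch lists the requests in that order so that a script can send them one by one with an HTTP client of your choice. The function name is hypothetical, and the sketch sends nothing itself.

```python
def incremental_restore_plan(repository, snapshot, index):
    """Return the ordered (method, path, body) requests for restoring into an existing index."""
    restore_path = "/_snapshot/{}/{}/_restore".format(repository, snapshot)
    return [
        ("POST", "/{}/_close".format(index), None),   # close the existing index first
        ("POST", restore_path, {"indices": index}),   # restore only this index
        ("POST", "/{}/_open".format(index), None),    # reopen the index afterwards
    ]


plan = incremental_restore_plan("eric-snapshot-repository", "snapshot_movies_2", "movies")
for method, path, body in plan:
    print(method, path, body)
```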

After data is restored from the snapshot, there will be five documents in the movies index of the Elasticsearch cluster. This number is the same as that in the index of the Amazon ES domain.
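
A simple way to verify the result is to compare the responses of GET movies/_count from both sides. The following Python sketch compares two trimmed, hypothetical responses; the function name is an assumption.

```python
def counts_match(source_count_response, target_count_response):
    """Return True if both _count responses report the same number of documents."""
    return source_count_response["count"] == target_count_response["count"]


# Trimmed, hypothetical responses to GET movies/_count.
amazon_es_response = {"count": 5}
alibaba_es_response = {"count": 5}

print(counts_match(amazon_es_response, alibaba_es_response))
```

Matching counts do not prove that every document is identical, so spot-check a few documents as well before switching the service.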

Image for post

Summary

Use the snapshot and restore feature to migrate data from an Amazon ES domain to an Alibaba Cloud Elasticsearch cluster. This approach requires closing the index being restored so that it receives no requests or write operations during the migration.
