Use DTS for Real-time Data Synchronization between ApsaraDB RDS for MySQL and Alibaba Cloud Elasticsearch

Released by ELK Geek

Data Transmission Service (DTS) synchronizes production data from an ApsaraDB RDS for MySQL instance to an Alibaba Cloud Elasticsearch instance in real-time after you create a real-time data synchronization task in the DTS console. This article focuses on the various supported real-time synchronization types and SQL operations. It further elaborates on the configuration procedure to support synchronization.

Supported Real-time Synchronization Types

Real-time data synchronization from an ApsaraDB RDS for MySQL instance to an Alibaba Cloud Elasticsearch instance in the same Alibaba Cloud account

Supported SQL Operations

Supported SQL operations include:

  • Inset

Note: Currently, DTS does not support Data Definition Language (DDL) synchronization and ignores DDL operations during synchronization.

In case a DDL operation of a table is encountered, the Data Manipulation Language (DML) operation of the table may fail. To rectify the failure, follow the steps below:

1) Delete the table from the synchronization object list. For more information, see “Remove an object from a data synchronization task.
2) Delete the index corresponding to this table in the Alibaba Cloud Elasticsearch instance.
3) Modify the synchronization task, re-add the table to the synchronization objects, and re-initialize the table. For more information, see “Add an object to a data synchronization task.”

If a DDL operation is used to modify a table and add a column, the recommended DDL operation sequence is as follows:

1) Modify the mapping of the corresponding table and add a new column in the Alibaba Cloud Elasticsearch instance.
2) Modify the table structure and add a column in the source ApsaraDB RDS for MySQL instance.
3) Pause and restart the DTS synchronization task to allow DTS to reload the modified mapping in the Alibaba Cloud Elasticsearch instance.

Configuration Procedure

This section describes how to create a synchronization channel from the ApsaraDB RDS for MySQL instance to the Alibaba Cloud Elasticsearch instance.

1) Buy a Synchronization Channel

Log on to the DTS console and click Data Synchronization in the left-side navigation pane. Click Create Data Synchronization Task in the upper-right corner and purchase a synchronization channel. Then, return to the DTS console to configure the synchronization channel.

Note: Purchase a synchronization channel before configuring it. Either select the monthly subscription or pay-as-you-go billing mode for the synchronization channel.

Parameters on the Synchronization Channel Purchase Page

  • Feature: Select Data Synchronization.

Note: The DTS console displays synchronization instances by region. The region where the purchased synchronization instance is located is its destination region. For example, the synchronization instance here is used to synchronize data from the Hangzhou ApsaraDB RDS for MySQL instance to the Hangzhou Alibaba Cloud Elasticsearch instance. Therefore, it is located in Hangzhou. Go to the synchronization instance list for Hangzhou, find the purchased synchronization instance, and click Configure Synchronization Task next to the instance to configure the instance.

2) Configure the Synchronization Channel

  • Synchronization Task Name: Although the synchronization task name does not have to be unique, we recommend using an informative name to help you identify and manage the synchronization channel.
  • Destination Instance Details: Configure the ID of the Alibaba Cloud Elasticsearch instance and the account and password used to access the Alibaba Cloud Elasticsearch instance.

After completing the preceding configurations, click Set Whitelist and Next to add whitelists for the ApsaraDB RDS for MySQL and Alibaba Cloud Elasticsearch instances.

3) Set a Whitelist for an Instance

Note: DTS automatically adds a whitelist or security group for an ApsaraDB RDS for MySQL instance.

If the source instance is an ApsaraDB RDS for MySQL instance, DTS adds its IP address to the security group of the RDS instance whitelist to prevent a synchronization task creation failure due to the DTS server’s failure to connect to the database. To guarantee the stability of a synchronization task, do not delete the server IP addresses from the security group of the RDS instance whitelist during synchronization. After the whitelist is configured, click Next to create a synchronization account.

4) Select the Objects to be Synchronized

After the whitelist is configured, select the objects to be synchronized. In this step, configure the list of objects to be synchronized and the index naming rules.

1) Set Index Name to Table Name or DatabaseName_TableName.

  • If you select Table Name, the index name is the same as the table name.

2) Select the list of databases to be synchronized. The objects for real-time synchronization can be tables. This means you may select specific databases or tables to synchronize.

3) The docid is the primary key of all tables by default. If some tables do not have a primary key, configure a source table column corresponding to docid. In the Selected box on the right, move your cursor over the corresponding table and click Editcto enter the advanced settings page of the table.

4) On the advanced settings page, configure Index Name, Type Name, IsPartion, and _id value. If _id value is set to the primary key of the table, select the corresponding business primary key column.

After the objects to be synchronized are configured, proceed to the Advanced Settings step.

5) Advanced Settings

Main configuration settings include the following fields:

1) Synchronization Initialization: We recommend selecting Schema Initialization and Full Data Initialization so that DTS automatically creates indexes and initializes full data. If you do not select Schema Initialization, manually define the mapping for indexes in the Alibaba Cloud Elasticsearch instance before creating a synchronization task. If you do not select Full Data Initialization, DTS synchronizes incremental data starting from the time when synchronization is started.

2) Shard Configuration: Five shards and one replica are configured by default. Adjust the configuration according to specific business requirements. Once the configuration has been changed, all indexes will define shards according to the new configuration.

3) String Definition: Select a string analyzer whose default value is Standard Analyzer. Valid values include Standard Analyzer, Simple Analyzer, Whitespace Analyzer, Stop Analyzer, Keyword Analyzer, English Analyzer, and Fingerprint Analyzer. The string fields of all indexes define Analyzer according to this configuration.

4) Time Zone: Configure the time zone for the time fields that are synchronized to the Alibaba Cloud Elasticsearch instance. The default time zone is UTC+8.

After a synchronization task precheck is configured, DTS performs a precheck prior to synchronization. After the precheck is passed, click Start to start the synchronization task.

After the synchronization task is started, the synchronization task list appears. The task just that started is in the synchronization initialization state. The initialization time depends on the data volume of the objects to be synchronized in the source instance. After initialization is completed, the synchronization channel is in the synchronizing state, and the synchronization channel between the source and destination instances is established.

After verifying that the preceding tasks have been completed, log on to the Elasticsearch console and check whether the corresponding index has been created in your Elasticsearch instance and whether the synchronized data is as expected.

Original Source:

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.