Use DTS for Real-time Data Synchronization between ApsaraDB RDS for MySQL and Alibaba Cloud Elasticsearch

Image for post
Image for post

Released by ELK Geek

Data Transmission Service (DTS) synchronizes production data from an ApsaraDB RDS for MySQL instance to an Alibaba Cloud Elasticsearch instance in real-time after you create a real-time data synchronization task in the DTS console. This article focuses on the various supported real-time synchronization types and SQL operations. It further elaborates on the configuration procedure to support synchronization.

Supported Real-time Synchronization Types

Supported SQL Operations

  • Inset
  • Delete
  • Update

Note: Currently, DTS does not support Data Definition Language (DDL) synchronization and ignores DDL operations during synchronization.

In case a DDL operation of a table is encountered, the Data Manipulation Language (DML) operation of the table may fail. To rectify the failure, follow the steps below:

1) Delete the table from the synchronization object list. For more information, see “Remove an object from a data synchronization task.
2) Delete the index corresponding to this table in the Alibaba Cloud Elasticsearch instance.
3) Modify the synchronization task, re-add the table to the synchronization objects, and re-initialize the table. For more information, see “Add an object to a data synchronization task.”

If a DDL operation is used to modify a table and add a column, the recommended DDL operation sequence is as follows:

1) Modify the mapping of the corresponding table and add a new column in the Alibaba Cloud Elasticsearch instance.
2) Modify the table structure and add a column in the source ApsaraDB RDS for MySQL instance.
3) Pause and restart the DTS synchronization task to allow DTS to reload the modified mapping in the Alibaba Cloud Elasticsearch instance.

Configuration Procedure

1) Buy a Synchronization Channel

Note: Purchase a synchronization channel before configuring it. Either select the monthly subscription or pay-as-you-go billing mode for the synchronization channel.

Parameters on the Synchronization Channel Purchase Page

  • Feature: Select Data Synchronization.
  • Source Instance: Select MySQL.
  • Source Instance Region: Select the region where the ApsaraDB RDS for MySQL instance is located.
  • Destination Instance: Select Elasticsearch.
  • Destination Instance Region: Select the region where the Alibaba Cloud Elasticsearch instance is located. You cannot change the region after subscribing to the instance, so proceed with caution.
  • Synchronization Topology: Select One-Way Synchronization.
  • Network Type: The default value is Leased Line. Currently, only the leased line mode is supported.
  • Synchronization Channel Specifications: The synchronization channel specifications affect the synchronization performance of the channel. For more information about mapping between the synchronization channel specifications and performance, see Data Synchronization Specifications.
  • Subscription Duration: If the billing method is prepaid, the subscription duration is one month by default. Enable the automatic renewal feature to avoid inconvenience
  • Purchase Quantity: The default value is 1. Set this parameter based on your actual business requirements.

Note: The DTS console displays synchronization instances by region. The region where the purchased synchronization instance is located is its destination region. For example, the synchronization instance here is used to synchronize data from the Hangzhou ApsaraDB RDS for MySQL instance to the Hangzhou Alibaba Cloud Elasticsearch instance. Therefore, it is located in Hangzhou. Go to the synchronization instance list for Hangzhou, find the purchased synchronization instance, and click Configure Synchronization Task next to the instance to configure the instance.

2) Configure the Synchronization Channel

Image for post
Image for post
  • Synchronization Task Name: Although the synchronization task name does not have to be unique, we recommend using an informative name to help you identify and manage the synchronization channel.
  • Source Instance Details: This example uses an ApsaraDB RDS for MySQL instance as the data source. Configure the instance ID, database account, and database password.
Image for post
Image for post
  • Destination Instance Details: Configure the ID of the Alibaba Cloud Elasticsearch instance and the account and password used to access the Alibaba Cloud Elasticsearch instance.
Image for post
Image for post

After completing the preceding configurations, click Set Whitelist and Next to add whitelists for the ApsaraDB RDS for MySQL and Alibaba Cloud Elasticsearch instances.

3) Set a Whitelist for an Instance

If the source instance is an ApsaraDB RDS for MySQL instance, DTS adds its IP address to the security group of the RDS instance whitelist to prevent a synchronization task creation failure due to the DTS server’s failure to connect to the database. To guarantee the stability of a synchronization task, do not delete the server IP addresses from the security group of the RDS instance whitelist during synchronization. After the whitelist is configured, click Next to create a synchronization account.

4) Select the Objects to be Synchronized

1) Set Index Name to Table Name or DatabaseName_TableName.

  • If you select Table Name, the index name is the same as the table name.
  • If you select DatabaseName_TableName, the index is named in databasename_table name format. For example, if the database name is dbtest and the table name is sbtest1, the index name is dbtest_sbtest1 after the table is synchronized to the Alibaba Cloud Elasticsearch instance.
  • If tables with the same name exist in different databases to be synchronized, we recommend that you select DatabaseName_TableName for the Index Name.

2) Select the list of databases to be synchronized. The objects for real-time synchronization can be tables. This means you may select specific databases or tables to synchronize.

Image for post
Image for post

3) The docid is the primary key of all tables by default. If some tables do not have a primary key, configure a source table column corresponding to docid. In the Selected box on the right, move your cursor over the corresponding table and click Editcto enter the advanced settings page of the table.

Image for post
Image for post

4) On the advanced settings page, configure Index Name, Type Name, IsPartion, and _id value. If _id value is set to the primary key of the table, select the corresponding business primary key column.

After the objects to be synchronized are configured, proceed to the Advanced Settings step.

5) Advanced Settings

1) Synchronization Initialization: We recommend selecting Schema Initialization and Full Data Initialization so that DTS automatically creates indexes and initializes full data. If you do not select Schema Initialization, manually define the mapping for indexes in the Alibaba Cloud Elasticsearch instance before creating a synchronization task. If you do not select Full Data Initialization, DTS synchronizes incremental data starting from the time when synchronization is started.

2) Shard Configuration: Five shards and one replica are configured by default. Adjust the configuration according to specific business requirements. Once the configuration has been changed, all indexes will define shards according to the new configuration.

3) String Definition: Select a string analyzer whose default value is Standard Analyzer. Valid values include Standard Analyzer, Simple Analyzer, Whitespace Analyzer, Stop Analyzer, Keyword Analyzer, English Analyzer, and Fingerprint Analyzer. The string fields of all indexes define Analyzer according to this configuration.

Image for post
Image for post

4) Time Zone: Configure the time zone for the time fields that are synchronized to the Alibaba Cloud Elasticsearch instance. The default time zone is UTC+8.

After a synchronization task precheck is configured, DTS performs a precheck prior to synchronization. After the precheck is passed, click Start to start the synchronization task.

After the synchronization task is started, the synchronization task list appears. The task just that started is in the synchronization initialization state. The initialization time depends on the data volume of the objects to be synchronized in the source instance. After initialization is completed, the synchronization channel is in the synchronizing state, and the synchronization channel between the source and destination instances is established.

After verifying that the preceding tasks have been completed, log on to the Elasticsearch console and check whether the corresponding index has been created in your Elasticsearch instance and whether the synchronized data is as expected.

Original Source:

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store