Data migration with Apache Cassandra is difficult, especially seamless migration without downtime. Today, the community recommends the COPY command, sstableloader, and other methods for migrating data between Cassandra databases. However, none of these methods is efficient, and each has its drawbacks. For example, the COPY command runs a multi-threaded range scan on the cqlsh side, reads the data, converts it into CSV text, and then writes it to Cassandra in batches. This cannot keep up with large data volumes. With sstableloader, a single copy of the source data is migrated repeatedly when the target keyspace keeps multiple replicas, which wastes both migration time and space.
BDS is a proprietary NoSQL data migration service developed by the Alibaba Cloud NoSQL Team. BDS has been tested and used within Alibaba Group for many years and serves public cloud users on Alibaba Cloud. It now supports Cassandra data migration. BDS is designed to handle massive Cassandra data volumes with high performance and very little service interruption.
How It Works
The primary objectives of BDS migration are stability, high performance, short migration time, and low business impact. Analysis shows that the best way to minimize migration time is file-level cloning. Each node in a Cassandra cluster is assigned its data range in advance. If the data for each range can be copied directly from the source side to the target side, migration can run at maximum speed.
As shown in Figure 1, the source cluster on the left consists of four nodes (W/X/Y/Z). Each node is given a token in advance, and the token determines the data range assigned to it: (10–40], (40-max], and (min-40].
A corresponding target cluster is required before BDS migration, such as the four-node cluster on the right in Figure 1. During the migration, BDS performs the following steps:
- The target cluster replays the topology of the source cluster. In this case, W’ is responsible for 10, X’ for 20, Y’ for 30, and Z’ for 40. Therefore, the final mapped ranges of the target cluster are (10–40], (40-max], and (min-40].
- BDS copies the sstable files of each node in the source cluster to the peer nodes in the target cluster. In this case, the paths are W->W’, X->X’, Y->Y’, and Z->Z’.
- BDS also copies files that contain new data, mainly the incremental files generated under the “backups” directory once incremental backup is enabled on the source Cassandra cluster.
- All copied files are refreshed directly on the target side, and the meta-information of the sstable files is loaded into the memory of the target cluster.
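The peer mapping in the first two steps can be sketched in a few lines of Python. This is only an illustration of the idea, not BDS code: the `Node` class and the `replay_topology` and `copy_plan` helpers are hypothetical names, and the tokens are the ones from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    token: int


def replay_topology(source_nodes):
    """Create one hypothetical target peer per source node, reusing its token."""
    return {n.name: Node(n.name + "'", n.token) for n in source_nodes}


def copy_plan(source_nodes):
    """Return (source, target) name pairs in token order."""
    peers = replay_topology(source_nodes)
    ordered = sorted(source_nodes, key=lambda n: n.token)
    return [(n.name, peers[n.name].name) for n in ordered]


# Tokens taken from the four-node example (W/X/Y/Z).
source = [Node("W", 10), Node("X", 20), Node("Y", 30), Node("Z", 40)]
for src, dst in copy_plan(source):
    print(f"{src} -> {dst}")
```

Because every target node holds the same token as its source peer, the sstable files of a source node already cover exactly the range its peer owns, so they can be copied without being re-partitioned.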
The preceding method ensures that full data and incremental data can be migrated quickly and completely. According to testing and analysis, the migration speed of this solution is close to that of remote file copying between cluster nodes. For example, with three nodes that each hold 1 TB of data and a migration bandwidth of 150 MB/s, the data migration takes only about 2 hours, which is much faster than other solutions.
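The 2-hour figure can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and that each source node copies to its peer in parallel at the full 150 MB/s, which is the reading that matches the stated result. The function name is hypothetical.

```python
def migration_hours(data_per_node_bytes, bandwidth_bytes_per_s):
    """Hours needed to copy one node's data at a fixed bandwidth."""
    return data_per_node_bytes / bandwidth_bytes_per_s / 3600


TB = 10**12  # decimal terabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)

# Nodes copy in parallel, so the total time equals the time for one node.
hours = migration_hours(1 * TB, 150 * MB)
print(f"{hours:.2f} hours")  # prints "1.85 hours"
```

If the 150 MB/s were instead shared by all three nodes, the same calculation over 3 TB would give roughly 5.6 hours, so the estimate depends on the bandwidth being available per node.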
1. Purchase the BDS service on this page.
2. Prepare the target environment, for example by purchasing the cloud Cassandra service or building your own Cassandra service.
3. Start the SFTP service on the source Cassandra cluster for the migration. You can search for specific steps online. For incremental migration, enable incremental backup on every node in the source cluster through nodetool.
4. Configure the endpoint addresses of the source and target clusters in BDS. The detailed configuration process is listed below:
4.1. Add the IP addresses of all nodes in the source and target clusters to the whitelist of the purchased BDS service. If you choose the cloud Cassandra service, you need to create a whitelist on the source cluster and add the corresponding IP address of BDS.
4.2. On the BDS page, click Basic Information → UI Access → Engine Software UI → BDS. Then, click “Advanced Access” and enter the account password. If you forget the account password, click “Reset UI Access Password” on the UI access page to reset it.
4.3. On the data source management page that opens, add a data source for the source cluster and another for the target cluster. Enter an identifying cluster name in the “Cluster Name” field and select “Cassandra3X” for “Data Source Type.”
The reference template of “Data Source Parameters” is listed below:
"cassandraPassword":"Password to access Cassandra cluster",
"cassandraUser":"Account to access Cassandra cluster",
"confDir":"Cassandra profile directory",
"Cassandra data directory"
"ip":"The IP address of the Cassandra cluster. If there are multiple IP addresses, arrange them in the following way."
"ip":"The second cluster IP, and so on"
"ip":"The third cluster IP"
"nodetoolCmd":"The directory address of Cassandra nodetool, such as xxx/bin/nodetool",
"sshPassword":"Password accessible to ssh",
"sshUser":"Account accessible to ssh",
// Configure the following two lines in the data source template of the target cluster. Use absolute paths for the commands that start and stop Cassandra. Remove this note from the actual data source profile.
"startCmd":"su cassandra -l -c 'for starting Cassandra command'",
"stopCmd":"su cassandra -l -c 'for stopping Cassandra command'"
Perform this configuration for the data sources of both the source and target clusters, using the template above.
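Before pasting the parameters into BDS, it can help to check that nothing was left blank. The following Python sketch is a hypothetical pre-check, not part of BDS: it only looks at the scalar keys named in the template, treats `startCmd` and `stopCmd` as required for the target cluster, and does not inspect the `ip` or data directory entries. All values shown are placeholders.

```python
# Keys taken from the template; the grouping into two sets is an assumption.
COMMON_KEYS = {
    "cassandraUser", "cassandraPassword", "confDir",
    "nodetoolCmd", "sshUser", "sshPassword",
}
TARGET_ONLY_KEYS = {"startCmd", "stopCmd"}


def missing_keys(params, is_target):
    """Return the sorted list of required keys that are absent or empty."""
    required = COMMON_KEYS | (TARGET_ONLY_KEYS if is_target else set())
    return sorted(k for k in required if not params.get(k))


# Placeholder values for illustration only.
source_params = {
    "cassandraUser": "user",
    "cassandraPassword": "secret",
    "confDir": "/path/to/conf",
    "nodetoolCmd": "/path/to/bin/nodetool",
    "sshUser": "ssh-user",
    "sshPassword": "ssh-secret",
}

print(missing_keys(source_params, is_target=False))  # prints []
print(missing_keys(source_params, is_target=True))   # prints ['startCmd', 'stopCmd']
```

Running the same parameters through both checks shows why a source profile cannot simply be reused for the target cluster.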
5. Click “One-Click Migration” under “Cassandra Migration.” Select the configured data sources for the source cluster and the target cluster, and then configure the tables to be migrated. Once this is done, click “Migration Service.”
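For step 3, incremental backup can be switched on with the `nodetool enablebackup` command. The Python sketch below builds that command for every source node and, by default, only prints it. It assumes `nodetool` is on the PATH and that each node's JMX port is reachable from the machine running the script; if remote JMX is disabled, run the command locally on each node instead. The host addresses and helper names are hypothetical.

```python
import subprocess


def backup_commands(nodetool_cmd, hosts):
    """Build one 'nodetool -h <host> enablebackup' command per node."""
    return [[nodetool_cmd, "-h", host, "enablebackup"] for host in hosts]


def enable_incremental_backup(nodetool_cmd, hosts, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in backup_commands(nodetool_cmd, hosts):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if nodetool fails


# Placeholder addresses from the documentation range 192.0.2.0/24.
source_hosts = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
enable_incremental_backup("nodetool", source_hosts)
```

A setting applied through nodetool does not survive a node restart, so for a long-running migration it may be worth also setting `incremental_backups: true` in `cassandra.yaml`.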
Recently, Alibaba Cloud launched Lindorm, a cloud-native multi-mode database. For more information about Lindorm, check out the following articles:
- How Can Alibaba’s Newest Databases Support 700 Million Requests a Second?
- Lindorm: Alibaba Cloud’s Newest Cloud-Native Multi-Model Database