Migrating Data from Amazon Redshift to MaxCompute

By Yi Xiu

To migrate data from Amazon Redshift to Alibaba Cloud MaxCompute, the data usually needs to be unloaded to S3 first and then to Alibaba Cloud Object Storage Service (OSS). MaxCompute then directly reads data from OSS by using external tables.

The following figure describes the overall migration process from Redshift to MaxCompute:

Prerequisites

Download the Amazon Redshift JDBC Driver at https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.16.1027/RedshiftJDBC42-1.2.16.1027.jar (version 4.2 is recommended).

Create a new driver in SQL Workbench/J, select the driver’s downloaded jar file and enter “com.amazon.redshift.jdbc42. Driver” in the Classname text field.

Configure a new connection, select the driver that you just created, copy JDBC URL, database username, and password, and check Autocommit.

If a connection times out during configuration, you must configure a security policy in the ECS VPC security group. Visit https://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-authorize-cluster-access.html for details.

Preview Data in Amazon Redshift

Method 1: Preview Data in the Query Editor for AWS

Method 2: Preview Data by Using Workbench/J

The following screenshots show the pages where you can download the JDBC driver and view the JDBC URL.

Unload Data to Amazon S3

Create an IAM Role to Allow an Amazon Redshift Cluster to Access the S3 Service

Select the Redshift service and Redshift — Customizable

Search for S3 policies, check AmazonS3FullAccess and click Next.

Name the role as redshiftunload.

Open the defined role and copy the role ARN. (You need to use the unload command.)

Go to the Redshift cluster and click Manage IAM Roles.

Select the redshiftunload role that you defined and apply the changes.

Exporting Data

Export Data Using the Pipe Separator

unload ('select * from venue')
to 's3://aws2oss/venue_'
iam_role '<the ARN of the redshiftunload role that you created>';
--parallel off; --Unload data continuously. One file is unloaded each time. The maximum size of each file is 6.2 GB.

The following figure shows the execution process.

In the corresponding Amazon S3 bucket, you can see two files with the venue_ prefix. You can open the folder and check the data.

The data is separated by the pipe separator (|), as shown in the following figure.

Export Data Using the Tab Delimiter

unload ('select * from venue') 
to 's3://aws2oss/venue_'
iam_role '<the ARN of the redshiftunload role that you created>'
delimiter as '\t';

Open the file and you can see that the data file is as follows:

— — To make it easier for MaxCompute to read data, we use the comma (,) delimiter. — `sql

unload ('select * from venue') 
to 's3://aws2oss/venue_'
iam_role '<the ARN of the redshiftunload role that you created>'
delimiter as ','
NULL AS '0';

For more information about the unload command, see: https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html

Seamlessly Migrating Data from Amazon S3 to OSS

OSS is compatible with the S3 API and allows you to seamlessly migrate your data from AWS S3 to Alibaba Cloud OSS. After data is migrated to AWS S3, you can still access OSS using the S3 API. For more information, see our S3 migration tutorial.

Background

By default, online migration does not support cross-border data migration. If you want cross-border migration, open a ticket to apply for the permission to configure the task.

Preparations

Preparations in Amazon S3

  • Estimate the amount of data to be migrated and go to the console to check the storage capacity and the number of files in S3.
  • Create a migration key, create a user in the AWS IAM page and grant that user the AmazonS3ReadOnlyAccess permission.
  • Add the user > Add the access type (Programmatic access, AK information) > Grant the AmazonS3ReadOnlyAccess permission > Record AK information.

Step 1: Go to IAM and select Add User.

Step 2: Add a user and check Create AK.

Step 3: Select Directly Attach an Existing Policy and grant AmazonS3ReadOnlyAccess.

Step 4: Record the AK information, which will be used for data migration later.

Preparations in Alibaba Cloud OSS

Create a RAM subaccount and grant the OSS bucket Read+Write permission and the online migration management permission.

Migration

Go to the Alibaba Cloud online data migration console: https://mgw.console.aliyun.com/?spm=a2c4g.11186623.2.11.10fe1e02iYSAhv#/job?_k=6w2hbo and log on with the subaccount that you created in the “Preparations in Alibaba Cloud OSS” section.

Go to Data Migration Service > Data Location > Data Type > Other

Create Source Location

For more information about the configuration, visit https://www.alibabacloud.com/help/doc-detail/95159.htm

Create Target Location

For more information about the configuration, visit https://www.alibabacloud.com/help/doc-detail/95159.htm

Create a Migration Task

The data in OSS is as follows:

Directly Loading OSS Data into MaxCompute

Authorization

1. When the owner of MaxCompute and OSS is the same account, log on with the Alibaba Cloud account and click here to finish the authorization in just one click.

2. If the MaxCompute account and the OSS account are not the same, you need to finish the authorization by logging on to with the OSS account. For more information,see this document.

Creating an External Table

DDL statements for creating an external table in MaxCompute:

CREATE EXTERNAL TABLE IF NOT EXISTS venue_external
(
VENUEID bigint,
VENUENAME string,
VENUECITY string,
VENUESTATE string,
VENUESEATS bigint
)
STORED BY 'com.aliyun.odps.CsvStorageHandler' -- (1)
WITH SERDEPROPERTIES (
'odps.properties.rolearn'='acs:ram::*****:role/aliyunodpsdefaultrole'
) -- (2)
LOCATION 'oss://oss-cn-shanghai-internal.aliyuncs.com/redshift2odps/s3/'; -- (3)(4)

com.aliyun.odps.CsvStorageHandler is a built-in StorageHandler for processing CSV files. It defines how to read and write CSV files. You only need to specify this name. The relevant logic is implemented by the system. If you define a different delimiter when unloading data into S3, MaxCompute also supports the Handler for that customized delimiter. For more information, visit https://help.aliyun.com/document_detail/45389.html

Results can be directly returned: select * from venue_external limit 10;

The following figure shows the results in DataWorks.

Creating an Internal Table to Persist Data

Reference:https://www.alibabacloud.com/blog/migrating-data-from-amazon-redshift-to-maxcompute_594786?spm=a2c41.12860625.0.0

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store