How to Sync Up Data from MaxCompute to Greenplum with DataWorks

By Jeffrey Gao, Solutions Architect

Alibaba Cloud DataWorks is Alibaba Cloud's Big Data platform product. It provides one-stop Big Data development, data permission management, offline job scheduling, data integration (including data sync), and other features.

Today, we will demonstrate how to use the data sync feature of DataWorks to synchronize data from MaxCompute, the big data platform of Alibaba Cloud, to Greenplum, a popular MPP database.

DataWorks can synchronize data across multiple data source types. For more information, please refer to https://www.alibabacloud.com/help/doc-detail/53008.htm?spm=a2c41.12636932.0.0.750f6569pEjP1m

About Greenplum

Greenplum Database is an open-source, massively parallel data platform. It is based on PostgreSQL and equipped with the analytical tools necessary to draw additional insights from your data. Greenplum's massively parallel processing architecture provides automatic parallelization of all data and queries in a scale-out, shared-nothing architecture.

Syncing MaxCompute to Greenplum with DataWorks

Once the Greenplum instance is ready, we can log in with the pgAdmin tool to manage the data. Before data synchronization, the destination table is empty.

Next, we need to configure the data sources, both the source and the destination. Since Greenplum is based on PostgreSQL, we can register it as a PostgreSQL data source.
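
Because Greenplum is registered as a PostgreSQL data source, its connection string takes the PostgreSQL JDBC form. The following is a minimal sketch of that URL, assuming the default PostgreSQL port 5432; the helper name, host, and database name are hypothetical placeholders, not values from this demo.

```python
def greenplum_jdbc_url(host: str, database: str, port: int = 5432) -> str:
    """Build a PostgreSQL-style JDBC URL pointing at a Greenplum instance.

    Hypothetical helper for illustration; host, port, and database
    must be replaced with the values of your own instance.
    """
    if not host or not database:
        raise ValueError("host and database are required")
    return f"jdbc:postgresql://{host}:{port}/{database}"


# Placeholder values, not taken from the article.
url = greenplum_jdbc_url("gp-example.internal", "demo_db")
print(url)
```

The resulting string is what would typically go into the JDBC URL field of the PostgreSQL data source form, together with a user name and password.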

Then we set up a data sync task.

In the data sync configuration, we select the data source and the destination, including the corresponding tables.
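
The same source and destination settings can also be written down as a job description instead of being filled in through the wizard. The sketch below builds such a description as a Python dictionary. The field names are modeled on the DataWorks script-mode job layout as an assumption and should be checked against the current documentation; the function name, data source names, table names, and columns are hypothetical.

```python
import json


def build_sync_job(src_datasource, src_table, dst_datasource, dst_table, columns):
    """Return a reader/writer job description as a plain dict.

    Assumed layout: a MaxCompute ("odps") reader step feeding a
    PostgreSQL writer step. Verify every key against the DataWorks
    documentation before using it in a real task.
    """
    return {
        "type": "job",
        "version": "2.0",
        "steps": [
            {
                "stepType": "odps",
                "name": "Reader",
                "category": "reader",
                "parameter": {
                    "datasource": src_datasource,
                    "table": src_table,
                    "column": list(columns),
                    "partition": [],
                },
            },
            {
                "stepType": "postgresql",
                "name": "Writer",
                "category": "writer",
                "parameter": {
                    "datasource": dst_datasource,
                    "table": dst_table,
                    "column": list(columns),
                    "preSql": [],
                    "postSql": [],
                },
            },
        ],
        "setting": {
            "errorLimit": {"record": "0"},
            "speed": {"concurrent": 1, "throttle": False},
        },
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }


# Hypothetical names for illustration only.
job = build_sync_job("odps_demo", "src_table", "gp_demo", "dst_table", ["id", "name"])
print(json.dumps(job, indent=2))
```

Keeping the column list identical on both sides mirrors the simplest case, where source and destination fields are mapped one to one by position.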

Then we configure the field and type mappings between the source and the destination.
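
To make the type mapping concrete, here is a small sketch that translates a few common MaxCompute column types into PostgreSQL-compatible types and generates a matching destination table definition. The mapping table is an illustrative assumption, not the mapping DataWorks applies internally, and the table and column names are hypothetical.

```python
# Assumed, illustrative mapping from MaxCompute types to
# PostgreSQL/Greenplum types; extend or adjust for your own schema.
TYPE_MAP = {
    "BIGINT": "bigint",
    "DOUBLE": "double precision",
    "STRING": "text",
    "DATETIME": "timestamp",
    "BOOLEAN": "boolean",
    "DECIMAL": "numeric",
}


def map_columns(source_columns):
    """Map (name, maxcompute_type) pairs to (name, postgres_type) pairs."""
    mapped = []
    for name, src_type in source_columns:
        key = src_type.upper()
        if key not in TYPE_MAP:
            raise ValueError(f"no mapping defined for MaxCompute type {src_type!r}")
        mapped.append((name, TYPE_MAP[key]))
    return mapped


def create_table_ddl(table, source_columns):
    """Render a CREATE TABLE statement for the destination table."""
    cols = ",\n".join(
        f"    {name} {pg_type}" for name, pg_type in map_columns(source_columns)
    )
    return f"CREATE TABLE {table} (\n{cols}\n);"


# Hypothetical source schema.
print(create_table_ddl("dst_table", [("id", "BIGINT"), ("name", "STRING")]))
```

Failing loudly on an unmapped type is a deliberate choice here: a silently wrong column type tends to surface only later, as write errors during the sync run.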

When the configuration is done, we can run the task and check the Runtime Log for the data synchronization status.

We can also log in to the Greenplum instance to check whether the data has been synchronized.
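
This check can be scripted as well. The sketch below counts the rows of a table through any Python DB-API 2.0 connection, once before and once after loading data. It uses an in-memory SQLite database as a stand-in so that it runs anywhere; against Greenplum you would pass a PostgreSQL driver connection (for example one created with psycopg2.connect) instead. The function and table names are hypothetical.

```python
import re
import sqlite3

# Accept only plain identifiers such as "my_table" or "schema.my_table",
# because table names cannot be passed as query parameters.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def count_rows(conn, table):
    """Return the number of rows in `table` using a DB-API 2.0 connection."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]
    finally:
        cur.close()


# Stand-in demo with SQLite; the inserts below simulate the sync task.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_target (id INTEGER, name TEXT)")
before = count_rows(conn, "demo_target")
conn.executemany("INSERT INTO demo_target VALUES (?, ?)", [(1, "a"), (2, "b")])
after = count_rows(conn, "demo_target")
print(before, after)
```

Comparing the destination count with the row count of the MaxCompute source table gives a quick sanity check on top of the Runtime Log.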

Furthermore, if we need this task to run automatically on a periodic basis, we can configure the scheduling mode in the Schedule tab.

Reference: https://www.alibabacloud.com/blog/how-to-sync-up-data-from-maxcompute-to-greenplum-with-dataworks_594549?spm=a2c41.12636932.0.0
