How to Sync Up Data from MaxCompute to Greenplum with DataWorks

Mar 14, 2019

By Jeffrey Gao, Solutions Architect

Alibaba Cloud DataWorks is the Big Data platform product from Alibaba Cloud. It provides one-stop Big Data development, data permission management, offline job scheduling, and data integration (including data sync), among other features.

Today, we will demonstrate how to use the data sync feature of DataWorks to synchronize data from MaxCompute, the most advanced big data platform of Alibaba Cloud, to Greenplum, one of the popular MPP databases.

DataWorks supports multiple data source types for synchronization. For more information, please refer to https://www.alibabacloud.com/help/doc-detail/53008.htm?spm=a2c41.12636932.0.0.750f6569pEjP1m

About Greenplum

Greenplum Database is an open-source massively parallel data platform. It is based on PostgreSQL and equipped with the analytical tools necessary to draw additional insights from your data. Greenplum’s massively parallel processing (MPP) architecture provides automatic parallelization of all data and queries in a scale-out, shared-nothing architecture.

Syncing MaxCompute to Greenplum with DataWorks

Once the Greenplum instance is ready, we can log in with the pgAdmin tool to manage the data. Before data synchronization, the target table is empty.
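If you prefer to confirm the empty table from a script rather than from pgAdmin, a row count is enough. The following is only a minimal sketch: the helper name `count_rows` and the table name `sync_target` are hypothetical, and the demo runs against an in-memory SQLite database as a stand-in. Against Greenplum you would pass in a connection from a PostgreSQL driver such as psycopg2, which is an assumption and not something this walkthrough covers.

```python
import re
import sqlite3  # used only as a local stand-in for a Greenplum connection

# Accept plain identifiers only, because the table name is interpolated into SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def count_rows(conn, table):
    """Return the number of rows in `table` using any DB-API 2.0 connection."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]
    finally:
        cur.close()


# Demo: a freshly created (hypothetical) target table has no rows yet.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE sync_target (id INTEGER, name TEXT)")
print(count_rows(demo, "sync_target"))
```

Running the same check again after the sync job finishes gives a quick sanity test that rows actually arrived.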

Next, we need to configure the data source properties for both the source and the destination. Since Greenplum is based on PostgreSQL, we can register it as a PostgreSQL data source.
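Because Greenplum is registered as a PostgreSQL data source, its connection string follows the standard PostgreSQL JDBC URL form, `jdbc:postgresql://host:port/database`. The sketch below assembles such a URL. The helper name `build_jdbc_url`, the host `gp.example.com`, and the database `demo` are placeholders for illustration, and the default port 5432 is an assumption that should be checked against your own instance.

```python
def build_jdbc_url(host, database, port=5432):
    """Build a PostgreSQL-style JDBC URL, which also works for Greenplum."""
    if not host or not database:
        raise ValueError("host and database must be non-empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"jdbc:postgresql://{host}:{port}/{database}"


# Placeholder values; substitute the endpoint of your own Greenplum instance.
print(build_jdbc_url("gp.example.com", "demo"))
```

The resulting string, together with a database username and password, is what a PostgreSQL-type data source typically needs.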