By Jeffrey Gao, Solutions Architect
Alibaba Cloud DataWorks is the big data platform product from Alibaba Cloud. It provides one-stop big data development, data permission management, offline job scheduling, data integration (including data synchronization), and other features.
In this article, we demonstrate how to use the data sync feature of DataWorks to synchronize data from MaxCompute, the most advanced big data platform of Alibaba Cloud, to Greenplum, a popular MPP (massively parallel processing) database.
DataWorks supports synchronization between many data source types. For more information, see https://www.alibabacloud.com/help/doc-detail/53008.htm?spm=a2c41.126369220.127.116.110f6569pEjP1m
Greenplum Database is an open-source massively parallel data platform. It is based on PostgreSQL and equipped with the analytical tools necessary to draw additional insights from your data. Greenplum’s massively parallel processing architecture provides automatic parallelization of all data and queries in a scale-out, shared-nothing architecture.
Syncing MaxCompute to Greenplum with DataWorks
When the Greenplum instance is ready, we can log in with the pgAdmin tool to manage the data. Before data synchronization, the target table is empty.
Next, we need to configure the data sources in DataWorks, both the source and the destination. Since Greenplum is based on PostgreSQL, we can add it as a PostgreSQL data source.
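Because Greenplum is registered as a PostgreSQL data source, its connection string follows the standard PostgreSQL JDBC URL form. The sketch below only illustrates that form; the host, port, and database name are placeholder assumptions, not values from this demo, and `build_jdbc_url` is a hypothetical helper rather than part of DataWorks.

```python
def build_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a PostgreSQL-style JDBC URL for a Greenplum instance.

    All argument values used below are placeholders (assumptions);
    substitute the endpoint of your own Greenplum instance.
    """
    if not host or not database:
        raise ValueError("host and database must be non-empty")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"jdbc:postgresql://{host}:{port}/{database}"


# Hypothetical endpoint; 5432 is the default PostgreSQL/Greenplum port.
print(build_jdbc_url("gp-example.internal", 5432, "demo_db"))
```

The resulting string is what we would paste into the JDBC URL field of the data source form, together with the database user name and password.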
Then we set up a data sync task.
In the data sync configuration, we select the data source and the destination, including the corresponding tables.
Then we configure the field and type mappings between the source and the destination.
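To make the field and type mapping step more concrete, here is a minimal sketch that derives a Greenplum `CREATE TABLE` statement from a list of MaxCompute columns. The type mapping is an illustrative assumption, not the official DataWorks conversion table, and the table and column names are hypothetical.

```python
# Illustrative MaxCompute -> PostgreSQL/Greenplum type mapping.
# This is an assumption for the sketch, not DataWorks' official mapping.
TYPE_MAP = {
    "BIGINT": "bigint",
    "DOUBLE": "double precision",
    "STRING": "text",
    "DATETIME": "timestamp",
    "BOOLEAN": "boolean",
    "DECIMAL": "numeric",
}


def build_create_table(table, columns):
    """Return a CREATE TABLE statement for the destination table.

    columns is a list of (column_name, maxcompute_type) pairs.
    """
    col_defs = []
    for name, mc_type in columns:
        pg_type = TYPE_MAP.get(mc_type.upper())
        if pg_type is None:
            raise ValueError(f"no mapping for MaxCompute type {mc_type!r}")
        col_defs.append(f"    {name} {pg_type}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(col_defs) + "\n);"


# Hypothetical source schema.
source_columns = [("id", "BIGINT"), ("name", "STRING"), ("created_at", "DATETIME")]
print(build_create_table("demo_target", source_columns))
```

Creating the destination table with compatible types up front keeps the mapping in the sync task a simple one-to-one match by column name.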
When the configuration is done, we can run the task and check the Runtime Log for the data synchronization status.
We can also log in to the Greenplum instance to check whether the data has been synchronized.
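Besides checking in pgAdmin, the same check can be scripted as a row count comparison. The sketch below works with any Python DB-API connection; it uses an in-memory SQLite database as a stand-in so that it runs anywhere. Against a real Greenplum instance we would pass a PostgreSQL driver connection instead (for example one created with `psycopg2.connect`). The table name and expected row count are hypothetical.

```python
import sqlite3


def count_rows(conn, table):
    """Count rows in a table through a DB-API connection."""
    # Only allow simple identifiers, since the name is interpolated into SQL.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]


def sync_looks_complete(conn, table, expected_rows):
    """Return True when the destination row count matches the expected count."""
    return count_rows(conn, table) == expected_rows


# Stand-in for the Greenplum destination (assumption: SQLite used for the demo only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_target (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO demo_target VALUES (?, ?)", [(1, "a"), (2, "b")])
print(sync_looks_complete(conn, "demo_target", 2))
```

A matching row count is only a quick sanity check; it does not prove that every field value was copied correctly.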
Furthermore, if we need this task to run automatically on a periodic basis, we can configure the scheduling mode on the Schedule tab.