Performing Daily Incremental Upload from OSS to MaxCompute Using Data Integration

By Jonathan Peng, Staff Solutions Architect

Global businesses face increasing complexity and market volatility amid today’s fierce competition. In response, business functions across organizations are turning to data-driven strategies to manage this uncertainty. A data-driven approach also helps organizations better understand their customers and grow their businesses. Advances in digital technology let organizations analyze more data, even in real time, which in turn generates still more data to fuel enterprises’ needs.

However, this growth calls for an effective way to store large amounts of data. Many organizations now use cloud services such as Alibaba Cloud’s Object Storage Service (OSS) for data storage, as a data lake, and for data backups. For example, an organization may save its Internet of Things (IoT) data as files in the cloud, both as a backup and for historical data analysis. So how can we easily import data from OSS into MaxCompute on a daily basis?

Incremental Synchronization of OSS Data

In this scenario, data does not change after it is generated, so you can easily partition it according to when it was generated. Typically, you partition by date, for example creating one partition per day.

For each date, generate a data file named “IOTDataSet” + date + “.csv” and upload it to your OSS bucket. Here, we created a sample file named “IOTDataSet20180824.csv” and uploaded it to OSS. The date in each file name must use the same format as the date that Data Integration substitutes for each routinely scheduled instance. The sample file uses yyyymmdd; if your date parameter instead resolves to the full scheduled time in yyyymmddhhmmss (year, month, day, hour, minute, second), name your files in that format.
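As a minimal sketch of the naming convention, the Python below writes one day's readings to a local CSV named after that day, ready to be uploaded to OSS. The column values are hypothetical (the article does not list the real schema), and the upload itself is not shown.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical IoT reading; the article does not list the real columns.
SAMPLE_ROW = ["sensor-01", "temperature", "21.5", "2018-08-24 10:00:00"]


def daily_file_name(day: date) -> str:
    """Build the per-day name from the pattern "IOTDataSet" + yyyymmdd + ".csv"."""
    return f"IOTDataSet{day:%Y%m%d}.csv"


def write_daily_file(directory: Path, day: date, rows: list[list[str]]) -> Path:
    """Write one day's rows to a CSV named after that day and return the file path.

    No header row is written, so every line in the file is a data row.
    """
    path = Path(directory) / daily_file_name(day)
    with path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return path


if __name__ == "__main__":
    # Write the sample file into a throwaway directory; in practice you would
    # upload the resulting file to your OSS bucket.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_daily_file(Path(tmp), date(2018, 8, 24), [SAMPLE_ROW])
        print(path.name)  # IOTDataSet20180824.csv
```

Leaving out a header row is a choice of this sketch: add one only if your synchronization task is configured to skip it, otherwise it would be loaded as data.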

Walkthrough

Upload IOTDataSet20180824.csv to your OSS bucket.


Then, open the DataWorks console and navigate to Data Source. Detailed steps are described here: https://www.alibabacloud.com/help/doc-detail/47762.htm

Add OSS as a data source in Data Integration.


Create a table named “IOTDataSet” in DataWorks to hold the data.

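The article creates the table through the DataWorks UI. If you adopt the daily partitioning described earlier, the equivalent MaxCompute DDL has roughly the shape rendered by this sketch. The column names and types, and the partition column name `ds`, are hypothetical placeholders for your own schema.

```python
# Hypothetical schema for the IOTDataSet table; replace it with your file's columns.
COLUMNS = [
    ("device_id", "STRING"),
    ("metric", "STRING"),
    ("value", "DOUBLE"),
    ("event_time", "STRING"),
]


def create_table_ddl(table: str, columns: list[tuple[str, str]], partition_col: str = "ds") -> str:
    """Render a CREATE TABLE statement with one STRING partition column for the date."""
    body = ",\n".join(f"    {name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n{body}\n)\n"
        f"PARTITIONED BY ({partition_col} STRING);"
    )


if __name__ == "__main__":
    print(create_table_ddl("IOTDataSet", COLUMNS))
```

Keeping the partition value a STRING such as 20180824 lets it match the date embedded in each file name directly.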

Configure a task to synchronize the data, and set the OSS object name so that it contains the date of each scheduled run rather than a fixed date.

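Conceptually, the object name holds a date placeholder that is filled in for each scheduled instance, so each daily run reads only that day's file. This sketch imitates that substitution with Python's `string.Template`; the placeholder name `bizdate` and the `offset_days` parameter are assumptions for illustration, not DataWorks's actual mechanism.

```python
from datetime import date, timedelta
from string import Template

# Hypothetical object-name template; "bizdate" is an assumed placeholder name,
# standing in for whatever date parameter your sync task is configured with.
OBJECT_TEMPLATE = Template("IOTDataSet${bizdate}.csv")


def resolve_object_name(run_day: date, offset_days: int = 0) -> str:
    """Return the object one run should read, substituting its (optionally shifted) date."""
    target = run_day + timedelta(days=offset_days)
    return OBJECT_TEMPLATE.substitute(bizdate=target.strftime("%Y%m%d"))


if __name__ == "__main__":
    # Each daily run resolves to a different file, so only that day's data is read.
    print(resolve_object_name(date(2018, 8, 24)))      # IOTDataSet20180824.csv
    # If the substituted date is the day before the run, shift by -1.
    print(resolve_object_name(date(2018, 8, 25), -1))  # IOTDataSet20180824.csv
```

The offset exists because the date your scheduler substitutes may not be the calendar date of the run itself; check which date your parameter resolves to before relying on it.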

Map the fields by using the Map Fields in the Same Line option.

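Mapping fields in the same line pairs each source field with the destination column at the same position. The sketch below illustrates that idea for one CSV row; the column names are hypothetical, and rejecting rows with the wrong number of fields is a choice of this sketch rather than documented Data Integration behavior.

```python
# Hypothetical destination columns, listed in table order.
TABLE_COLUMNS = ["device_id", "metric", "value", "event_time"]


def map_same_line(source_fields: list[str], table_columns: list[str]) -> dict[str, str]:
    """Pair the i-th source field with the i-th table column.

    Mismatched counts are rejected here as a design choice of this sketch,
    so a shifted layout fails loudly instead of loading values into the wrong columns.
    """
    if len(source_fields) != len(table_columns):
        raise ValueError(
            f"expected {len(table_columns)} fields, got {len(source_fields)}"
        )
    return dict(zip(table_columns, source_fields))


if __name__ == "__main__":
    row = ["sensor-01", "temperature", "21.5", "2018-08-24 10:00:00"]
    print(map_same_line(row, TABLE_COLUMNS))
```

Because the mapping is purely positional, the column order in the CSV files must match the column order of the table.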

Then set the controls for the data synchronization process.


Run the task, entering the date from the file’s name (20180824 for the sample file) as the date value.


That’s it. The run output should show that the synchronization completed successfully.


You can now query the data in the MaxCompute table.


As the last step, configure the schedule for the synchronization task and set Recurrence to Daily.

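With a daily recurrence, each run picks up exactly one file, so a missing upload means a missing day of data. The sketch below lists the file names a daily schedule would expect over a date range and reports any that are absent, which can be useful before rerunning past dates. It assumes the IOTDataSet + yyyymmdd + .csv naming used in this article; the set of existing names would come from listing your bucket, which is not shown.

```python
from datetime import date, timedelta
from typing import Iterator


def daily_object_names(start: date, end: date) -> Iterator[str]:
    """Yield the file name a daily run would read for each day from start to end, inclusive."""
    day = start
    while day <= end:
        yield f"IOTDataSet{day:%Y%m%d}.csv"
        day += timedelta(days=1)


def missing_objects(start: date, end: date, existing: set[str]) -> list[str]:
    """List expected daily files that are not present in `existing`."""
    return [name for name in daily_object_names(start, end) if name not in existing]


if __name__ == "__main__":
    uploaded = {"IOTDataSet20180824.csv", "IOTDataSet20180826.csv"}
    print(missing_objects(date(2018, 8, 24), date(2018, 8, 26), uploaded))
    # ['IOTDataSet20180825.csv']
```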

With the scheduled task running, data is now synchronized from OSS to MaxCompute every day.

Reference: https://www.alibabacloud.com/blog/performing-daily-incremental-upload-from-oss-to-maxcompute-using-data-integration_594490?spm=a2c65.12601858.0.0
