Performing Daily Incremental Upload from OSS to MaxCompute Using Data Integration
By Jonathan Peng, Staff Solutions Architect
Global businesses are facing increasing complexity and market volatility amid today’s fierce competition. In response to this, all business functions are turning to data-driven strategies as a means to manage this increasing uncertainty. A data-driven approach also helps organizations better understand their customer bases and allows them to grow their businesses. Growth in digital technologies has given organizations the ability to analyze more data, even in real time. This in turn has generated more and more data to help fuel enterprises’ needs.
However, as data volumes grow, organizations need an effective way to store them. Today, most organizations use cloud solutions, such as Alibaba Cloud’s Object Storage Service (OSS), for data storage, as a data lake, and for backups. In some cases, an organization may write all of its Internet of Things (IoT) data to files and store them in the cloud, both as a backup and for historical data analysis. So, how can we easily import data from OSS into MaxCompute on a daily basis?
Incremental Synchronization of OSS Data
In this scenario, data does not change once it has been generated, so you can partition it easily according to how it is generated. Typically, you partition by date, for example by creating one partition per day.
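As a minimal sketch of date-based partitioning, the snippet below maps the moment a record was generated to the daily partition it would belong to. The partition column name `ds` and the function name are assumptions made for illustration; they do not come from the table definition used in this article.

```python
from datetime import datetime


def daily_partition(generated_at: datetime, column: str = "ds") -> str:
    """Return a partition spec for the day on which a record was generated.

    The column name "ds" is a hypothetical default, not taken from the article.
    """
    return f"{column}={generated_at.strftime('%Y%m%d')}"


# Every record generated on the same calendar day lands in the same partition.
print(daily_partition(datetime(2018, 8, 24, 13, 5)))  # ds=20180824
```

Because the data never changes after it is generated, a partition written for a given day does not need to be revisited later.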
Generate one data file per date, named “IOTDataSet” + date + “.csv”, and upload it to an OSS bucket. Here we use a sample file named “IOTDataSet20180824.csv”. The date in the file name should follow the yyyymmddhhmmss pattern (year, month, day, hour, minute, second) of the scheduled time that Data Integration assigns to each routinely scheduled instance; the sample file uses only the date portion, yyyymmdd.
Upload IOTDataSet20180824.csv to OSS as shown below.
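The naming convention can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch assumes the date portion is yyyymmdd, as in the sample file; it only builds and parses names and does not talk to OSS.

```python
from datetime import date, datetime

PREFIX = "IOTDataSet"
SUFFIX = ".csv"


def object_name_for(day: date) -> str:
    """Build the daily file name, e.g. IOTDataSet20180824.csv."""
    return f"{PREFIX}{day.strftime('%Y%m%d')}{SUFFIX}"


def day_from_object_name(name: str) -> date:
    """Recover the date from a file name that follows the convention."""
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        raise ValueError(f"unexpected object name: {name!r}")
    stamp = name[len(PREFIX):-len(SUFFIX)]
    if len(stamp) != 8 or not stamp.isdigit():
        raise ValueError(f"expected an 8-digit yyyymmdd date in: {name!r}")
    return datetime.strptime(stamp, "%Y%m%d").date()


print(object_name_for(date(2018, 8, 24)))              # IOTDataSet20180824.csv
print(day_from_object_name("IOTDataSet20180824.csv"))  # 2018-08-24
```

Keeping the date in a fixed position in the name is what later lets a scheduled task locate the right file for each day.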
Then, open the DataWorks console and navigate to Data Source. Detailed steps are described here: https://www.alibabacloud.com/help/doc-detail/47762.htm
Add the OSS data source in Data Integration.
Create a table named “IOTDataSet” in DataWorks to hold the data.
Configure a task to synchronize the data, and set the object name as shown in the image below.
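To make the idea of a date-driven object name concrete, here is a small sketch that mimics how a scheduler could substitute a date into the object name at run time. The placeholder name `bizdate` and the exact template are assumptions for illustration; check the parameter settings of your own task for the names it actually uses.

```python
from string import Template

# Hypothetical object-name template; the placeholder name is an assumption.
OBJECT_TEMPLATE = Template("IOTDataSet${bizdate}.csv")


def resolve_object(run_value: str) -> str:
    """Substitute the date value supplied for one scheduled run."""
    return OBJECT_TEMPLATE.substitute(bizdate=run_value)


print(resolve_object("20180824"))  # IOTDataSet20180824.csv
```

With a template like this, the same task definition resolves to a different file each day, so the task itself does not need to be edited.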
For the field mapping, select Map Fields in the Same Line.
Then set the controls for the data sync process.
Run the task, supplying the date that appears in the file’s name.
That’s it. You should see output similar to the image below.
You can now query the data from the MaxCompute table.
As the last step, schedule the synchronization task and set its Recurrence to daily.
With the scheduled task running, data is now synchronized from OSS to MaxCompute on a daily basis.
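A daily schedule only works if a file exists for every day. The sketch below is one way to check for gaps, assuming you already have the set of object names in the bucket (obtaining that listing is left out) and that the names use the yyyymmdd form of the sample file. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, List


def expected_objects(start: date, end: date) -> List[str]:
    """List the daily file names expected between start and end, inclusive."""
    days = (end - start).days
    return [
        f"IOTDataSet{(start + timedelta(days=i)).strftime('%Y%m%d')}.csv"
        for i in range(days + 1)
    ]


def missing_objects(existing: Iterable[str], start: date, end: date) -> List[str]:
    """Return the expected daily files that are absent from the listing."""
    present = set(existing)
    return [name for name in expected_objects(start, end) if name not in present]


listing = {"IOTDataSet20180824.csv", "IOTDataSet20180826.csv"}
print(missing_objects(listing, date(2018, 8, 24), date(2018, 8, 26)))
# ['IOTDataSet20180825.csv']
```

Running a check like this before or after the scheduled window can flag days whose upload to OSS never happened.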