How to Migrate JSON Data from MongoDB to MaxCompute

Prepare Data and Account

First, upload the data to your MongoDB database. This example uses Alibaba Cloud’s ApsaraDB for MongoDB. Because the database's network type is VPC, it needs a public IP address to communicate with the default resource group of DataWorks. The test data is as follows:

{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      {
        "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}
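Before configuring the reader, it helps to know which dotted field paths the test document exposes, because the reader addresses fields by paths such as store.bicycle.color. The following Python sketch walks the document and lists every scalar field reachable through nested objects only. The helper name `leaf_paths` is hypothetical (it is not part of DataWorks), and array fields such as store.book are deliberately skipped, since addressing array elements is outside the scope of this example.

```python
# The test document from above, as a Python dict so the sketch is self-contained.
TEST_DOC = {
    "store": {
        "book": [
            {"category": "reference", "author": "Nigel Rees",
             "title": "Sayings of the Century", "price": 8.95},
            {"category": "fiction", "author": "Evelyn Waugh",
             "title": "Sword of Honour", "price": 12.99},
            {"category": "fiction", "author": "J. R. R. Tolkien",
             "title": "The Lord of the Rings", "isbn": "0-395-19395-8",
             "price": 22.99},
        ],
        "bicycle": {"color": "red", "price": 19.95},
    },
    "expensive": 10,
}


def leaf_paths(doc, prefix=""):
    """Return dotted paths of scalar leaves reachable through nested objects.

    Hypothetical helper: lists (arrays) are skipped, so only paths that a
    plain dotted name like "store.bicycle.color" can address are returned.
    """
    paths = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.extend(leaf_paths(value, path))  # descend into sub-documents
        elif isinstance(value, list):
            continue  # e.g. "store.book": array handling is out of scope here
        else:
            paths.append(path)  # scalar leaf: string, number, bool, null
    return paths


if __name__ == "__main__":
    for p in leaf_paths(TEST_DOC):
        print(p)
```

Because Python dictionaries keep insertion order, the paths come out in document order; store.bicycle.color is the one extracted in the rest of this example.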

Use DataWorks to Extract Data to MaxCompute

  1. Add a MongoDB data source.
     1. In the DataWorks console, go to the Data Integration page and add a MongoDB data source.
     2. Set the parameters as shown in the following figure, and click Finish after the connectivity test succeeds. Because the MongoDB network type in this example is VPC, set Data Source Type to Has Public IP Address.
     3. To retrieve the endpoint and port number, log on to the ApsaraDB for MongoDB console and click the instance, as shown in the following figure.
  2. Create a data synchronization task.
     1. In the DataWorks console, create a data synchronization node.
     2. Meanwhile, create a table named mqdata in DataWorks to store the JSON data. You can set the table parameters in the graphical interface. In this example, the mqdata table has a single column, mqdata, of type string.
     3. After creating the table, configure the synchronization task in the graphical interface, as shown in the following figure. Set the source data source to MongoDB and select mongodb_userlog; set the target data source to odps_first and the target table to mqdata. Then click Switch to Script Mode.
     4. The following shows the example code in script mode:

        {
          "type": "job",
          "steps": [
            {
              "stepType": "mongodb",
              "parameter": {
                "datasource": "mongodb_userlog",
                "column": [
                  { "name": "store.bicycle.color", "type": "document.document.string" }
                ],
                "collectionName": "userlog"
              },
              "name": "Reader",
              "category": "reader"
            },
            {
              "stepType": "odps",
              "parameter": {
                "partition": "",
                "isCompress": false,
                "truncate": true,
                "datasource": "odps_first",
                "column": [ "mqdata" ],
                "emptyAsNull": false,
                "table": "mqdata"
              },
              "name": "Writer",
              "category": "writer"
            }
          ],
          "version": "2.0",
          "order": { "hops": [ { "from": "Reader", "to": "Writer" } ] },
          "setting": {
            "errorLimit": { "record": "" },
            "speed": { "concurrent": 2, "throttle": false, "dmu": 1 }
          }
        }

        Key fields in the script:
        • datasource (reader): the name of the MongoDB data source, mongodb_userlog.
        • column (reader): name is the JSON field path; in this example, the value of store.bicycle.color is extracted. type must have the same number of dot-separated parts as name, so store.bicycle.color maps to document.document.string. For a top-level field, such as expensive in this example, set type to string.
        • collectionName: the MongoDB collection to read, userlog.
        • column (writer): the column name of the MaxCompute table, mqdata.
     5. Click Run. If the task succeeds, the log shown in the following figure is displayed.
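The pairing rule between a reader column's name and type is easy to get wrong by hand. The following Python sketch encodes it under an assumption drawn from the example above: every intermediate level is typed document and the leaf is typed string. The helpers `mongo_column`, `check_column`, `simulate_read`, and `reader_parameter` are hypothetical; they only build and check configuration locally and do not call DataWorks or reproduce how its MongoDB reader works internally.

```python
import json


def mongo_column(path, leaf_type="string"):
    """Build a reader column entry for a dotted JSON path (hypothetical helper).

    Assumption: each parent level is typed "document" and the last level gets
    leaf_type, so "store.bicycle.color" -> "document.document.string" and a
    top-level field such as "expensive" -> "string".
    """
    depth = len(path.split("."))
    return {"name": path, "type": ".".join(["document"] * (depth - 1) + [leaf_type])}


def check_column(column):
    """Raise ValueError if name and type have different numbers of parts."""
    if len(column["name"].split(".")) != len(column["type"].split(".")):
        raise ValueError(f"name/type depth mismatch: {column}")


def simulate_read(doc, column):
    """Offline approximation: follow the dotted path and return the value as a string."""
    check_column(column)
    value = doc
    for key in column["name"].split("."):
        value = value[key]  # KeyError if the path does not exist in doc
    return str(value)


def reader_parameter(datasource, collection, paths):
    """Assemble the reader "parameter" object from a list of dotted paths."""
    return {
        "datasource": datasource,
        "column": [mongo_column(p) for p in paths],
        "collectionName": collection,
    }


# A subset of the test document, holding only the fields used below.
SAMPLE = {"store": {"bicycle": {"color": "red", "price": 19.95}}, "expensive": 10}

if __name__ == "__main__":
    param = reader_parameter("mongodb_userlog", "userlog", ["store.bicycle.color"])
    print(json.dumps(param, indent=2))  # valid JSON, no inline // comments
    print(simulate_read(SAMPLE, param["column"][0]))
```

Generating the reader parameter with `json.dumps` gives a comment-free version of the configuration, and `check_column` flags a mismatched pair such as name "store.bicycle.color" with type "string" before the task is run.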

Verify Results

Create an ODPS SQL node in your Business Flow.
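The query itself is not shown here. Assuming you run something like `SELECT * FROM mqdata;` in the ODPS SQL node and copy the returned values into a list, the Python sketch below compares them with the values you expect. Because the writer sets truncate to true, the target table is cleared before each write, so if the userlog collection holds only the test document, the expected result is a single row containing red. The `rows_match` helper and the hand-copied `actual` list are hypothetical.

```python
from collections import Counter


def rows_match(actual_rows, expected_rows):
    """Return True if both lists hold the same values, ignoring order.

    Order is ignored because a plain SELECT gives no ordering guarantee;
    duplicates still count, so two "red" rows do not match one.
    """
    return Counter(actual_rows) == Counter(expected_rows)


if __name__ == "__main__":
    # Hypothetical: values copied by hand from the SQL node's result.
    actual = ["red"]
    # One row per document in userlog; the store.bicycle.color mapping yields "red".
    expected = ["red"]
    print(rows_match(actual, expected))
```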

Alibaba Cloud

Follow me to keep abreast of the latest technology news, industry insights, and developer trends. Alibaba Cloud website: https://www.alibabacloud.com