How to Migrate JSON Data from MongoDB to MaxCompute

In this article, we will learn how to use the Data Integration function on Alibaba Cloud DataWorks console to extract JSON fields from MongoDB to MaxCompute.

Prepare Data and Account

First, upload the data to your MongoDB database. This example uses Alibaba Cloud’s ApsaraDB for MongoDB. The network type is VPC because a public IP address is required for MongoDB to communicate with the default resource group of DataWorks. The test data is as follows:

Log on to the DMS for MongoDB console. In this example, the database is admin and the collection is userlog. You can run the db.userlog.find().limit(10) command in the Query Window to view the uploaded data, as shown in the following figure.

Image for post
Image for post

Create a user in the database in advance to add data sources in DataWorks. In this example, run the db.createUser({user:"bookuser",pwd:"123456",roles:["root"]})command to create a user named bookuser. The password of the user is 123456, and the permission is root.

Use DataWorks to Extract Data to MaxCompute

  1. Add a MongoDB data source
  2. In the DataWorks console, go to the Data Integration page and add a MongoDB data source.
Image for post
Image for post
  1. For specific parameters, see the following figure. Click Finish after the data source connectivity test is successful. In this example, the MongoDB network type is VPC. Therefore, set the Data Source Type to Has Public IP Address.
Image for post
Image for post
  1. To retrieve the endpoint and the port number, log on to the ApsaraDB for MongDB console and click an instance, as shown in the following figure.
Image for post
Image for post
  1. Create a data synchronization task
  2. In the DataWorks console, create a data synchronization node.
Image for post
Image for post
  1. Meanwhile, create a table named mqdata in DataWorks to store JSON data.
Image for post
Image for post
  1. You can set the table parameters on the graphic interface. In this example, the mqdata table has only one column named MQ data, whose data type is string.
Image for post
Image for post
  1. After creating the table, set the data synchronization task parameters on the graphic interface, as shown in the following figure. Set the target data source to odps_first and the target table to mqdata. Set the original data source to MongoDB and select mongodb_userlog. After completing the preceding configuration, click Switch to Script Mode.
Image for post
Image for post
  1. The following shows the example code in script mode:
  • { "type": "job", "steps": [ { "stepType": "mongodb", "parameter": { "datasource": "mongodb_userlog", //Indicates the data source name. "column": [ { "name": "store.bicycle.color", //Indicates the JSON field path. In this example, the value of color is extracted. "type": "document.document.string" //Indicates the number of fields in this line must be consistent with that in the preceding line (the name line). If the JSON field is a level 1 field, such as the "expensive" field in this example, enter "string" for this field. } ], "collectionName // Collection name": "userlog" }, "name": "Reader", "category": "reader" }, { "stepType": "odps", "parameter": { "partition": "", "isCompress": false, "truncate": true, "datasource": "odps_first", "column": [ //Indicates the table column name in MaxCompute, namely "mqdata". ], "emptyAsNull": false, "table": "mqdata" }, "name": "Writer", "category": "writer" } ], "version": "2.0", "order": { "hops": [ { "from": "Reader", "to": "Writer" } ] }, "setting": { "errorLimit": { "record": "" }, "speed": { "concurrent": 2, "throttle": false, "dmu": 1 } } }
  1. After completing the preceding configuration, click Run. If the operation is successful, the following log is displayed.
Image for post
Image for post

Verify Results

Create an ODPS SQL node in your Business Flow.

Image for post
Image for post

Enter the SELECT*from mqdata; statement to view the data in the mqdata table. Alternatively, you can run this command on the MaxCompute client for the same purpose.

Image for post
Image for post

To learn more about Data Migration on Alibaba Cloud MaxCompute, visit https://www.alibabacloud.com/help/doc-detail/98009.html

Reference:https://www.alibabacloud.com/blog/how-to-migrate-json-data-from-mongodb-to-maxcompute_594343?spm=a2c41.12498753.0.0

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store