Use Kettle to Import Data to AnalyticDB for PostgreSQL

By Lu Feng.

The tool Kettle, which is more formally known as Pentaho Data Integration (or PDI for short), is a very popular open-source ETL tool that is mainly used for data integration, transformation, and migration operations. Kettle supports a variety of relational databases and NoSQL data sources such as HBase and MongoDB, as well as other small data sources such as Excel and Access. Overall, thanks to plugin extensions, Kettle can support a large variety of data sources.

The following figure shows the relationship between Kettle and Alibaba Cloud’s AnalyticDB for PostgreSQL. Essentially, after data goes through ETL and other data integration operations through Kettle, these data sources can then interact with AnalyticDB for PostgreSQL:

Image for post
Image for post

Kettle supports a variety of data sources, which can be divided into the following major types: table inputs, text file inputs, generate records and customize constant inputs, get system info inputs, JSON file inpus, and various other format files and other input types not mentioned here.

You can learn more about input types in Core Objects > Input in the console.

Table inputs supported by Kettle are obtained from database connections with SQL statements. The Database connection supports many connection methods, including: Native Java Database Connectivity (JDBC), Open Database Connectivity (ODBC), Oracle Call Interface (OCI), and Java Naming and Directory Interface (JNDI) connections.

With these connection methods available, Kettle can connect to the vast majority of the mainstream databases, including Oracle, SQL Server, MySQL, DB2, PostgreSQL, Sybase, and Teradata database. You can read this document for more information about database connections.

Importing Data to AnalyticDB for PostgreSQL

Import methodDescriptionTable output
(INSERT)Use JDBC to import data
The batch insert method using JDBC is supportedBulk load
(COPY)Use COPY to import data
For large tables, COPY performance is about 10 times higher than batch insert

When the table output (INSERT) method is used to import data, data is distributed to the corresponding segment nodes after data flows through the master node and the parsing is performed. This method is relatively slow and therefore best for importing small amounts of data. The bulk load (COPY) method, however, given its better performance, is suitable for importing large amounts of data at one time.

Following this, in the rest of this blog, we will be looking at how to migrate external data to AnalyticDB for PostgreSQL by using these two methods.

Prerequisites

  • Install Kettle on your local host.
  • Create a target database, schema, and table in AnalyticDB for PostgreSQL.

1. Import Data using the Table Output Method

1. Create a new transformation in Kettle.

2. In the transformation, create a MySQL database as the output source with the following parameter configurations (see in the figure below). Do not check use Result Streaming Cursor when configuring parameters.

Image for post
Image for post
Image for post
Image for post

3. After configuring the preceding parameters, click Test to test the connectivity. Then, click OK after a successful test.

4. In the transformation, create a new Greenplum database as the input source, with the following parameter configurations.

Image for post
Image for post
Image for post
Image for post

5. After configuring the preceding parameters, click Test to test the connectivity. Then, click OK after a successful test.

6. In Kettle, navigate to Table input in the left-side pane under the Core Objects tab and drag it into the workspace.

Image for post
Image for post

7. Double-click Table Input in the workspace area and make parameter configurations in the Table Input dialog box.

Image for post
Image for post

8. Navigate to Table output under Output in the left-side under the Core Objects tab and drag it into the workspace.

Image for post
Image for post

9. Double click Table output in the workspace area and make parameter configurations in the Table output dialog box.

Image for post
Image for post

10. Create a connection line from table input to table output.

Image for post
Image for post

11. Click the white arrow to run the transformation and check the operation log and operation status.

After MySQL data is imported into AnalyticDB for PostgreSQL, you can use AnalyticDB for PostgreSQL for data analysis.

2. Import Data Using the Bulk Load Method

1. Create a new transformation in Kettle.

2. In the transformation, create a new text file as the output source.

Image for post
Image for post

3. Double-click the text file input icon and select the target input text file.

Image for post
Image for post

4. In the Content tab, configure a delimiter for the input file.

Image for post
Image for post

5. In the Fields tab, define fields in the input file.

Image for post
Image for post

6. In the transformation, create a new Greenplum database as the input source, with the following parameter configurations.

Image for post
Image for post

7. After configuring the preceding parameters, click Test to test the connectivity. Click OK after a successful test.

8. Navigate to PostgreSQL bulk load under Bulk load in the left-side Core Objects section.

Image for post
Image for post

9. Create a new connection line from file input to PostgreSQL bulk load.

Image for post
Image for post

10. Double click the PostgreSQL bulk load icon in the workspace and make parameter configurations in the Bulk load workspace:

Image for post
Image for post

11. Click the white arrow to run the transformation and check the operation log and operation status.

After data is imported into AnalyticDB for PostgreSQL, you can use AnalyticDB for PostgreSQL for data analysis.

Written by

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store