Querying ApsaraDB for RDS with Data Lake Analytics
As a hub for querying data in Alibaba Cloud, Data Lake Analytics (DLA) has been upgraded to query data in ApsaraDB for RDS instances (using MySQL
, SQLServer
, and Postgres
). This tutorial tells you how to use DLA to query data in ApsaraDB for RDS instances. We use MySQL
in the examples and point out the differences in SQLServer
and Postgres
at the end of the tutorial.
Create a Database
The following describes the syntax used to create an external table whose underlying layer is mapped to MySQL
in DLA:
CREATE SCHEMA hello_mysql_vpc_rds WITH DBPROPERTIES (
CATALOG = 'mysql',
LOCATION = 'jdbc:mysql://rm-2zer0vg58mfofake.mysql.rds.aliyuncs.com:3306/dla_test',
USER = 'dla_test',
PASSWORD = 'the-fake-password',
VPC_ID = 'vpc-2zeij924vxd303kwifake',
INSTANCE_ID = 'rm-2zer0vg58mfo5fake'
);
Different from creating a common table, an external table requires two more attributes: VPC_ID
and INSTANCE_ID. VPC_ID
indicates the ID of the VPC instance where your ApsaraDB for RDS instance is located, as shown in the following figure.
INSTANCE_ID
indicates the ID of your ApsaraDB for RDS instance, which is displayed on the details page of your ApsaraDB for RDS instance.
These attributes allow DLA to access the data in your ApsaraDB for RDS instance, which is inaccessible by default because your databases are stored in your VPC instance, by using Alibaba Cloud VPC reserve access technology.
Permission statements: If you create a database using the preceding method, you agree that Alibaba Cloud uses the VPC reserve access technology to read or write data from or to your ApsaraDB for RDS instance.
You also have to add the IP address segment 100.104.0.0/16
to your ApsaraDB for RDS whitelist. This IP address segment is used for VPC reserve access, as shown in the following figure.
Create a Table
After creating a database, create the table person in your ApsaraDB for RDS instance for test:
create table person (
id int,
name varchar(1023),
age int
);
Insert the following test data:
insert into person
values (1, 'james', 10),
(2, 'bond', 20),
(3, 'jack', 30),
(4, 'lucy', 40);
You can create the corresponding mapping table in the DLA database:
create external table person (
id int,
name varchar(1023),
age int
) tblproperties (
table_mapping = "person"
);
After connecting the MySQL client to the DLA database, you can query data in the MySQL database:
mysql> select * from person;
+------+-------+------+
| id | name | age |
+------+-------+------+
| 1 | james | 10 |
| 2 | bond | 20 |
| 3 | jack | 30 |
| 4 | lucy | 40 |
+------+-------+------+
4 rows in set (0.35 sec)
ETL: Clean Data from OSS and Write it to ApsaraDB for RDS
DLA is not typically used to read data from an ApsaraDB for RDS instance for analysis because an ApsaraDB for RDS instance contains a limited amount of data and is not applicable for massive data analysis. Instead, DLA is often used to analyze massive data in OSS and Table Store instances and write back the result data to ApsaraDB for RDS instances for frontend business. This is easily implemented in DLA. The table person
is used as an example in the following statement to convert 10 records of customer
in oss_db
and insert them into the table hello_mysql_vps_rds.person
:
mysql> insert into hello_mysql_vpc_rds.person
-> select c_custkey, c_name, c_custkey + 20 from oss_db.customer limit 10;+------+
| rows |
+------+
| 10 |
+------+
1 row in set (4.57 sec)mysql> select * from person;
+------+--------------------+------+
| id | name | age |
+------+--------------------+------+
| 1 | james | 10 |
| 2 | bond | 20 |
| 3 | jack | 30 |
| 4 | lucy | 40 |
| 1 | Customer#000000001 | 21 |
| 3 | Customer#000000003 | 23 |
| 5 | Customer#000000005 | 25 |
| 2 | Customer#000000002 | 22 |
| 4 | Customer#000000004 | 24 |
| 7 | Customer#000000007 | 27 |
| 6 | Customer#000000006 | 26 |
| 9 | Customer#000000009 | 29 |
| 8 | Customer#000000008 | 28 |
| 10 | Customer#000000010 | 30 |
+------+--------------------+------+
14 rows in set (0.26 sec)
SQL Server and PostgreSQL
Usage of SQLServer
and PostgreSQL
is similar to that of MySQL. As database
and schema
in MySQL
are equivalent, they are at the same layer. However, database
and schema
in SQLServer
and PostgreSQL
are at two layers. Therefore, the methods for creating a database and a table are different:
Create a Database
SQL Server
CREATE SCHEMA `hello_sqlserver_vpc_rds` WITH DBPROPERTIES
(
CATALOG = 'sqlserver',
LOCATION = 'jdbc:sqlserver://rm-bp15g1r5jf90hfake.sqlserver.rds.aliyuncs.com:3433;DatabaseName=dla_test',
USER='dla_test1',
PASSWORD='this-is-not-a-real-password',
INSTANCE_ID = 'rm-bp15g1r5jf90fake',
VPC_ID = 'vpc-bp1adypqlcn535yrdfake'
);
- Set
CATALOG
tosqlserver
. - Specify the database name for
LOCATION
. Different from the database name forMySQL
, the database name for SQLServer is specified usingDatabaseName=dla_test
. This is defined by the SQLServer JDBC URL rather than DLA.
PostgreSQL
CREATE SCHEMA `hello_postgresql_vpc_rds` WITH DBPROPERTIES
(
CATALOG = 'postgresql',
LOCATION = 'jdbc:postgresql://rm-bp1oo49r6j3hvfake.pg.rds.aliyuncs.com:3433/dla_test',
USER='dla_test',
PASSWORD='this-is-not-a-real-password',
INSTANCE_ID = 'rm-bp1oo49r6j3hfake',
VPC_ID = 'vpc-bp1adypqlcn535yrfake'
);
The method for creating a database for PostgreSQL is similar to that for MySQL, except setting CATALOG
to postgresql
.
Create a Table
The difference for creating a table is in the table_mapping
field. table_mapping
of MySQL
only contains the table name (person
):
create external table person1 (
id int,
name varchar(1023),
age int
) tblproperties(
table_mapping = 'person'
);
While table_mapping
of SQLServer
or PostgreSQL
contains the names of the schema and table:
create external table person (
id int,
name varchar(1023),
age int
) tblproperties(
table_mapping = 'public.person'
);
That’s it! To learn more about Alibaba Cloud Data Lake Analytics at www.alibabacloud.com/products/data-lake-analytics
Reference:https://www.alibabacloud.com/blog/querying-apsaradb-for-rds-with-data-lake-analytics_594184