Obtaining SQL Query Result in Pages Using MaxCompute

By Sheng Yuan

Unlike databases, MaxCompute SQL does not have any built-in paged query logic such as select * from table limit x offset y. However, many users hope to obtain query results in pages or batches in some scenarios by using the paged query logic of databases. This article describes several methods to implement this need.

1. Use the row_number() function as a unique and incrementing ID for filter-based query.

select * from (select row_number() over() as row_id,* from orders_delta)t where row_id between 10 and 20;
Image for post
Image for post

Sort the data and attach a unique ID to each data record by using the row_number() function. Then specify the pagination range for each query based on this ID.

2. Download the query result in batches by using the InstanceTunnel of Java SDK.

MaxCompute JavaSDK supports directly exporting the SELECT query result by using SQLTask and InstanceTunnel in combination. For more information, see Tunnel Commands.

In this case, you can download the results of your custom SELECT query by using InstanceTunnel.

Odps odps = OdpsUtils.newDefaultOdps(); // Initialize the MaxCompute object
Instance i = SQLTask.run(odps, "select * from wc_in;");
i.waitForSuccess();

// Create InstanceTunnel
InstanceTunnel tunnel = new InstanceTunnel(odps);
// Create DownloadSession based on the instance ID
InstanceTunnel.DownloadSession session = tunnel.createDownloadSession(odps.getDefaultProject(), i.getId());

long count = session.getRecordCount();
// The count of outputs results
System.out.println(count);

// The code for obtaining the data is the same as in Table Tunnel.
TunnelRecordReader reader = session. openRecordReader (0, Count );
Record record;
While(Record = reader.Read())! = null) {
For(IntCol =0; Col <session.GetSchema().GetColumns().Size(); ++ Col ){
// The fields in the wc_in table are all strings, and the output is printed directly here.
System. out.Println(Record.Get(Col ));
}
}
Reader.Close();

Here, a custom select query is submitted through SQLTask, and the query result is directly downloaded using InstanceTunnel.DownloadSession. Within this, the openRecordReader method supports specifying the start position and number of reads for this read record, and the parameter settings of openRecordReader (start, long) can be used to implement the logic for batch download.

Image for post
Image for post

For example, you can change openRecordReader (0, count) in the preceding example to get the start position and number of records you wish to obtain:

TunnelRecordReader reader = downloadSession. openRecordReader (100,20);

References: MaxCompute SDK Java Doc InstanceTunnel. DownloadSession class openRecordReader method.

Because the data volume of tables processed by MaxCompute is usually very large, the preceding method is generally not recommended for use in interactive paging query scenarios for reports. For users who need to query while interacting, consider using MaxCompute’s interactive analysis service (Lightning) to make such queries.

3. Use the limit/offset syntax of MaxCompute’s interactive analysis tool (Lightning) to implement paging

MaxCompute SQL does not support limit/offset syntax, but developers can draw on MaxCompute’sinteractive analysis tool (Lightning) to use limit/offset.

MaxCompute’s interactive analysis tool can quickly query MaxCompute data using the PostgreSQL protocol and syntax under the same permission system. PostgreSQL’s limit offset syntax can achieve the same paging effect as database queries.

Reference:https://www.alibabacloud.com/blog/obtaining-sql-query-result-in-pages-using-maxcompute_594883?spm=a2c41.12950750.0.0

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store