MaxCompute Tunnel Offline Batch Data Channel FAQs

Best Practices for SDK Upload

Use the following code as a reference when uploading data through the Tunnel SDK.

import java.io.IOException;
import java.util.Date;

import com.aliyun.odps.Column;
import com.aliyun.odps.Odps;
import com.aliyun.odps.PartitionSpec;
import com.aliyun.odps.TableSchema;
import com.aliyun.odps.account.Account;
import com.aliyun.odps.account.AliyunAccount;
import com.aliyun.odps.data.Record;
import com.aliyun.odps.data.RecordWriter;
import com.aliyun.odps.tunnel.TableTunnel;
import com.aliyun.odps.tunnel.TableTunnel.UploadSession;
import com.aliyun.odps.tunnel.TunnelException;

public class UploadSample {
    private static String accessId = "<your access id>";
    private static String accessKey = "<your access Key>";
    private static String odpsUrl = "";
    private static String project = "<your project>";
    private static String table = "<your table name>";
    private static String partition = "<your partition spec>";

    public static void main(String args[]) {
        // One-time setup: create the account, the Odps client, and the tunnel
        Account account = new AliyunAccount(accessId, accessKey);
        Odps odps = new Odps(account);
        odps.setEndpoint(odpsUrl);
        odps.setDefaultProject(project);
        TableTunnel tunnel = new TableTunnel(odps);
        try {
            // Determine the partition to write to
            PartitionSpec partitionSpec = new PartitionSpec(partition);
            // Create a session on this partition of the table. The session is valid on the
            // server for 24 hours and can upload a total of 20,000 blocks in that time.
            // Creating a session takes only seconds, but it consumes server resources and
            // creates temporary directories, which makes it an expensive operation. We
            // strongly recommend reusing one session for as much data of the same partition
            // as possible.
            UploadSession uploadSession = tunnel.createUploadSession(project,
                    table, partitionSpec);
            System.out.println("Session Status is : "
                    + uploadSession.getStatus().toString());
            TableSchema schema = uploadSession.getSchema();
            // After the data is ready, open a Writer and write one block. Each block can be
            // uploaded successfully only once. A successful close of the Writer means the
            // block upload has completed; otherwise the block can be uploaded again.
            // A session allows at most 20,000 block IDs (0-19999). Beyond that, commit the
            // session and create a new one.
            // Blocks that are too small produce a large number of small files and seriously
            // degrade computing performance. We strongly recommend writing more than 64 MB
            // per block (a single block can hold up to 100 GB).
            // Estimate the total from the average record size and the record count, for
            // example: 64 MB < average record size x record count < 100 GB
            // The server limits maxBlockID to 20,000. You can use fewer blocks per session,
            // such as 100, depending on your workload, but the more blocks each session
            // carries the better, because creating a session is expensive.
            // Uploading only a small amount of data per session causes small files and empty
            // directories, and also hurts overall upload performance: creating the session
            // takes seconds, while the upload itself may take only tens of milliseconds.
            int maxBlockID = 20000;
            for (int blockId = 0; blockId < maxBlockID; blockId++) {
                // Prepare at least 64 MB of data before writing,
                // for example by reading several files or querying a database
                try {
                    // Create a Writer on the block. If, at any time after the Writer is
                    // created, less than 4 KB of data is written during 2 consecutive
                    // minutes, the connection times out and is closed. Therefore, prepare
                    // data in memory that can be written directly before creating a Writer.
                    RecordWriter recordWriter = uploadSession.openRecordWriter(blockId);
                    // Convert all the data that was read into Tunnel Records and write them
                    int recordNumber = 1000000;
                    for (int index = 0; index < recordNumber; index++) {
                        // Convert the raw data at position "index" into an ODPS record
                        Record record = uploadSession.newRecord();
                        for (int i = 0; i < schema.getColumns().size(); i++) {
                            Column column = schema.getColumn(i);
                            switch (column.getType()) {
                            case BIGINT:
                                record.setBigint(i, 1L);
                                break;
                            case BOOLEAN:
                                record.setBoolean(i, true);
                                break;
                            case DATETIME:
                                record.setDatetime(i, new Date());
                                break;
                            case DOUBLE:
                                record.setDouble(i, 0.0);
                                break;
                            case STRING:
                                record.setString(i, "sample");
                                break;
                            default:
                                throw new RuntimeException("Unknown column type: "
                                        + column.getType());
                            }
                        }
                        // Write the data to the server. Each 4 KB of data written triggers a
                        // network transmission. If no network transmission occurs for 120
                        // seconds, the server closes the connection. The Writer then becomes
                        // unavailable and you must write the data again.
                        recordWriter.write(record);
                    }
                    // A successful close means the block was uploaded, but the data stays in
                    // an ODPS temporary directory and is not visible until the whole session
                    // is committed
                    recordWriter.close();
                } catch (TunnelException e) {
                    // It is recommended to retry a certain number of times
                    System.out.println("write failed:" + e.getMessage());
                } catch (IOException e) {
                    // It is recommended to retry a certain number of times
                    System.out.println("write failed:" + e.getMessage());
                }
            }
            // Commit all the blocks. uploadSession.getBlockList() specifies the blocks to
            // commit. The data is formally written to the ODPS partition only after the
            // commit succeeds. It is recommended to retry up to 10 times if the commit fails.
            boolean committed = false;
            for (int retry = 0; retry < 10 && !committed; ++retry) {
                try {
                    // Takes seconds; formally commits the data
                    uploadSession.commit(uploadSession.getBlockList());
                    committed = true;
                } catch (TunnelException e) {
                    System.out.println("uploadSession commit failed:" + e.getMessage());
                } catch (IOException e) {
                    System.out.println("uploadSession commit failed:" + e.getMessage());
                }
            }
            if (committed) {
                System.out.println("upload success!");
            }
        } catch (TunnelException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Frequently Asked Questions about MaxCompute Tunnel

Can block IDs be repeated?

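No. Each block ID in an upload session must be unique. That is, within the same UploadSession, once a RecordWriter opened with a given blockId has written a batch of data and been closed successfully, that blockId cannot be used to upload again. Only if the close fails can the block be uploaded again.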

Is there a restriction on block size?

The maximum size of a block is 100 GB. We strongly recommend writing at least 64 MB of data into each block. Each block corresponds to one file, and a file smaller than 64 MB counts as a small file. Too many small files degrade performance.
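As a rough illustration of these limits, the sketch below turns the 64 MB and 100 GB bounds into record counts for a given average record size. The class name, method names, and the 200-byte average are assumptions made for the example; they are not part of the Tunnel SDK.

```java
class BlockSizing {
    static final long MIN_BLOCK_BYTES = 64L * 1024 * 1024;          // recommended minimum: 64 MB
    static final long MAX_BLOCK_BYTES = 100L * 1024 * 1024 * 1024;  // block size limit: 100 GB

    /** Smallest record count whose total size reaches the 64 MB recommendation. */
    static long minRecordsPerBlock(long avgRecordBytes) {
        requirePositive(avgRecordBytes);
        return (MIN_BLOCK_BYTES + avgRecordBytes - 1) / avgRecordBytes; // ceiling division
    }

    /** Largest record count whose total size still fits in a 100 GB block. */
    static long maxRecordsPerBlock(long avgRecordBytes) {
        requirePositive(avgRecordBytes);
        return MAX_BLOCK_BYTES / avgRecordBytes;
    }

    /** True when avgRecordBytes x recordCount lies between 64 MB and 100 GB. */
    static boolean isWithinLimits(long avgRecordBytes, long recordCount) {
        long total = Math.multiplyExact(avgRecordBytes, recordCount);
        return total >= MIN_BLOCK_BYTES && total <= MAX_BLOCK_BYTES;
    }

    private static void requirePositive(long avgRecordBytes) {
        if (avgRecordBytes <= 0) {
            throw new IllegalArgumentException("avgRecordBytes must be positive");
        }
    }

    public static void main(String[] args) {
        long avgRecordBytes = 200; // hypothetical average record size
        System.out.println("min records per block: " + minRecordsPerBlock(avgRecordBytes));
        System.out.println("max records per block: " + maxRecordsPerBlock(avgRecordBytes));
        System.out.println("1,000,000 records ok?  " + isWithinLimits(avgRecordBytes, 1_000_000));
    }
}
```

In practice the average record size is itself an estimate, so it is safer to aim well above the minimum rather than exactly at it.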

Can a session be shared? Does a session have a lifecycle?

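Each session has a 24-hour lifecycle on the server. It can be used within 24 hours after being created, and can be shared across processes or threads on the condition that the same BlockId is not used more than once. This makes distributed uploading possible: each worker writes its own non-overlapping set of block IDs, and the session is committed after all blocks have been written.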
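One way to keep block IDs from colliding when several threads or processes share a session is to assign each worker a contiguous, non-overlapping range up front. The sketch below shows only that bookkeeping; the class and method names are hypothetical, and the actual writes would still go through the Tunnel SDK.

```java
import java.util.ArrayList;
import java.util.List;

class BlockRangePlanner {
    static final int MAX_BLOCKS_PER_SESSION = 20000; // block IDs 0-19999

    /**
     * Splits block IDs 0..totalBlocks-1 into one contiguous range per worker.
     * Each element is {startInclusive, endExclusive}.
     */
    static List<long[]> split(int totalBlocks, int workers) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        if (totalBlocks < 0 || totalBlocks > MAX_BLOCKS_PER_SESSION) {
            throw new IllegalArgumentException("totalBlocks must be in 0.." + MAX_BLOCKS_PER_SESSION);
        }
        List<long[]> ranges = new ArrayList<>();
        int base = totalBlocks / workers;   // blocks every worker gets
        int extra = totalBlocks % workers;  // first 'extra' workers get one more
        long start = 0;
        for (int w = 0; w < workers; w++) {
            long size = base + (w < extra ? 1 : 0);
            ranges.add(new long[] {start, start + size});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<long[]> ranges = split(10, 3);
        for (int w = 0; w < ranges.size(); w++) {
            System.out.println("worker " + w + ": blocks [" + ranges.get(w)[0]
                    + ", " + ranges.get(w)[1] + ")");
        }
    }
}
```

Because the ranges never overlap, no two workers can open a Writer on the same block ID, which is the condition for sharing the session.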

If a session is created but not used, does it consume system resources?

Yes. Upon creation, each session generates two file directories. If many sessions are created and then left unused, these temporary directories accumulate and place an extra burden on the system. Avoid creating more sessions than you need, and share sessions whenever possible.

How can I process Write/Read timeout or I/O exceptions?

While data is being uploaded, the Writer triggers a network action for every 8 KB of data written. If no network action is triggered within 120 seconds, the server closes the connection. At that point the Writer becomes unavailable, and you need to open a new Writer and write the data again.
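A common way to handle this is to keep the block's data in memory and retry the whole block with a fresh Writer when an I/O error occurs. The sketch below assumes a hypothetical `BlockUpload` callback that stands in for "open a Writer on the block, write every record, close"; it is not an SDK type, and the records are plain strings for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

class BlockRetry {
    /** Hypothetical stand-in for: open a writer on blockId, write all records, close. */
    interface BlockUpload {
        void run(long blockId, List<String> records) throws IOException;
    }

    /**
     * Runs the upload until it succeeds or maxAttempts is reached.
     * Returns the attempt number that succeeded.
     */
    static int uploadWithRetry(BlockUpload upload, long blockId,
                               List<String> records, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Each attempt starts from scratch, so the callback opens a new writer.
                upload.run(blockId, records);
                return attempt;
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw new UncheckedIOException(
                "block " + blockId + " failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated upload that times out twice before succeeding.
        BlockUpload flaky = (id, recs) -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated timeout");
            }
        };
        int attempts = uploadWithRetry(flaky, 0L, List.of("row-1", "row-2"), 5);
        System.out.println("succeeded on attempt " + attempts);
    }
}
```

Retrying the same block ID is safe here because a block whose Writer did not close successfully has not been uploaded.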

Is MaxCompute Tunnel suitable for batch uploading or stream uploading?

MaxCompute Tunnel is designed for batch uploading rather than stream uploading. For stream uploading, you can use DataHub, the high-speed streaming data channel, which writes data with millisecond-level latency.

Must the partition already exist before data is uploaded through MaxCompute Tunnel?

Yes. MaxCompute Tunnel does not create partitions automatically, so the target partition must exist before the upload.

What is the relationship between Dship and MaxCompute Tunnel?

Dship is a tool that uploads and downloads data through MaxCompute Tunnel.

Does data uploaded with Tunnel append to or overwrite existing data on a file?

The uploaded data is appended to the existing data. It does not overwrite it.

What is the routing function of MaxCompute Tunnel?

The routing function allows the Tunnel SDK to obtain the Tunnel endpoint from the MaxCompute endpoint. That is, you only need to set the MaxCompute endpoint for the Tunnel SDK to run properly.

How much data in a block is preferred when uploading data with MaxCompute Tunnel?

There is no absolute answer to this question. It depends on a variety of factors, such as network performance, real-time requirements, how the data will be used, and the number of small files in the cluster. Generally, if the data volume is relatively large and uploads are continuous, we recommend keeping each block between 64 MB and 256 MB.

Why do I keep getting a timeout prompt when using MaxCompute Tunnel?

This usually happens due to endpoint errors. Please check the endpoint configuration. A simple method is to check the network connectivity by using tools like telnet.
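If telnet is not available, the same reachability check can be done from Java with a plain TCP connect. This is a minimal sketch using only the standard library; the host and port are whatever your endpoint configuration specifies and are passed on the command line here.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class EndpointCheck {
    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: EndpointCheck <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 5000));
    }
}
```

A successful connect only shows that the host and port are reachable. It does not confirm that the endpoint is the right one for your project.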

Why do I receive the exception, “You have NO privilege ‘odps:Select’ on {acs:odps:*:projects/XXX/tables/XXX}. project ‘XXX’ is protected” when I use Tunnel to download data?

The data protection function has been enabled for the project. Only the project owner has the right to transfer data from one project to another if the project data is protected.

Why do I receive the exception, “ErrorCode=FlowExceeded, ErrorMessage=Your flow quota is exceeded” when I use Tunnel to upload data?

The maximum number of concurrent requests is exceeded. By default, MaxCompute Tunnel allows a maximum of 2,000 concurrent upload and download requests (quota). Each request, once it is sent, occupies one quota unit until it ends. Try the following solutions:

  1. Have the client sleep for a while, and then try again.
  2. Increase the tunnel concurrency quota for the project. We recommend that you contact the administrator to evaluate the traffic flow.
  3. Report the exception to the project owner to identify and control the top concurrency quota consumers.
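The first option, sleeping and retrying, is often implemented with an exponentially growing delay so that many clients do not all retry at the same moment. The sketch below is one such approach under stated assumptions: `Request` is a hypothetical callback that returns false when the server answers with FlowExceeded, and the delay values are illustrative rather than recommended settings.

```java
class FlowBackoff {
    /** Hypothetical request: returns false when the quota is exceeded. */
    interface Request {
        boolean trySend();
    }

    /** Delay before the next retry: base x 2^attempt, capped at capMillis (attempt is 0-based). */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < capMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMillis);
    }

    /** Sends the request, sleeping between failed attempts. Returns true on success. */
    static boolean sendWithBackoff(Request request, int maxAttempts,
                                   long baseMillis, long capMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (request.trySend()) {
                return true;
            }
            if (attempt == maxAttempts - 1) {
                break; // no point sleeping after the final attempt
            }
            try {
                Thread.sleep(delayMillis(attempt, baseMillis, capMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated request that is rejected twice before it is accepted.
        boolean ok = sendWithBackoff(() -> ++calls[0] >= 3, 5, 10, 1000);
        System.out.println("accepted: " + ok + " after " + calls[0] + " calls");
    }
}
```

If requests are still rejected after several retries, the other two options (raising the quota or reducing the heaviest consumers) address the cause rather than the symptom.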



Alibaba Cloud
