OSSUTIL, the Open-Source OSS Tool: Upload Performance Tuning

Parameters

--recursive

When uploading files to OSS, you must specify the --recursive option if file_url is a directory. If it is a single file, the option is not needed.

  1. If the --recursive option is not specified, a single object is copied. In this case, make sure src_url points exactly to the object to be copied; otherwise, an error is returned.
  2. If the --recursive option is specified, OSSUTIL performs a prefix-match search on src_url and copies all matching objects in a batch. If one copy operation fails, the copy operations that have already completed are not rolled back.
     1. If an error occurs before OSSUTIL starts iterating over the batch of files, no report file is generated and OSSUTIL exits. For example, if the cp command is entered incorrectly, OSSUTIL prints an error on the screen and quits without generating a report file.
     2. During the batch operation, OSSUTIL prints an error on the screen and quits if an error such as the following occurs: the bucket does not exist, or permission verification fails because the AccessKeyID/AccessKeySecret is incorrect.
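The rule above (add --recursive only for directories) can be expressed as a small command builder. This is a minimal sketch: the binary name `ossutil` and the bucket path in the usage line are assumptions (some installations ship the binary as `ossutil64`), and the function only builds the argument list; it does not run anything.

```python
import os
import shlex


def build_cp_command(local_path: str, oss_url: str) -> list[str]:
    """Build an `ossutil cp` argument list for uploading local_path to oss_url.

    The binary name "ossutil" is an assumption; adjust it for your installation.
    """
    args = ["ossutil", "cp", local_path, oss_url]
    if os.path.isdir(local_path):
        # Uploading a directory requires --recursive; a single file does not.
        args.append("--recursive")
    return args


if __name__ == "__main__":
    # "oss://example-bucket/backup/" is a hypothetical destination.
    print(shlex.join(build_cp_command(".", "oss://example-bucket/backup/")))
```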

Concurrency Control Parameters

  1. The --jobs option controls the number of concurrent operations between files when multiple files are uploaded, downloaded, or copied.
  2. The --parallel option controls the number of concurrent operations between the parts of a single large file when it is uploaded, downloaded, or copied.
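The two levels of concurrency can be modeled with two nested thread pools: an outer pool sized by --jobs that works on files, and a per-file inner pool sized by --parallel that works on parts. This is an illustrative model under stated assumptions, not OSSUTIL's actual implementation. The "part transfer" is a sleep, and the counter only records how many simulated parts were in flight at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class PeakCounter:
    """Thread-safe counter that records the most parts in flight at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def enter(self) -> None:
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def leave(self) -> None:
        with self._lock:
            self.current -= 1


def transfer_part(counter: PeakCounter) -> None:
    # Stand-in for transferring one part: sleeps instead of doing network I/O.
    counter.enter()
    try:
        time.sleep(0.01)
    finally:
        counter.leave()


def transfer_file(num_parts: int, parallel: int, counter: PeakCounter) -> None:
    # Models --parallel: concurrency between the parts of one file.
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = [pool.submit(transfer_part, counter) for _ in range(num_parts)]
    for f in futures:
        f.result()  # re-raise any error from a part


def transfer_all(parts_per_file: list[int], jobs: int, parallel: int) -> int:
    # Models --jobs: concurrency between different files. Returns the peak.
    counter = PeakCounter()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(transfer_file, n, parallel, counter)
                   for n in parts_per_file]
    for f in futures:
        f.result()
    return counter.peak


if __name__ == "__main__":
    print(transfer_all([8, 8, 8, 8, 8], jobs=3, parallel=4))
```

Whatever the file sizes, the peak can never exceed jobs × parallel, because at most `jobs` files are active and each has at most `parallel` parts running.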

--part-size option

This option sets the size of each part during multipart upload/download/copy of big files.
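The part size determines how many parts a file is split into. The sketch below shows that arithmetic, assuming sizes are given in bytes. It also assumes the OSS service limit of 10,000 parts per multipart upload, which sets a floor on the part size for very large files. The helper names are hypothetical.

```python
MAX_PARTS = 10_000  # OSS limit on the number of parts in one multipart upload


def part_count(file_size: int, part_size: int) -> int:
    """Number of parts a file of file_size bytes is split into (ceiling division)."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return -(-file_size // part_size)


def min_part_size(file_size: int) -> int:
    """Smallest part size in bytes that keeps the part count within MAX_PARTS."""
    return max(1, -(-file_size // MAX_PARTS))


if __name__ == "__main__":
    size = 10 * 1024**3  # a 10 GiB file
    print(part_count(size, 10 * 1024**2))  # 10 MiB parts -> 1024
    print(min_part_size(size))             # -> 1073742
```

Larger parts mean fewer requests but more memory per concurrent part. Smaller parts spread the work more evenly across the --parallel workers.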

Performance Tuning

If concurrency is set too high, OSSUTIL's upload, download, and copy performance may actually drop because of thread context switching and contention for resources. Therefore, adjust --jobs and --parallel to suit the actual machine. When stress testing, start both options at small values and raise them gradually until you reach the best-performing values.
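The "start small, then increase" procedure can be automated as a simple sweep. This is a minimal sketch. The `ossutil` binary name and the `--force` flag are assumptions; `--force` is used here to skip overwrite prompts on repeated runs, so check `ossutil help cp` for your version. The stopping rule (stop at the first setting slower than the best so far) is one simple heuristic among many.

```python
import subprocess
import time
from typing import Callable, Iterable, Optional, Tuple

Setting = Tuple[int, int]  # (jobs, parallel)


def time_ossutil(src: str, dst: str, jobs: int, parallel: int) -> float:
    """Run one recursive `ossutil cp` and return its wall-clock time in seconds.

    The binary name and --force are assumptions; verify them for your version.
    """
    cmd = ["ossutil", "cp", src, dst, "--recursive", "--force",
           f"--jobs={jobs}", f"--parallel={parallel}"]
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def sweep(measure: Callable[[int, int], float],
          settings: Iterable[Setting]) -> Optional[Setting]:
    """Try settings from small to large; stop once a run is slower than the best."""
    best: Optional[Setting] = None
    best_time = float("inf")
    for jobs, parallel in settings:
        elapsed = measure(jobs, parallel)
        if elapsed < best_time:
            best, best_time = (jobs, parallel), elapsed
        else:
            break  # performance got worse: stop increasing concurrency
    return best
```

For example, with a hypothetical local directory and bucket: `sweep(lambda j, p: time_ossutil("./data", "oss://example-bucket/test/", j, p), [(1, 1), (2, 2), (3, 4), (4, 8)])`. A single timing is noisy, so in practice you may want to repeat each setting and compare averages.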

Case Study

Analysis

Because many files were being downloaded, OSSUTIL downloaded 5 files simultaneously by default (in versions <= 1.4.0, the default concurrency between files is 5).

Further Explanation

The limiting factor is not OSS itself. It is the CPU, memory, and network resources on the machine running OSSUTIL that each concurrent operation consumes (reading files, splitting them into parts of the specified size, uploading, and so on).

  1. The --jobs option sets the concurrency between different files. It defaults to 5 in versions <= 1.4.0 and to 3 in later versions.
  2. The --parallel option sets the multipart concurrency within a single large file. When neither --parallel nor --part-size is set, this value is calculated from the file size and does not exceed 15 in versions <= 1.4.0 (12 in later versions).
  3. If there are many files of uneven sizes, you can set --jobs=3 and --parallel=4 (3 files transferred concurrently, and 4 parts transferred concurrently within each file). Adjust the actual numbers to the conditions of your machine.
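Because each of the `jobs` active files can run up to `parallel` part transfers, jobs × parallel gives a rough upper bound on simultaneous part operations competing for CPU, memory, and network. The sketch below applies that to the values quoted above. It assumes every active file is large enough to use the full --parallel value, which is why these are upper bounds rather than typical counts.

```python
def max_parts_in_flight(jobs: int, parallel: int) -> int:
    """Upper bound on simultaneous part transfers: jobs files x parallel parts each."""
    return jobs * parallel


if __name__ == "__main__":
    for label, jobs, parallel in [
        ("default cap, versions <= 1.4.0", 5, 15),
        ("default cap, later versions", 3, 12),
        ("suggested for many uneven files", 3, 4),
    ]:
        print(f"{label}: up to {max_parts_in_flight(jobs, parallel)} parts in flight")
```

This prints bounds of 75, 36, and 12, which shows why the suggested --jobs=3 --parallel=4 is much lighter on the machine than the old defaults.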

Summary

  1. A cp command runs concurrently by default. Large files are copied with concurrent multipart transfers, and small files with a simple put. CRC verification is enabled by default.
  2. When copying between OSS locations, only complete objects can currently be copied. Incomplete multipart uploads cannot be copied.
  3. The ratio of --jobs to the number of CPU cores should be about 1:1, and the ratio of --parallel to the number of CPU cores about 2:1. Avoid values much larger than these.
  4. Too many concurrent operations can saturate the network and keep the CPU busy. When running concurrent operations, we suggest that you monitor the machine's CPU, network, and process/thread usage.
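The 1:1 and 2:1 core ratios can be turned into starting values directly. This minimal sketch reads the core count with `os.cpu_count()`. The function name is hypothetical, and the results are meant as upper limits to tune down from, not guaranteed optimums.

```python
import os
from typing import Optional, Tuple


def suggested_limits(cpu_cores: Optional[int] = None) -> Tuple[int, int]:
    """Return (jobs, parallel) from the 1:1 and 2:1 core ratios."""
    cores = cpu_cores or os.cpu_count() or 1  # cpu_count() may return None
    return cores, 2 * cores


if __name__ == "__main__":
    jobs, parallel = suggested_limits()
    print(f"--jobs={jobs} --parallel={parallel}")
```

On a 4-core machine this suggests `--jobs=4 --parallel=8`. While testing, watch CPU and network usage and lower either value if the machine becomes saturated.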

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com