Storage Policies and Read/Write Optimization in JindoFS

By Li Zhipeng, object storage development engineer from Inspur Electronic Information Industry CO.,Ltd.

The E-MapReduce team of the computing platform division has explored and developed the JindoFS framework to accelerate data read/write performance in computing-storage separation scenarios. Mr. Yao Shunyang from Alibaba Cloud Intelligence team presented a detailed introduction to JindoFS storage policies and read/write optimization. Based on his introduction, this article further discusses the following topics:

  • The data cache scenario, as well as the background and motivation for data caching
  • The policy and optimization of data read/write operations
  • Cache data management
  • Best practice

Data Cache Scenario

Therefore, it is necessary to create a cache layer for the backend storage cluster on the computing side, which is in the computing cluster. Through cache layer’s data cache, access traffic from the computing cluster to the storage backend is reduced, which significantly mitigates the bottleneck caused by network throughput.

Cache Acceleration in JindoFS

JindoFS is consisted by the following components:

  1. JindoFS SDK client: All the upper-layer computing engines access the JindoFS file system through the JindoFS SDK clients to accelerate the backend storage cache.
  2. Namespace Service: It is used for JindoFS metadata management and Storage Service management.
  3. Storage Service: User data management includes local data and data in OSS.

JindoFS is a cloud-native file system that supports the performance of local storage and the ultra-large capacity of OSS. It also supports the following types of backend storage:

  • Data lake scenario on the cloud that supports an OSS instance as the backend.
  • Remote HDFS acceleration (coming soon)
  • Cross-regional HDFS deployment
  • Online computing clusters accessing offline HDFS clusters in hybrid cloud scenarios

Data Read/Write Policy and Optimization

Write Policy

Write Policy 1: During the write process, the client writes data to the cache blocks of the corresponding Storage Service, and the Storage Service uploads the data in cache blocks to the backend storage through multiple threads in parallel.

Write Policy 2: In terms of the pass-through, use the JindoFS SDK pass-through to upload data to the backend storage system. The SDK has made a lot of related performance optimizations. This method applies to the data production environment. It is only responsible for generating data and has no requirement for the subsequent computing process and read operations

Read Policy

  1. First, read the cache data blocks from the local nodes, such as Block1 and Block2 shown in the following figure.
  2. If no local node exists in the cache, the client sends a request to the Namespace Service for cache data block. For example, if the data Block 3 to be read is on Node 2, the client reads Block 3 from Node 2.
  3. If the Node does not exist, the system reads data from the remote OSS cluster and adds the data to the local Storage Service cache to accelerate the next read operation on the data.
  4. Based on the preceding basic policies, JindoFS provides a policy that supports dynamic multiple backups. This policy reads data from other nodes by configuring parameters, then generates backups in the cache. This further accelerates the read access to the high-heat data blocks.

Cache Locality

JindoFS Namespace maintains the location information of all cache data blocks. Therefore, through related APIs provided by the Namespace, the location information of data blocks can be delivered to the computing layer. Then, the computing layer pushes tasks to the node where the cache data block is located. In this way, most data can be read from the local cache and the rest of the data can be read through networks. Through Cache Locality, most data are read locally, ensuring an optimal data reading performance for computing jobs, as shown in the following figure.

Usage of JindoFS

  • Block mode: JindoFS manages metadata and OSS stores data blocks in the backend.
  • Cache mode: This mode is transparent and imperceptible to users. It has the following scenarios:
  • Cache is not enabled because the cluster is small.
  • Cache is enabled and it resolves insufficient OSS bandwidth through the local cache block. Cache can be turned on or off through the configuration item. Set to 0 to turn on cache, and 1 to turn off.

Cache Data Management

  • Cache management of local data blocks
  • Lifecycle maintenance of local data blocks

Storage Service implements the management of data access information. That is, all read/write operations are registered with AccessManager, while storage.watermark.high.ratio and storage.watermark.low.ratio are provided to manage cache data. When the used cache capacity in the local disk reaches the threshold value of storage.watermark.high.ratio, AccessManager automatically triggers the cleanup mechanism. Some cold data in the local disk are deleted till the storage.watermark.low.ratio is triggerd to give more disk space for hot data.

Automatic Cleanup of Cache Data Blocks

The following figure shows the cleanup process of Cache Data Blocks.

Explicit Designation Cache

Use the cache command to explicitly cache the backend directories or files, and use the uncache command to release cold data.

Best Practice

How to Configure a Cluster

  • More persistent nodes for caching
  • Disks for caching nodes
  • Network bandwidth for caching nodes

Configure the configuration item to set network bandwidth:

Configure the following items for data locality:

  • spark.locality.wait.rack3s -> 0
  • spark.locality.wait.node3s -> 0

Original Source:

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store