Alibaba Deploys Alibaba Open Channel SSD for Next Generation Data Centers
Alibaba is deploying Alibaba Open Channel SSD as part of its next-generation data center infrastructure, a major milestone following the announcement of its Open Channel storage architecture at the FAST 2018 conference.
Alibaba Open Channel SSD, or AOC SSD, is Alibaba’s third in-house storage product and the first productionized Open Channel SSD in the industry. Its deployment not only marks a key step toward Alibaba’s next-generation storage architecture, but also lays the groundwork for an innovative Open Channel ecosystem.
At the FAST 2018 conference, held in February, Alibaba presented a software/hardware integrated storage architecture for its next-generation data centers. The new architecture is designed to tackle the challenges that data centers face as AI, cloud computing, and big data applications proliferate. These challenges include:
- Performance: Demands for software/hardware co-optimization
- Flexibility: Diversified and fast-changing workloads demand more flexibility and customization
- Continuous improvement of TCO and supply elasticity
- Applications demand control of the I/O path for deterministic performance
- Demands for quicker response to online issues and new feature requests
Neither traditional standard SSDs nor proprietary host-based SSDs address these challenges efficiently. Alibaba therefore proposed a new software/hardware integrated storage architecture for next-generation data centers, built around Alibaba Open Channel SSD with the Fusion Engine storage engine software on top.
Alibaba’s Software/Hardware Integrated Storage Architecture
As the core hardware of Alibaba’s new storage architecture, AOC SSD delivers not only a white-box design that can be customized, but also a standardized platform. The AOC SSD Specification, which defines the AOC SSD platform, was written by Alibaba based on its own business needs and use cases, and has no affiliation with the “Open Channel Spec 1.2/2.0” from other companies.
The AOC SSD architecture is designed around openness and collaboration. Alibaba has been working to build an AOC SSD ecosystem together with vendors and industry partners. The goal is to reduce the time and complexity of product qualification and to improve supply elasticity, so that all participants benefit from the collaboration.
The AOC SSD Hardware
At the heart of AOC SSD is AliFSC, Alibaba’s first customized high-performance storage controller.
AliFSC Storage Controller
AliFSC is a high-performance controller customized for Alibaba’s Open Channel SSD requirements. It features a 6-core, 16-channel design and a PCIe Gen3 x8 interface, and it works with all major 3D TLC NAND components. Support for QLC NAND is also included, and preliminary QLC firmware development has already started.
AliFSC supports all commands defined by the AOC SSD specification and provides hardware acceleration for them. Supported features include system metadata (MBR) management, flexible parity groups, XOR engines, multiple write streams, and program/erase failure handling in Open Channel mode.
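The AOC SSD specification is not reproduced in this article, so the following is only a generic illustration of the XOR parity idea that parity groups and XOR engines typically rely on, not AliFSC's actual implementation. The function names, the 3+1 group layout, and the 4-byte "pages" are all hypothetical and chosen for readability.

```python
from functools import reduce


def xor_pages(pages):
    """Return the byte-wise XOR of a list of equally sized pages."""
    if not pages:
        raise ValueError("need at least one page")
    size = len(pages[0])
    if any(len(page) != size for page in pages):
        raise ValueError("all pages must have the same size")
    # zip(*pages) walks the pages column by column; each column is XOR-folded.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def rebuild_missing(surviving_pages, parity):
    """Recover the single missing data page from the survivors plus parity."""
    return xor_pages(list(surviving_pages) + [parity])


# Hypothetical 3+1 parity group: three data pages and one parity page.
data = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
parity = xor_pages(data)

# Simulate losing data[1] (for example, after a program failure) and rebuild it.
recovered = rebuild_missing([data[0], data[2]], parity)
```

Because XOR is its own inverse, the same routine that computes the parity also rebuilds a lost page, which is one reason this operation is a natural candidate for a hardware engine.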
The first productionized Open Channel SSD in the industry, the AOC SSD (AliFlash V3) is built in the same physical form factor as standard 2.5-inch U.2 NVMe SSDs, so it is compatible with every server model in Alibaba’s data centers that has an NVMe port. Since product development was completed in March 2018, AOC SSD has undergone several rounds of software and firmware optimization and is now being deployed for trial runs in Alibaba’s data centers.
Alibaba Open Channel SSD (AliFlash V3)
The AOC SSD Software
AOC SSD works with its host-side software stack to get the full benefit of its software/hardware integrated design. In parallel with hardware development, Alibaba developed an AOC host-side software stack for different use cases:
- Kernel-space AOC SSD Driver and Block Device FTL. These allow applications to use AOC SSD as a generic block device, covering most legacy use cases.
- User-space AOC SSD Driver, which works with Alibaba’s Fusion Engine (a user-space storage engine). Moving the entire I/O path into user space significantly reduces software overhead, which is one of the key advantages of Alibaba’s software/hardware integrated architecture.
- User-space FTL solutions customized for Alibaba’s use cases (non-block solutions). This is also the first full user-space Open Channel software solution in the industry.
- A full suite of management, monitoring, and test tools that integrate with Alibaba’s DevOps infrastructure.
The entire AOC SSD host-side software stack is developed by Alibaba and is distinct from the LightNVM solution in the open-source community. In fact, AOC SSD host-side software is so far the only Open Channel software solution that has reached production quality and is deployment-ready. Moreover, AOC SSD’s user-space software is also the first full user-space Open Channel software solution in the industry.
Software/Hardware Integrated Solutions with AOC SSD
AOC SSD works with Alibaba’s Fusion Engine to deliver software/hardware integrated solutions for Alibaba’s business units. Several different solutions have already been developed.
Solution based on kernel-space AOC Driver and Block Device FTL, covering most of the legacy use cases.
The AOC Block Device FTL has been significantly improved and extended since its initial development. Preliminary tests show random read and random write performance of 700K and 120K IOPS respectively, exceeding that of the major standard NVMe SSD alternatives.
Beyond basic I/O performance, AOC SSD and the Fusion Engine software have been jointly optimized for QoS, resulting in much lower read/write latencies for high-priority applications.
Read latency of high-priority applications. Average latency is reduced by 75%, and 99th-percentile latency is reduced by 83%.
Read/write latency of high-priority applications in a mixed workload. Average read and write latency are reduced by 81% and 99% respectively, and 99th-percentile read latency is reduced by 49%.
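For readers who want to reproduce this style of comparison on their own latency traces, the sketch below shows one common way to compute the reduction in average and 99th-percentile latency. It assumes the nearest-rank percentile definition; the article does not say how Alibaba computed its figures, and the sample values here are synthetic, not Alibaba's measurements.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def reduction_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


def summarize(before_us, after_us):
    """Compare two latency traces (microseconds) on average and tail latency."""
    return {
        "avg_reduction_pct": reduction_pct(mean(before_us), mean(after_us)),
        "p99_reduction_pct": reduction_pct(
            percentile(before_us, 99), percentile(after_us, 99)
        ),
    }


# Synthetic traces: mostly fast requests plus a slow tail.
baseline_us = [100] * 98 + [1000, 2000]
optimized_us = [50] * 98 + [200, 400]
summary = summarize(baseline_us, optimized_us)
```

Reporting the tail separately from the average matters because a small number of slow requests can dominate the experience of a latency-sensitive application while barely moving the mean.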
Solution based on User-space AOC SSD Software.
Using the user-space AOC SSD software, customized FTLs are developed for specific use cases to achieve a higher level of software/hardware co-optimization. For example, Alibaba developed an “Object SSD FTL” for key-value (KV) use cases, which are common across Alibaba’s business units. The Object SSD FTL works with Alibaba’s user-space KV engine. By exposing the AOC SSD’s parity groups as objects to the KV engine, and by combining the SSD’s internal garbage collection with the application’s compaction operations, the KV-Object SSD solution is expected to reduce the write-amplification factor by up to 4 times and 99th-percentile latency by up to 80%.
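To make the write-amplification argument concrete, the sketch below uses the standard definition of write-amplification factor (bytes written to flash divided by bytes written by the user) and compares a layered stack, where KV compaction and device garbage collection run independently, with a merged one. The byte counts are hypothetical, and treating the merged case as having no separate device-level relocation is a simplifying assumption for illustration, not a description of how the Object SSD FTL actually behaves.

```python
def write_amplification(media_bytes, user_bytes):
    """Write-amplification factor: flash bytes written per user byte written."""
    if user_bytes <= 0:
        raise ValueError("user_bytes must be positive")
    return media_bytes / user_bytes


def stack_waf(user_bytes, compaction_bytes, gc_bytes):
    """WAF of a KV stack.

    The KV engine sends user data plus compaction rewrites to the device;
    the device then relocates a further gc_bytes during garbage collection.
    """
    media_bytes = user_bytes + compaction_bytes + gc_bytes
    return write_amplification(media_bytes, user_bytes)


# Hypothetical byte counts (in GiB), chosen only to show the arithmetic.
layered = stack_waf(user_bytes=100, compaction_bytes=200, gc_bytes=300)

# Simplifying assumption: when garbage collection is folded into compaction,
# the device no longer relocates data on its own.
merged = stack_waf(user_bytes=100, compaction_bytes=200, gc_bytes=0)
```

With these made-up inputs the layered stack writes six bytes to flash per user byte and the merged stack writes three; real ratios depend on the workload and on how much relocation the combined scheme still needs.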
[Figure: summary comparison of AOC SSD and standard NVMe SSD]
AOC SSD Ecosystem
The AOC SSD architecture is designed as a platform. Alibaba is working with major SSD vendors to develop compatible AOC SSD products. In these collaborations, SSD vendors are responsible for SSD hardware and firmware, while Alibaba is responsible for host-side software. Both sides work together in a joint development-and-debug model to shorten product development and qualification. Alibaba is deploying the vendor-supplied AOC SSD products in its data centers and will adopt AOC SSD across all Alibaba server models.
Future Plans for AOC SSD
Alibaba has a long-term plan for AOC SSD, the core hardware of its next-generation storage architecture:
- Near-term: large-scale deployment of vendor-supplied AOC SSD products.
- Mid-term: a QLC version of AOC SSD, an ultra-low-latency version of AOC SSD, and a computing offload solution with a direct link to FPGA/GPU accelerators.
- Long-term: new non-volatile media solutions and in-storage/in-memory computation solutions.
The deployment of Alibaba Open Channel SSD is not only a key step toward Alibaba’s next-generation storage architecture; it also marks Alibaba’s transition from a follower to a leader in storage technology. Large-scale deployment of AOC SSD will make Alibaba’s infrastructure more efficient and more competitive in supporting Alibaba’s business innovation and globalization.