Behind Alibaba’s Double 11: The Mysterious “Dragonfly” Technology, a PB-Grade Large-File Distribution System

What Is Dragonfly?

Alibaba Dragonfly is an intelligent P2P-based image and file distribution system. It addresses the main problems of large-scale file distribution: slow distribution, low success rates, and wasted bandwidth. It significantly improves business capabilities such as deployment distribution, data pre-heating, and large-scale container image distribution. Within Alibaba, Dragonfly already handles more than 2 billion distributions per month on average, moving 3.4 PB of data, and has become part of Alibaba’s infrastructure. Container technology makes operation and maintenance (O&M) more convenient, but it also poses major challenges for image distribution efficiency. Dragonfly supports many kinds of container technology, such as Docker and Pouch. With Dragonfly, image distribution can be accelerated by as much as 57 times, and outbound network traffic from the data source can be reduced by over 99.5%. Dragonfly saves bandwidth, improves O&M efficiency, and reduces O&M costs.

Origin of the Problem

Dragonfly is a P2P file distribution system developed in-house at Alibaba. It is an important component of the company’s basic O&M platform, and forms the core competitiveness of cloud efficiency, its smart O&M platform. It is also a crucial component of its Cloud Container Service.

Design Goals

To address these shortcomings, Dragonfly set a few objectives at the beginning of the design process:

  1. Prevent file sources from being overwhelmed: organize a P2P network between hosts to relieve pressure on the file servers and save cross-IDC bandwidth.
  2. Accelerate file distribution, and keep the download time roughly the same whether one server or more than ten thousand servers are downloading at once.
  3. Accelerate cross-border downloads and save cross-border bandwidth.
  4. Support large file downloads, including resuming a transfer from the point at which it was interrupted.
  5. Keep the host’s disk IO and network IO under control to avoid any impact on business workloads.
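Resumable transfer of large files (goal 4) is commonly achieved by splitting a file into fixed-size blocks and re-requesting only the blocks that are still missing after an interruption. The Python sketch below illustrates that general idea. The function names and block sizes are illustrative assumptions, not Dragonfly's actual implementation.

```python
from typing import Dict, List, Set, Tuple

# An inclusive byte range (start, end), as used by the HTTP Range header.
Block = Tuple[int, int]


def split_into_blocks(file_size: int, block_size: int) -> List[Block]:
    """Split a file of file_size bytes into inclusive byte ranges."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    return [
        (start, min(start + block_size, file_size) - 1)
        for start in range(0, file_size, block_size)
    ]


def pending_blocks(blocks: List[Block], completed: Set[int]) -> List[int]:
    """Return the indexes of blocks that still need to be downloaded."""
    return [index for index in range(len(blocks)) if index not in completed]


def range_header(block: Block) -> Dict[str, str]:
    """Build the HTTP Range header that requests exactly one block."""
    start, end = block
    return {"Range": f"bytes={start}-{end}"}


# Hypothetical usage: a 10-byte file in 4-byte blocks, where blocks 0 and 2
# were already saved before the transfer was interrupted.
blocks = split_into_blocks(10, 4)
for index in pending_blocks(blocks, completed={0, 2}):
    print(index, range_header(blocks[index]))
```

Because each block is tracked independently, a restarted client only pays for the blocks it has not yet stored, and different blocks can be fetched from different peers.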

System Architecture

  1. In the traditional mode, the download duration increases as the number of clients increases, whereas with dfget the duration remains essentially unchanged even at 7,000 clients.
  2. The traditional mode has no data beyond 1,200 clients, because at that point the data source was overwhelmed.

Moving From Distribution Systems to Infrastructure

After Double 11 2015, Dragonfly reached a download volume of 120,000 per month and a distribution volume of 4 TB. During that period, other download tools were also in use at Alibaba, such as wget, curl, scp, and ftp, as well as small-scale distribution systems that teams had built themselves. Besides fully covering our own distribution system, we also promoted Dragonfly on a small scale. By around Double 11 2016, our download volume had reached 140 million per month, and our distribution volume 708 TB.

  1. To reduce equipment duplication.
  2. To optimize our overall situation.

Alibaba’s Container Technology: PouchContainer

The strengths of container technology need little introduction. Globally, Docker holds the greatest share of the container technology market. Other solutions exist as well, such as rkt, Mesos Uni Container, and LXC, while Alibaba’s own container technology is called Pouch. As early as 2011, Alibaba independently developed T4, a container based on LXC. At that time, we had not yet created the concept of the image. T4 was used in the same way as a virtual machine, although it was of course much lighter than one.

  1. Large-scale simultaneous distribution: must support on the order of 100,000 simultaneous image pulls.
  2. No intrusion into the core of the container technology (Docker Daemon, registry). In other words, it cannot alter any container service code.
  3. Support for all container and virtual machine technologies, such as Docker, Pouch, Rocket, and Hyper.
  4. Support for image warm-up (pushing to the Dragonfly cluster CM at build time).
  5. Support for large image files (at least 30 GB).
  6. Security.

Native Docker vs. Dragonfly

All together, we performed two sets of experiments:

Experiment 1: 1 Client

  1. Image sizes tested: 50 MB, 200 MB, 500 MB, 1 GB, 5 GB
  2. Image repository bandwidth: 15 Gbps
  3. Client bandwidth: double 100 Mbps network environment
  4. Scale test: single download

Experiment 2: Multi-Client Concurrency

  1. Image sizes tested: 50 MB, 200 MB, 500 MB, 1 GB, 5 GB
  2. Image repository bandwidth: 15 Gbps
  3. Client bandwidth: double 100 Mbps network environment
  4. Concurrency levels: 10, 200, and 1,000 concurrent clients

Real-World Applications for Alibaba Group

Alibaba has been using Dragonfly for about two years, during which the business has developed rapidly. Current statistics show 2 billion distributions per month, moving 3.4 PB of data. Container image distribution accounts for almost half of this volume.

Smart Traffic Control

Traffic control is familiar from road traffic. Under the speed-limit regulations for Chinese roads, for example, the limit on highways without a center line is 40 km/h, and on highways with only one motor vehicle lane in each direction it is 70 km/h. On high-speed roads it is 80 km/h, and the maximum on freeways is 120 km/h. These limits are the same for every vehicle, which is clearly not flexible enough. When the road is completely clear, road capacity is heavily wasted and overall efficiency is extremely low.
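To make the contrast with a fixed limit concrete, here is a minimal Python sketch of a limit that adapts to how busy the "road" is. It assumes the host can measure its link capacity and the bandwidth currently used by business traffic. The function name, the 20% reserve, and the 1 Mbps floor are hypothetical values chosen for illustration, not Dragonfly's actual policy.

```python
def adaptive_limit(
    link_capacity_mbps: float,
    business_usage_mbps: float,
    reserve_ratio: float = 0.2,
    floor_mbps: float = 1.0,
) -> float:
    """Return the bandwidth (Mbps) that file distribution may use right now.

    A share of the link (reserve_ratio) is always kept free for business
    traffic. Whatever is left after current business usage goes to
    distribution, but never less than floor_mbps so downloads keep moving.
    """
    reserved = link_capacity_mbps * reserve_ratio
    spare = link_capacity_mbps - business_usage_mbps - reserved
    return max(floor_mbps, spare)


# Hypothetical readings on a 1,000 Mbps link: a quiet host and a busy host.
print(adaptive_limit(1000, 100))
print(adaptive_limit(1000, 900))
```

Under these assumptions, distribution speeds up when the host is idle and backs off when business traffic needs the link, instead of being held to one static number.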

Smart Scheduling

Block job scheduling is the critical factor in determining whether the distribution rate is high or low. A simple scheduling strategy, such as an ad hoc or fixed-priority rule, causes download speeds to fluctuate, which easily leads to frequent download glitches and, at the same time, very poor download speeds. We made countless attempts to optimize job scheduling, and ultimately adopted multi-dimensional data analysis and trend analysis to determine each requester’s optimal list of follow-up block jobs. The dimensions include machine hardware configuration, geographical position, network environment, and historical download results and speeds. The data analysis mainly uses gradient descent algorithms and other follow-up algorithms.
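As a rough illustration of how gradient descent can turn several dimensions into a scheduling decision, the Python sketch below fits a linear model that predicts a peer's download speed from a feature vector and then ranks candidate peers by the prediction. The features (a bias term, a same-IDC flag, a normalized historical speed), the training data, and all names are invented for this example. The source does not describe Dragonfly's real model at this level of detail.

```python
from typing import Dict, List, Sequence


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear score: the dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def fit_gradient_descent(
    samples: List[Sequence[float]],
    targets: Sequence[float],
    learning_rate: float = 0.1,
    epochs: int = 2000,
) -> List[float]:
    """Fit linear weights by batch gradient descent on mean squared error."""
    n_samples = len(samples)
    n_features = len(samples[0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        gradients = [0.0] * n_features
        for features, target in zip(samples, targets):
            error = predict(weights, features) - target
            for j in range(n_features):
                gradients[j] += 2.0 * error * features[j] / n_samples
        weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
    return weights


def rank_peers(weights: Sequence[float], peers: Dict[str, Sequence[float]]) -> List[str]:
    """Order peer names from highest to lowest predicted speed."""
    return sorted(peers, key=lambda name: predict(weights, peers[name]), reverse=True)


# Made-up history: [bias, same_idc, normalized_historical_speed] -> observed speed.
history = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
observed = [0.1, 0.6, 0.5, 1.0]
weights = fit_gradient_descent(history, observed)

candidates = {
    "peer-a": [1, 0, 0.2],
    "peer-b": [1, 1, 0.1],
    "peer-c": [1, 0, 0.9],
}
print(rank_peers(weights, candidates))
```

A scheduler built along these lines would hand each requester its next block jobs from the best-ranked peers and refit the weights as new download results arrive.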

Smart Compression

Smart compression implements an appropriate compression strategy for the part of the file that most merits compression, and can thereby save large volumes of network bandwidth resources.
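One simple way to decide which parts of a file merit compression is to compress each block and keep the compressed form only when it saves enough bytes. The Python sketch below does this with the standard zlib module. The 10% savings threshold and the function names are assumptions for illustration, and Dragonfly's actual strategy may differ.

```python
import os
import zlib
from typing import Tuple


def maybe_compress(block: bytes, min_saving: float = 0.1, level: int = 6) -> Tuple[bool, bytes]:
    """Compress a block only if doing so saves at least min_saving of its size.

    Returns (is_compressed, payload). Data that is already compressed or
    otherwise high in entropy is sent as-is, which avoids spending CPU on
    blocks that would not shrink.
    """
    if not block:
        return False, block
    compressed = zlib.compress(block, level)
    saving = 1 - len(compressed) / len(block)
    if saving >= min_saving:
        return True, compressed
    return False, block


def restore(is_compressed: bool, payload: bytes) -> bytes:
    """Reverse maybe_compress on the receiving side."""
    return zlib.decompress(payload) if is_compressed else payload


# Repetitive text shrinks well; random bytes do not.
text_block = b"dragonfly " * 1000
random_block = os.urandom(4096)
for block in (text_block, random_block):
    is_compressed, payload = maybe_compress(block)
    print(is_compressed, len(block), len(payload))
```

The flag travels with the payload so the receiver knows whether to decompress, and every block restores to exactly the bytes that were sent.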


When downloading certain sensitive files (such as classified files or account data files), transmission security must be effectively guaranteed. In this regard, Dragonfly performs two principal tasks:

  1. Supports passing HTTP header data, so that it can work with file sources that perform verification by means of the header.
  2. Uses symmetric encryption algorithms to encrypt file contents during transmission.
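The first task amounts to forwarding caller-supplied headers to the file source. The Python sketch below shows the idea with the standard urllib module: it only builds the request object and does not send it. The URL, the token value, and the function name are hypothetical. Transmission encryption is not shown, because the Python standard library does not include a symmetric cipher.

```python
import urllib.request
from typing import Dict


def build_source_request(url: str, auth_headers: Dict[str, str]) -> urllib.request.Request:
    """Build a GET request that carries the headers the file source verifies."""
    request = urllib.request.Request(url, method="GET")
    for name, value in auth_headers.items():
        request.add_header(name, value)
    return request


# Hypothetical source URL and credential, for illustration only.
request = build_source_request(
    "https://example.com/files/app.tar",
    {"Authorization": "Bearer demo-token"},
)
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```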


Dragonfly combines P2P technology with smart compression, smart traffic control, and a wide range of other innovative technologies to solve large-scale file downloading and all kinds of difficult file distribution problems in cross-network and isolated scenarios. This significantly improves data preheating, large-scale container image distribution, and other business capabilities.



Alibaba Cloud
