The Secret behind Youku’s Success with Big Data

Youku’s Migration to MaxCompute

Hello, everybody! I am Men Deliang, and I currently work on middleware-related data tasks at Youku. I am honored to have seen Youku switch to MaxCompute. I have worked at Youku for nearly five years, and right in my fifth year here, the company migrated from Hadoop to MaxCompute. The chart shows how our usage developed from May 2016 to May 2019. The upper section shows computing resources and the lower section shows storage resources. You can see that the number of users and the volume of table data have been increasing exponentially. However, since Youku completed its migration from Hadoop to MaxCompute in May 2017, computing and storage consumption has been decreasing. The migration continues to bring significant benefits to Youku.

Why Did Youku Choose MaxCompute?

Based on these business characteristics, I summarized several features of MaxCompute that can perfectly support our business:

  1. Simplicity and ease of use
  2. Full-fledged ecosystem
  3. Robust performance
  4. Elastic use of computing resources

Simplicity and Ease of Use

MaxCompute provides a complete pipeline that covers data modeling, data administration (data integration and quality control), data map, and data security. When Youku migrated from Hadoop to MaxCompute, the biggest benefit was that we no longer had to maintain clusters and run tasks at midnight as we did before. Before the migration, a request from one of my colleagues might have taken me several weeks. Now I can run the task immediately and obtain the result. Previously, to conduct BI analysis, analysts had to log on to the client and write scripts and scheduling tasks themselves. They often complained that the data they needed wasn’t available yet, and data required by senior executives was sometimes not available until after noon. Nowadays, basically all the important data is produced by 7 a.m. Analysts can implement some basic business requirements themselves, without having to send every request to the data department.

Full-fledged Ecosystem

Before 2017, Youku was completely based on Hadoop. Since the migration to MaxCompute, Youku has been based on the serverless big data service ecosystem provided by Alibaba Cloud. The components found in the open-source community also have counterparts in the MaxCompute ecosystem, and these counterparts are better and simpler to use. As shown in the architecture diagram, MaxCompute is in the middle. MySQL, HBase, ES, and Redis are on the left side and exchange data with MaxCompute through two-way synchronization via the synchronization center. Resource management, resource monitoring, data monitoring, data asset management, and data specifications are on the right side. Our underlying data input relies on some collection tools provided by Alibaba Group. In the upper layer, DataWorks, including some command line tools, is provided for developers, and QuickBI and Data Services are provided for BI developers.

Robust Performance

At Youku, MaxCompute supports EB-level data storage, analysis of hundreds of billions of data samples, hundreds of millions of data reports, and concurrent tasks running on hundreds of thousands of instances. This level of performance was completely unimaginable when we used Hadoop.

Elastic Use of Computing Resources

Before we started the migration in 2016, our Hadoop cluster already had more than 1,000 machines, which was considered a large-scale cluster at that time. However, we had many annoying problems, including the NameNode memory problem, the inability to scale the data center, and some O&M and management problems. We had to constantly request resources from our O&M colleagues, who often replied that we had already used lots of resources and money. The problem that we faced was how to use computing resources on demand. A large number of tasks ran at night and required lots of resources, while in the afternoon the entire cluster sat idle, unnecessarily consuming many resources. MaxCompute can perfectly solve this problem.
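To make the on-demand argument concrete, here is a minimal Python sketch that compares a cluster sized for its nightly peak with capacity that follows the load hour by hour. Everything in it is an assumption made for illustration: the hourly demand profile is invented, the function names are hypothetical, and none of the numbers are Youku's actual figures or a model of how MaxCompute allocates or bills resources.

```python
# Hypothetical demand for one day, in machines needed per hour (index 0 = midnight).
# Invented numbers: heavy batch load overnight, light load in the afternoon.
HOURLY_DEMAND = [1000] * 7 + [400] * 5 + [100] * 6 + [400] * 6


def fixed_cluster_machine_hours(demand):
    """A fixed cluster is sized for the peak hour and runs at that size all day."""
    return max(demand) * len(demand)


def on_demand_machine_hours(demand):
    """On-demand capacity follows the load, so only the hours actually used count."""
    return sum(demand)


def utilization(demand):
    """Fraction of the fixed cluster's machine-hours that do useful work."""
    return on_demand_machine_hours(demand) / fixed_cluster_machine_hours(demand)


if __name__ == "__main__":
    print("fixed cluster machine-hours:", fixed_cluster_machine_hours(HOURLY_DEMAND))
    print("on-demand machine-hours:", on_demand_machine_hours(HOURLY_DEMAND))
    print("fixed cluster utilization:", utilization(HOURLY_DEMAND))
```

With this made-up profile, the fixed cluster provisions twice the machine-hours that the workload actually uses. The gap between the two totals is the idle afternoon capacity described above.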

Youku’s Big Data Solution Architecture

