Building a Web-Based Platform with PolarDB: Yuanfudao Case Study

Apr 24, 2019

Yuanfudao’s huge volume of question banks, audio and video answer materials, user data, and logs places stringent demands on its back-end data storage and processing capabilities. Because of the nature of the tutoring business, Yuanfudao also faces sharp traffic peaks that strain database capacity. This article describes how Alibaba Cloud PolarDB helps Yuanfudao build a web-based platform for “Children Like Teachers”.

Yuanfudao Business Background

Yuanfudao is a well-known online education company in China. It operates five core online education applications: Yuanfudao, Yuantiku, Xiaoyuansouti, Xiaoyuankousuan, and Zebra English. Together they provide intelligent education services for students and parents, including online tutoring, photo-based question search, an intelligent question bank, and automatic homework grading. Yuanfudao holds a ten-billion-scale K-12 learning-behavior database and is a leader in applying technologies such as AI and big data to education.

Challenges Faced by Yuanfudao’s Self-Built Database Solution

Yuanfudao previously ran self-built MySQL databases. Because of the nature of its business, the number of concurrent users surges on weekends and during online mock exams. The self-built databases struggled with these peaks: about one-third of students could not access the online test, and response latency rose from under 1 second in normal conditions to an average of 5 seconds, which sharply degraded the answering experience. Meanwhile, Yuanfudao’s user base was growing rapidly every year, and CPU utilization on the self-built MySQL databases had already passed 70%. In addition, DBA duties were handled part-time by operations staff, who could not keep up with the many complex database management tasks, and hiring professional DBAs was estimated to cost at least 1 million yuan. In short, the self-built solution could not cope with traffic peaks, could not keep pace with rapid business growth, and was hard to manage, which drove up labor costs. These were major obstacles to Yuanfudao’s rapid growth.

Solutions Based on the PolarDB Database

To address these challenges, Yuanfudao adopted a new database solution based on Alibaba Cloud PolarDB. It chose PolarDB for its high performance and its 100% compatibility with MySQL, and above all for its elastic scaling and its storage capacity of up to 100 TB. Traffic is easy to handle at ordinary times but peaks on weekends and during tests, so the main database problem is read-write contention under highly concurrent access, which drives I/O load up. Keeping a high-specification MySQL database provisioned at all times to cover those peaks would be prohibitively expensive. With PolarDB, Yuanfudao uses rapid elastic scaling to raise the database specification and cluster size temporarily during peaks, which greatly reduces overall cost compared with the previous solution.
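The cost argument can be made concrete with a little arithmetic. The sketch below compares running a peak-sized configuration all week against running a smaller base configuration and scaling up only for peak hours. The hourly rates and the number of peak hours are purely hypothetical illustration values, not Alibaba Cloud prices or Yuanfudao figures.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_cost_fixed(peak_rate: float) -> float:
    """Cost of keeping the peak-sized configuration running all week."""
    return peak_rate * HOURS_PER_WEEK


def weekly_cost_elastic(base_rate: float, peak_rate: float, peak_hours: float) -> float:
    """Cost of running the base configuration and scaling up only during peaks."""
    if not 0 <= peak_hours <= HOURS_PER_WEEK:
        raise ValueError("peak_hours must be between 0 and 168")
    return base_rate * (HOURS_PER_WEEK - peak_hours) + peak_rate * peak_hours


# Hypothetical hourly rates in arbitrary currency units (assumptions for illustration).
fixed = weekly_cost_fixed(peak_rate=10.0)
elastic = weekly_cost_elastic(base_rate=4.0, peak_rate=10.0, peak_hours=16)
savings = 1 - elastic / fixed
print(f"fixed={fixed:.0f} elastic={elastic:.0f} savings={savings:.0%}")
```

The shorter the peak is relative to the week, and the larger the gap between base and peak configurations, the more the elastic approach saves.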

Alibaba Cloud PolarDB Minute-level Elastic Database Cluster

For products like Yuanfudao’s with pronounced business peaks, PolarDB’s most important feature is minute-level elasticity. Behind this elasticity is a design that separates storage from compute: the compute nodes (DB Engine) and storage nodes (DB Store) run on different physical servers, so every I/O operation that reaches storage is a network I/O. Even so, in tests PolarFS accessing PolarStore over the network performs roughly on par with a local single-replica SSD. The storage-compute separation reduces storage costs, keeps data highly consistent between the primary and standby nodes, and prevents data loss. It also makes elastic scaling of the database extremely simple.

Alibaba Cloud PolarDB layered architecture diagram

As shown above, PolarDB has a layered architecture. At the top, the PolarProxy proxy provides functions such as read-write splitting and SQL acceleration. In the middle, the PolarDB database engine nodes form a cluster with one writer and multiple readers. At the bottom, the PolarStore distributed storage provides shared data that multiple nodes mount simultaneously. Each layer has its own responsibility, and together they make up the PolarDB cloud database cluster.
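To illustrate what read-write splitting at the proxy layer means, here is a minimal sketch of a router that sends writes to the single write node and rotates reads across the read nodes. It is a toy model, not PolarProxy's actual logic: the class name, the node names, and the prefix-based statement classification are all assumptions made for illustration.

```python
import itertools


class ReadWriteSplitter:
    """Toy router: writes go to the primary, reads rotate over read-only nodes."""

    READ_PREFIXES = ("SELECT", "SHOW")

    def __init__(self, primary, read_nodes):
        self.primary = primary
        self.read_nodes = list(read_nodes)
        # Round-robin iterator over read nodes; None if the cluster has no readers.
        self._cycle = itertools.cycle(self.read_nodes) if self.read_nodes else None

    def route(self, sql, in_transaction=False):
        """Return the node that should execute this statement."""
        statement = sql.strip().rstrip(";").upper()
        is_read = statement.startswith(self.READ_PREFIXES)
        # Locking reads and anything inside a transaction stay on the primary.
        locks_rows = statement.endswith("FOR UPDATE")
        if is_read and not locks_rows and not in_transaction and self._cycle:
            return next(self._cycle)
        return self.primary


# Hypothetical node names.
router = ReadWriteSplitter("primary", ["ro-1", "ro-2"])
for sql in ("SELECT * FROM answers", "UPDATE users SET score = 1", "SELECT 1"):
    print(sql, "->", router.route(sql))
```

A real proxy also has to deal with session state, replication lag, and consistency levels, all of which this sketch ignores.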

In terms of the product definition, the node count and specification that users purchase (for example, 4 cores and 16 GB) refer to the PolarDB engine nodes in the middle layer. The PolarProxy layer above adjusts itself to the PolarDB configuration, so users do not need to purchase it or worry about its performance or capacity. The PolarStore layer below expands automatically, and users pay only for the capacity actually used.

Scalability generally comes in two modes: scale up and scale out. Scale up means upgrading the configuration of a node, while scale out means keeping the configuration unchanged and adding nodes. Databases usually scale up first, for example from 4 cores to 8 cores when 4 are insufficient, but this eventually hits bottlenecks. First, performance does not improve linearly; this depends on the design of the database engine and on the application's access pattern (with MySQL's multi-threaded design, a single session can hardly take advantage of multiple cores). Second, a physical server has an upper limit on compute resources, so a ceiling exists. The ultimate approach is therefore to scale out by adding nodes.
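One common way to reason about the nonlinear gain from scaling up is Amdahl's law, which the original article does not mention; it is used here only as an illustrative model. It assumes a fixed fraction of the workload can run in parallel across cores. The 0.9 fraction below is an arbitrary assumption, and a single session is modeled as a parallel fraction of 0.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only `parallel_fraction` of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


# Assumed workload: 90% of the work can be spread across cores.
for cores in (4, 8, 16, 32):
    print(f"{cores:>2} cores -> speedup {amdahl_speedup(0.9, cores):.2f}x")

# A single session (nothing runs in parallel) gains nothing from extra cores.
print(f"single session on 8 cores -> {amdahl_speedup(0.0, 8):.2f}x")
```

Under this model each doubling of cores buys less than the previous one, which is one reason adding nodes eventually becomes the more practical route.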

Alibaba Cloud console: PolarDB upgrade and downgrade operations

PolarDB’s elasticity can be summarized as follows: it scales out to a maximum of 16 nodes and scales up to a maximum of 88 cores per node, and storage capacity expands dynamically with no configuration needed. How does PolarDB achieve this under the hood? The following sections cover three aspects: vertical scaling, horizontal scaling, and storage.
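The two limits above (16 nodes, 88 cores) define a simple envelope for capacity planning. The sketch below picks a cluster shape for a required number of cores by scaling up first and scaling out only when the largest specification is not enough. The list of specification sizes, the default node count, and the idea of sizing by total cores are simplifying assumptions for illustration; only one node accepts writes, so this kind of sizing applies to read capacity at best.

```python
import math

MAX_NODES = 16           # horizontal limit stated in the article
MAX_CORES_PER_NODE = 88  # vertical limit stated in the article
ASSUMED_SPECS = (4, 8, 16, 32, 64, 88)  # hypothetical per-node core sizes


def plan_cluster(required_cores: int, current_nodes: int = 2):
    """Return (cores_per_node, node_count): scale up first, then scale out."""
    if required_cores < 1:
        raise ValueError("required_cores must be positive")
    if not 1 <= current_nodes <= MAX_NODES:
        raise ValueError("current_nodes must be between 1 and 16")
    # Scale up: smallest specification that covers demand with the current nodes.
    for cores in ASSUMED_SPECS:
        if cores * current_nodes >= required_cores:
            return cores, current_nodes
    # Scale out: keep the largest specification and add nodes.
    nodes = math.ceil(required_cores / MAX_CORES_PER_NODE)
    if nodes > MAX_NODES:
        raise ValueError("demand exceeds 16 nodes x 88 cores")
    return MAX_CORES_PER_NODE, nodes


print(plan_cluster(24))   # fits by scaling up the existing two nodes
print(plan_cluster(500))  # needs more nodes at the largest specification
```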

Vertical expansion (upgrading or downgrading the configuration): Thanks to the separation of storage and compute, the configuration of PolarDB database nodes can be upgraded or downgraded independently of storage. If the current server lacks resources, the node can be migrated quickly to another server. The whole process currently takes only 5–10 minutes and requires no data migration. Where cross-machine migration is involved, PolarProxy may in the future be able to eliminate the impact of the upgrade on business applications. Because all nodes in a cluster currently must be upgraded together, PolarDB uses rolling upgrades and further reduces unavailable time by controlling the pace of the upgrade and scheduling primary-standby switchovers.

Horizontal expansion (adding or removing nodes): Because storage is shared, nodes can be added quickly without copying any data. The whole process takes only 5–10 minutes. Adding nodes has no effect on business applications. Removing a node affects only the connections running on that node, and those connections can be re-established. After a node is added, PolarProxy detects it dynamically and automatically includes it among the read nodes behind read-write splitting. Applications that connect to PolarDB through the cluster address (the read-write splitting address) immediately benefit from better performance and throughput.
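The behavior described here, where adding a node is invisible to applications and removing a node only drops the connections on that node, can be modeled with a small sketch. The `ReadPool` class, its method names, and the node and session names are hypothetical; this is a toy simulation of the described behavior, not PolarProxy's implementation.

```python
class ReadPool:
    """Toy model of the read nodes behind a read-write splitting address."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.sessions = {}  # session id -> node currently serving it
        self._next = 0

    def connect(self, session_id):
        """Assign a session to the next read node in round-robin order."""
        if not self.nodes:
            raise RuntimeError("no read nodes available")
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        self.sessions[session_id] = node
        return node

    def add_node(self, node):
        """A new node joins the pool; existing sessions are untouched."""
        self.nodes.append(node)

    def remove_node(self, node):
        """Drop a node and return the sessions that must reconnect."""
        self.nodes.remove(node)
        dropped = [s for s, n in self.sessions.items() if n == node]
        for session_id in dropped:
            del self.sessions[session_id]
        return dropped


pool = ReadPool(["ro-1", "ro-2"])
pool.connect("a")
pool.connect("b")
pool.add_node("ro-3")                # no existing session is affected
print(pool.connect("c"))             # new sessions can land on the new node
for session_id in pool.remove_node("ro-2"):
    print(session_id, "reconnects to", pool.connect(session_id))
```

On the application side, the practical consequence is that code connecting through the cluster address should be prepared to re-establish a dropped connection and retry.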

Storage that needs no management: You do not need to manage PolarDB’s storage space; it is billed on a pay-as-you-go basis and settled automatically every hour. In the current design, I/O capability is tied to the specification of the database node: the larger the specification, the higher the IOPS and I/O throughput. I/O is isolated and limited per node, which avoids I/O contention between database clusters. Underneath, data is stored in a storage pool made up of a large number of servers. For reliability, each data block is replicated in 3 copies stored on different servers in different racks. The storage pool manages itself, expands and rebalances dynamically, and prevents storage fragmentation and data hot spots.
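The placement rule for the 3 copies (different servers on different racks) can be sketched as follows. The function, the rack and server names, and the random selection strategy are assumptions for illustration; the article does not describe how PolarStore actually chooses servers.

```python
import random


def place_replicas(servers_by_rack, copies=3, rng=None):
    """Pick `copies` servers, each from a different rack.

    `servers_by_rack` maps a rack name to a list of server names.
    Returns a list of (rack, server) pairs.
    """
    rng = rng or random.Random()
    racks = [rack for rack, servers in servers_by_rack.items() if servers]
    if len(racks) < copies:
        raise ValueError("need at least as many non-empty racks as copies")
    chosen_racks = rng.sample(racks, copies)
    # One server per chosen rack, so no two copies share a rack or a server.
    return [(rack, rng.choice(servers_by_rack[rack])) for rack in chosen_racks]


# Hypothetical storage pool layout.
layout = {
    "rack-a": ["s1", "s2"],
    "rack-b": ["s3", "s4"],
    "rack-c": ["s5"],
    "rack-d": ["s6", "s7"],
}
print(place_replicas(layout))
```

Spreading copies across racks means that losing a single server, or even a whole rack, still leaves two copies of each block available.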

The Benefits Obtained by Yuanfudao from Using PolarDB

After migrating its data to Alibaba Cloud PolarDB, Yuanfudao can support a large number of students studying online simultaneously during business peaks with almost no pressure on the business. To cope with a peak, it only needs to prepare one hour in advance, and capacity can be temporarily upgraded to handle 1 million concurrent student visits. Moreover, because PolarDB resources scale elastically on demand, the migration from MySQL to PolarDB saved the capacity of 5 read-only databases and cut database expenditure by almost 70%. It also reduced the workload of online database management by 95% and removed the need for a senior professional DBA, greatly reducing labor costs for database maintenance. Finally, from a business standpoint, the user experience improved greatly after the migration.

Reference:

https://www.alibabacloud.com/blog/building-a-web-based-platform-with-polardb-yuanfudao-case-study_594705?spm=a2c41.12785312.0.0