How PolarDB Produced Scores for 1.2 Million Online Test Takers in 10 Minutes
The recent coronavirus outbreak has led to innovation in every sector, such as healthcare, education, and telecommunications. Many schools and higher education institutions have resorted to online learning platforms to conduct lessons and host exams, in order to keep students engaged.
However, the surge in demand has left many online education platforms struggling to cope with traffic peaks. On the first day of class, platforms including Xuexitong and IMOOC froze or crashed, and “Xuexitong has crashed” became a trending search on Weibo.
At the same time, demand for scaling rose rapidly among Alibaba Cloud’s education customers, especially for databases. The minute-level scaling capability of Alibaba Cloud databases has kept many online education platforms stable, including Hujiang, Yuanfudao, VIPKID, and Onion Academy, helping to shield students’ education from the effects of the epidemic.
Yuanfudao is a well-known online tutoring platform in China. It has five core online education apps: Yuanfudao, Yuantiku, Xiaoyuansouti, Xiaoyuankousuan, and ZebraEnglish. Yuanfudao provides students and parents with intelligent education services including online tutoring, photo-based question search, intelligent question banks, and automatic correction of school assignments.
As the first unicorn company in the online K-12 education field, Yuanfudao did not suffer from problems such as freezing and latency during the epidemic. In addition, it launched an online practice English test for 1.2 million students at the same time. The system issued the test scores within 10 minutes and generated an intelligent diagnostic and analytical report.
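As a rough back-of-the-envelope check on the headline figure, the sketch below computes the minimum average scoring rate implied by 1.2 million scores in 10 minutes. It assumes one score per test taker and an evenly spread workload; the function name is illustrative and is not part of any Yuanfudao or Alibaba Cloud system.

```python
# Figures taken from the article.
TEST_TAKERS = 1_200_000
WINDOW_MINUTES = 10


def required_throughput(items: int, window_seconds: int) -> float:
    """Average items per second needed to finish `items` within the window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return items / window_seconds


if __name__ == "__main__":
    rate = required_throughput(TEST_TAKERS, WINDOW_MINUTES * 60)
    print(f"Minimum average rate: {rate:.0f} scores per second")
```

Under these assumptions the backend has to sustain about 2,000 scored results per second on average. Actual peaks were probably higher, since submissions rarely arrive at a uniform rate.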
“To improve the test experience for millions of people, Yuanfudao temporarily and elastically scaled the Alibaba Cloud ApsaraDB for PolarDB cluster it was already using,” said Zhang Wenzhi, head of O&M at Yuanfudao. He added that the team had evaluated many database products and found that ApsaraDB for PolarDB performs well and can guarantee a smooth user experience through rapid scaling.
Self-built Solutions Are Insufficient Due to Latency, Freezing, and High Costs
The online education industry has massive data storage requirements, such as question banks, audio and video answer materials, user data, and logs. These needs impose strict requirements for Yuanfudao’s background data storage and processing capabilities.
Yuanfudao previously used a self-built database solution. On weekends and during online practice tests, the number of concurrent users would spike, and the self-built solution had difficulty coping with such peaks. As a result, about one-third of students were unable to access online tests. Response latency also rose from under 1 second in normal conditions to an average of 5 seconds, sharply degrading the question-answering experience for students.
At the same time, Yuanfudao’s user base was growing rapidly each year, and the CPU utilization of the self-built MySQL database had already exceeded 70%. In addition, Yuanfudao had its O&M personnel double as part-time DBAs, and they were unable to handle complicated database management tasks. Hiring professional DBA staff was estimated to cost at least 1 million RMB. In summary, the self-built database solution had trouble coping with traffic peaks and keeping pace with the rapid development of the business, and it was difficult to manage, which increased labor costs. These were all major challenges to Yuanfudao’s rapid growth.
How ApsaraDB for PolarDB Helped Yuanfudao Handle Business Peaks during the Epidemic
To address these challenges, Yuanfudao implemented a new database solution based on Alibaba Cloud ApsaraDB for PolarDB. Yuanfudao chose ApsaraDB for PolarDB because it offers relatively high performance and is 100% compatible with MySQL. Yuanfudao also values its elastic scaling capability and its maximum storage space of 100 TB.
ApsaraDB for PolarDB’s Minute-level Elastic Scaling Easily Copes with Various Business Scenarios
Because of the nature of Yuanfudao’s business, traffic is easy to handle at ordinary times but peaks on weekends and during tests. The main database difficulty is therefore the read and write contention caused by highly concurrent user access, which in turn increases I/O. Continuously paying for a high-specification MySQL database is also costly. With ApsaraDB for PolarDB, Yuanfudao uses rapid elastic scaling to temporarily upgrade the database configuration and expand the cluster during business peaks, which greatly reduces overall costs compared to the previous solution.
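To make the cost argument concrete, here is a minimal sketch comparing a cluster that runs at peak specification all month with one that is upgraded only during peak hours. The hourly rates and the number of peak hours are hypothetical placeholders, not Alibaba Cloud prices or Yuanfudao’s actual usage.

```python
HOURS_PER_MONTH = 30 * 24  # simplifying assumption: a 30-day month


def always_peak_cost(peak_rate: float) -> float:
    """Monthly cost of running the peak specification around the clock."""
    return peak_rate * HOURS_PER_MONTH


def elastic_cost(base_rate: float, peak_rate: float, peak_hours: int) -> float:
    """Monthly cost when the peak specification is used only for `peak_hours`."""
    if not 0 <= peak_hours <= HOURS_PER_MONTH:
        raise ValueError("peak_hours must fit within one month")
    return base_rate * (HOURS_PER_MONTH - peak_hours) + peak_rate * peak_hours


def savings_ratio(base_rate: float, peak_rate: float, peak_hours: int) -> float:
    """Fraction of the always-peak cost saved by scaling elastically."""
    fixed = always_peak_cost(peak_rate)
    return 1.0 - elastic_cost(base_rate, peak_rate, peak_hours) / fixed


if __name__ == "__main__":
    # Hypothetical inputs: base spec costs 1.0/hour, peak spec 3.0/hour,
    # and the peak spec is needed for 60 hours per month.
    print(f"always at peak: {always_peak_cost(3.0):.0f}")
    print(f"elastic:        {elastic_cost(1.0, 3.0, 60):.0f}")
    print(f"savings:        {savings_ratio(1.0, 3.0, 60):.1%}")
```

The shorter the peak window is relative to the month, the larger the saving, which is why workloads with brief, predictable peaks benefit most from temporary upgrades.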
For Yuanfudao’s products with clear business peaks, the most important factor is the minute-level elasticity that ApsaraDB for PolarDB provides. This elasticity comes from a design that separates ApsaraDB for PolarDB’s storage from computation. “Separation” here means that the compute node (DB engine) and the storage node (DB Store) run on different physical servers, so any I/O operation that reaches storage is network I/O. Even so, performance test results show that using PolarFS to access PolarStore over the network is as fast as accessing a local SSD. ApsaraDB for PolarDB’s compute-storage separation architecture reduces storage costs, ensures high data consistency between the primary and secondary databases, and prevents data loss. It also makes elastic scaling of the database simple and convenient.
As shown in the preceding figure, ApsaraDB for PolarDB has a layered architecture. At the top layer, the proxy PolarProxy provides functions such as read/write splitting and SQL acceleration. At the middle layer, the PolarDB database engine nodes form a single-writer, multiple-reader database cluster. At the bottom layer, the distributed storage PolarStore provides multi-node data sharing for the layers above it. Each layer handles its own tasks, and together they make up an ApsaraDB for PolarDB cluster.
From the product definition of ApsaraDB for PolarDB, we can see that the nodes and specifications (such as 4-core, 16 GB) for which a user pays refer to the configuration of PolarDB at the middle layer. PolarProxy at the top layer adapts to the PolarDB configuration, so users do not need to pay for it or concern themselves with its performance or capacity. The storage of PolarStore at the bottom layer is resized automatically, and users pay only for the volume actually used.
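The read/write splitting performed at the proxy layer can be illustrated with a small routing sketch. This is a simplified assumption of how such a proxy might behave, not PolarProxy’s actual implementation: the class name, node names, and the statement classification rule are all hypothetical, and a real proxy also has to account for transactions and session consistency.

```python
import itertools

# Statements treated as reads in this sketch (an illustrative simplification).
READ_PREFIXES = ("SELECT", "SHOW")


class ReadWriteRouter:
    """Send writes to the single primary and spread reads over read-only nodes."""

    def __init__(self, primary: str, read_nodes: list[str]):
        self.primary = primary
        # Round-robin iterator over read-only nodes, if there are any.
        self._readers = itertools.cycle(read_nodes) if read_nodes else None

    def route(self, sql: str) -> str:
        statement = sql.lstrip().upper()
        is_read = statement.startswith(READ_PREFIXES)
        # Locking reads must see and lock current rows, so keep them on the primary.
        if "FOR UPDATE" in statement:
            is_read = False
        if is_read and self._readers is not None:
            return next(self._readers)
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("primary", ["ro-1", "ro-2"])
    for sql in ("SELECT * FROM answers", "UPDATE answers SET score = 90"):
        print(sql, "->", router.route(sql))
```

Because the application talks only to the router, the set of read-only nodes behind it can change without any application changes.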
Generally speaking, there are two types of database scalability: vertical scaling (also known as scaling up) and horizontal scaling (also known as scaling out). Vertical scaling is the process of upgrading the configuration, and horizontal scaling is the process of adding nodes without changing the configuration. For databases, we first use vertical scaling. An example of this is upgrading from 4 cores to 8 cores. However, there will be bottlenecks. On the one hand, the performance improvement is nonlinear, which is related to the database engine’s own design and application access model (for example, the multi-threaded design of MySQL does not show the multi-core advantage if there is only one session). On the other hand, there are upper limits for the configuration of a physical computing server. Therefore, the ultimate means is to scale out by adding more nodes.
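One common way to reason about why adding cores gives nonlinear gains is Amdahl’s law, which bounds the speedup by the fraction of work that can run in parallel. The sketch below uses it purely as an illustrative model; the parallel fractions are made-up values and are not measurements of MySQL or ApsaraDB for PolarDB.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speedup on `cores` cores when only part of the work is parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # A single session is modeled here as fully serial: extra cores do not help.
    print("single session, 8 cores:", amdahl_speedup(0.0, 8))
    # A hypothetical workload that is 90% parallel gains less than 2x from 4 -> 8 cores.
    for cores in (4, 8):
        print(f"90% parallel, {cores} cores: {amdahl_speedup(0.9, cores):.2f}x")
```

In this model, doubling the cores from 4 to 8 for a 90% parallel workload improves the speedup by about 1.5 times rather than 2 times, which matches the qualitative point that vertical scaling eventually hits a bottleneck.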
Underlying Technical Implementation of ApsaraDB for PolarDB
ApsaraDB for PolarDB’s elasticity can be summarized as “capable of supporting a maximum of 16 nodes horizontally, and capable of supporting a maximum of 88 cores vertically, with dynamically expanding storage capacity that does not require configuration.” Against the backdrop of this powerful elastic capability, how is the underlying layer of ApsaraDB for PolarDB implemented? Here we will discuss two aspects: horizontal scaling and vertical scaling.
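The “scale up first, then scale out” approach, bounded by the limits quoted above, can be sketched as a small planning function. The 16-node and 88-core limits come from the article; reading 88 cores as a per-node limit, the specification ladder, and the idea of sizing by total cores are assumptions made only for illustration, and the function is not an Alibaba Cloud API.

```python
MAX_NODES = 16            # horizontal limit quoted in the article
MAX_CORES_PER_NODE = 88   # vertical limit quoted in the article (assumed per node)
SPECS = (4, 8, 16, 32, 64, 88)  # hypothetical ladder of per-node core counts


def plan_scaling(required_cores: int, nodes: int = 2) -> tuple[int, int]:
    """Return (node_count, cores_per_node) that covers `required_cores` in total."""
    if required_cores <= 0:
        raise ValueError("required_cores must be positive")
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError("nodes must be between 1 and MAX_NODES")
    # Step 1: vertical scaling. Pick the smallest spec that suffices at this node count.
    for spec in SPECS:
        if spec * nodes >= required_cores:
            return nodes, spec
    # Step 2: horizontal scaling. Add nodes at the largest spec (ceiling division).
    needed_nodes = -(-required_cores // MAX_CORES_PER_NODE)
    if needed_nodes > MAX_NODES:
        raise ValueError("requirement exceeds the cluster limits")
    return needed_nodes, MAX_CORES_PER_NODE


if __name__ == "__main__":
    print(plan_scaling(12))    # fits by upgrading the existing two nodes
    print(plan_scaling(500))   # needs more nodes at the largest spec
```

Treating capacity as a simple sum of cores ignores that only one node accepts writes, so this sketch is most meaningful for read-heavy workloads.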
Vertical scaling (upgrading or downgrading the configuration): Thanks to compute-storage separation, the configurations of the ApsaraDB for PolarDB database nodes can be upgraded or downgraded separately. If the current server resources are insufficient, it is also possible to migrate rapidly to other servers. At present, the entire process only takes 5 to 10 minutes, and does not require data relocation. For cross-server migration, it will be possible to eliminate the effects of upgrades on business applications by using PolarProxy in the future.
At present, all nodes within the same cluster must be upgraded together. ApsaraDB for PolarDB uses a rolling upgrade method, which further reduces periods of unavailability by controlling the pace of upgrades and scheduling the primary/secondary switchover. In addition, the new version of ApsaraDB for PolarDB will support the warm buffer pool function. With this function, the data in the buffer pool does not need to be reloaded after the upgrade, which avoids the performance jitter caused by the upgrade restart and makes the upgrade process smoother.
Horizontal expansion (increasing or decreasing the number of nodes): Data storage is shared, which makes it possible to increase the number of nodes rapidly without copying any data. The entire process takes only 5 to 10 minutes. Adding nodes has no effect on business applications. Removing nodes affects only the connections that fall on the terminated nodes, and those connections can be re-established. With the new version of ApsaraDB for PolarDB that supports the warm buffer pool function, newly added nodes can quickly provide the same performance as old nodes. They can obtain the content of the user’s most commonly used pages without having to read data from storage, providing a smoother experience. After a node has been added, PolarProxy dynamically detects it and automatically adds it to the read nodes of the read/write splitting backend. This allows applications that connect to ApsaraDB for PolarDB through the cluster access address (read/write splitting address) to immediately obtain better performance and throughput.
Among Yuanfudao’s application scenarios, the answer scenario resembles online shopping during the Double 11 Shopping Festival: the teacher publishes a question, and then all of the students need to write to the database at the same time. ApsaraDB for PolarDB’s plugin for flash sale scenarios enhances concurrency under high pressure, effectively relieving the write pressure caused by such bursts.
After Yuanfudao migrated its data to Alibaba Cloud ApsaraDB for PolarDB, it was able to support a large number of students studying online concurrently during business peaks without strain on the business. To cope with a business peak, it only needs to start preparing one hour in advance to temporarily scale up to a level that can handle 1 million concurrent student visits.
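The effect of adding and removing read nodes on client connections can be sketched as follows. This is a toy model under stated assumptions, not PolarProxy’s implementation: the class, its methods, and the node names are hypothetical, and round-robin placement is just one possible policy.

```python
class ReadNodePool:
    """Toy model of a proxy's pool of read-only nodes."""

    def __init__(self, nodes: list[str]):
        self._nodes = list(nodes)
        self._next = 0
        self._connections: dict[str, str] = {}  # connection id -> node name

    def add_node(self, node: str) -> None:
        """A new node joins the rotation; existing connections are untouched."""
        self._nodes.append(node)

    def connect(self, conn_id: str) -> str:
        """Place a connection on the next node in round-robin order."""
        if not self._nodes:
            raise RuntimeError("no read nodes available")
        node = self._nodes[self._next % len(self._nodes)]
        self._next += 1
        self._connections[conn_id] = node
        return node

    def remove_node(self, node: str) -> list[str]:
        """Remove a node and return only the connections that must reconnect."""
        self._nodes.remove(node)
        dropped = [c for c, n in self._connections.items() if n == node]
        for conn_id in dropped:
            del self._connections[conn_id]
        return dropped


if __name__ == "__main__":
    pool = ReadNodePool(["ro-1", "ro-2"])
    pool.connect("a")
    pool.connect("b")
    pool.add_node("ro-3")               # no existing connection is affected
    print(pool.connect("c"))            # the new node starts taking traffic
    for conn_id in pool.remove_node("ro-1"):
        print(conn_id, "reconnects to", pool.connect(conn_id))
```

Only the connection that sat on the removed node has to be re-established; the others continue undisturbed, which mirrors the behavior described above.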
Moreover, because ApsaraDB for PolarDB database resources can be scaled elastically on demand, after Yuanfudao migrated from MySQL to ApsaraDB for PolarDB, it cut the excess capacity of 5 read-only databases, reducing database costs by almost 70%. In addition, it is able to reduce the effort involved in online database management by 95%, making hiring a senior professional DBA unnecessary and greatly reducing the labor costs for database maintenance. Finally, from the business standpoint, Yuanfudao’s migration to ApsaraDB for PolarDB has significantly improved the user experience.
Since the epidemic began, Alibaba Cloud has helped nearly 180 million primary and secondary school students in China attend classes at home by supporting DingTalk, Youku, national Internet cloud platforms for primary and secondary schools, and various social teaching organizations, making it the largest online education technology service platform.
While continuing to wage war against the worldwide outbreak, Alibaba Cloud will play its part and will do all it can to help others in their battles with the coronavirus. Learn how we can support your business continuity at https://www.alibabacloud.com/campaign/supports-your-business-anytime