Frontend Technology behind Tmall’s Record-Breaking US$74.1 Billion GMV

By Haiwen, Frontend PM of Double 11

Released by TaoXi Technology

Every Double 11, teams across Alibaba Group face changes and adjustments to meet the evolving needs of our customers. It is no coincidence that one of the most popular sayings at Alibaba Group goes: “The only thing that remains unchanged is change itself”. These changes bring both challenges and opportunities to teams.

When facing these challenges, we hold to three pursuits. First, we should efficiently support the business and ensure its success. Second, we should give customers a stable and smooth experience. Third, we should keep innovating as frontend technology evolves. To realize these “three pursuits”, a great deal of design and testing has been carried out across technical solutions, process mechanisms, and team organization.

With all these efforts, the 2020 Double 11 ended successfully despite two massive traffic peaks, and Taobao FED achieved its goals. During the 2020 Double 11, Taobao FED applied a large number of optimizations and innovative solutions to help improve business conversion. Technologies such as FaaS, PHA, and ESR further extended frontend capabilities and boundaries to servers, clients, and CDN nodes. In addition, Taobao FED adopted visual restoration and integrated R&D to improve R&D efficiency, which greatly eased resource bottlenecks.

This article provides an overall introduction to the experience of the Taobao FED during the 2020 Double 11, and the frontend technologies used this year.

Changes & Challenges

During the 2020 Double 11, the first change that comes to mind is the extended promotional and purchasing period to help lessen the traffic burden on servers. This change has led to a series of “continuous changes” within Alibaba to better cope with the demands from customers.

Change of Purchasing Peak: This year’s Double 11 changed from one purchasing period to two, which means the peaks doubled across the pre-sale, warm-up, and official sale stages. Not surprisingly, this brought many challenges to this year’s event. First, R&D workloads increased substantially, requiring more efficient R&D while the time limit and resources remained unchanged. Second, keeping switching accurate, operation stable, and the experience smooth is challenging now that there are 6 key stages to manage. Third, because the festival spans over 20 days, strong organizational processes and mechanisms are required to ensure production security, personnel status, and rapid response.

Double 11 sales timeline

Homepage Revision: The latest Taobao homepage has changed dramatically: content is simplified, recommendations appear earlier, and channels are embedded into the recommendation part. Without fixed traffic portals, each business needs to actively adjust its operation strategy, product strategy, design plan, and technical plan. In addition, the recommendation capabilities in various scenarios need to be continuously enhanced. During the 2020 Double 11, the number of display units was increased to over 1,000, which in theory can significantly improve overall performance. Moreover, smart UI also improved the click-through rate of users.

Version comparison of Mobile Taobao

Business changes: Business innovations and new methods emerge one after another. Many businesses, including mini details, flagship stores, price expressions, order red packets, and Zhima Go, are either brand-new marketing activities or upgrades of old business models. These new explorations raise technical problems that need to be solved, such as architecture selection, reconciliation, consistent expression, and scheduling.

Doing Your Job Well

To ensure that your application works well, the first thing you need to do is do your job well. In our case, this means meeting the R&D demands and the stability requirements of the system. In terms of R&D demands, Taobao FED has automated the development of most UI modules through Design to Code (D2C). Taobao FED has also reduced the cost of interactive R&D by building the EVA interactive system, and improved R&D and O&M efficiency through the integrated R&D of Serverless. As a result, the frontend is no longer a resource bottleneck. Stability is guaranteed through a series of mechanisms and tool systems. At the same time, strategies and plans for the prevention and control of asset loss, which receive little attention from most of us in ordinary times, have been added.

Last year, Taobao FED launched a special R&D efficiency project. The core of this project was to improve R&D efficiency by using the D2C platform Imgcook. During the 2019 Double 11, 78.94% of new module code was automatically generated through Imgcook, and the code availability rate reached 79.34%.

This year, frontend intelligence supported an upgrade of the frontend R&D model. Several Business Units (BUs) jointly built algorithm models and data sets for frontend design recognition. The D2C technical system was comprehensively upgraded, including improvements in UI polymorphism, live streaming components, and cyclic intelligent recognition. During the 2020 Double 11, 90.4% of new module code was generated intelligently, with the code availability rate reaching 79.26%. Compared with last year, Taobao FED upgraded its intelligent design check so that visual drafts require no manual adjustments. Thanks to the R&D efficiency gained through D2C, resources no longer had to be seconded for venue development. In addition, compared with the traditional module development model, coding efficiency (assessed as the ratio of module complexity to R&D time) increased by 68% with support from D2C. The module demand throughput per unit time with fixed human resources also increased by about 1.5 times.

D2C operation process

In the e-commerce field, interaction is an important lever for user growth. It also plays an important role in increasing user stickiness and activity and in attracting new users. During the 2020 Double 11, the TaoXi Technology Department launched “Super Cat Stars”. Unlike activities in previous years, this year users could raise their own cats online and help them become super cat stars. The three cute cats on Taobao are very different in style, and many consumers were captivated at first sight. The complete set of solutions provided by the EVA interactive system greatly improved R&D efficiency, which made it possible to support “Super Cat Stars” across multiple apps, such as Mobile Taobao, Maoke, and Alipay. With the help of client capabilities and the EVA interactive system, performance and memory were well controlled so that most users could experience stable, high-definition interaction. This also ensured zero failures and second-level entering, and the number of participants in “Super Cat Stars” set a new record. Subsequent articles will describe in detail how the Taobao interactive team provides fast, steady, and smooth interaction during Double 11. They will cover three aspects: the foundation of interaction, the EVA R&D system, and the overall stability solution.

Interactive applications

By connecting page code and service code, the cloud plus device R&D model of Serverless was developed. It provides complete support for frontend pages and back-end services, and reduces intermediate communication and collaboration costs. In the implementation of the Tmall ranking and V-ranking, the overall R&D efficiency of Node FaaS-related business improved by 38.89% during the 2020 Double 11. Supported by the cloud plus device model, industry shopping guide demands for Double 11 could be quickly taken on by outsourced developers, which increased overall efficiency by about 20%.

Stability assurance

During Double 11, stability must be ensured at all times. Next, I will briefly introduce stability assurance from the following key aspects:

Change assessment: Every year, Taobao FED builds on the experience gained in previous Double 11 events, so the main risks lie in new and changed parts. The changes here include both technical and personnel changes. Taobao FED therefore needs to fully assess the changes, verify the assessment results during the 99 Shopping Festival, and make sure they do not change again afterward. In this way, Double 11 can be kept in a stable state.

Stress testing: First, traffic is evaluated, and resources such as machines and bandwidth are prepared according to this year’s changes and last year’s data. Second, single-line stress testing is carried out to ensure that services and their upstream and downstream systems operate normally under the estimated traffic model. Third, full-procedure stress testing is conducted to verify behavior under the concurrent traffic at midnight, especially for underlying public services and guaranteed priorities.

Backup solution and plan: A backup solution is generally used to minimize the impact on user experience and business under large traffic or other uncontrollable factors. A plan assesses the situations that may occur and provides a solution for each.

Acceptance: Acceptance includes many aspects. Functional preview means walking through all user paths; currently, this is still a manual operation. Time traversal means verifying by setting the page and system status to the active period, which requires upstream and downstream systems to be connected. Device model acceptance covers high-end, mid-end, and low-end devices; many business features must be degraded on low-end devices. In stability acceptance, the performance and stability of each page are guaranteed separately. However, problems may occur once businesses are stacked together, especially venues, interactions, live streaming, and flagship stores. These are heavy memory consumers, and some of their traffic is connected, so it is difficult to ensure performance and stability after switching between them. Therefore, overall full-procedure acceptance is required.

Changes and emergency response: Fault data shows that most problems are caused by changes, so change management is particularly important. By time, change management is divided into weak management and strong management. By service level, it is divided into management for core applications of Alibaba Group, core applications of a BU, and non-core applications. CR and review mechanisms are established for changes. Emergency response refers to the mechanisms for circulating problems, public opinion, and faults during the core activity period. It sets different time requirements for finding, locating, and fixing problems, and defines how decisions are made for problems at different levels.

Monitoring: Taobao FED continuously develops and upgrades its monitoring capabilities to ensure availability during peak hours and timely alerts in all business scenarios. Increasingly complex scenarios require end-to-end monitoring and a data analysis platform, and gray release lacks metrics and fixed-point monitoring. To address these problems and requirements, Taobao FED JSTracker provides an overall solution for production safety. Through JSTracker, Taobao FED has built an end-to-end frontend monitoring and data analysis platform, as well as an intelligent platform for real-time monitoring, multi-end coverage, and data analysis. At the same time, Taobao FED created a frontend data dashboard for Double 11, based on page information, error logs, origin server data, and FaaS logs.
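To make the change-management item above more concrete, here is a minimal sketch of how the two dimensions (weak or strong control by time, and core or non-core by service level) could combine into an approval rule. The type names, window definitions, and approval labels are all assumptions made for illustration; the article does not describe the actual rules.

```typescript
// Hypothetical model of change control during a big promotion.
// Level names, windows, and approval rules are illustrative assumptions.
type ServiceLevel = "group-core" | "bu-core" | "non-core";
type ControlMode = "none" | "weak" | "strong";

interface ControlWindow {
  mode: ControlMode;
  startMs: number; // inclusive, epoch milliseconds
  endMs: number; // exclusive, epoch milliseconds
}

interface ChangeRequest {
  app: string;
  level: ServiceLevel;
  releaseAtMs: number; // planned release time
  reviewed: boolean; // whether the change has already been reviewed
}

// Return the strictest control mode whose window contains the release time.
function controlModeAt(atMs: number, windows: ControlWindow[]): ControlMode {
  let strongest: ControlMode = "none";
  for (const w of windows) {
    if (atMs < w.startMs || atMs >= w.endMs) continue;
    if (w.mode === "strong") return "strong";
    if (w.mode === "weak") strongest = "weak";
  }
  return strongest;
}

// Decide which approvals a change still needs before it may be released.
function requiredApprovals(change: ChangeRequest, windows: ControlWindow[]): string[] {
  const approvals: string[] = [];
  if (!change.reviewed) approvals.push("review");
  const mode = controlModeAt(change.releaseAtMs, windows);
  if (mode === "none") return approvals;
  if (mode === "weak") {
    // Under weak control, only core applications need an extra sign-off.
    if (change.level !== "non-core") approvals.push("team-owner");
    return approvals;
  }
  // Under strong control, every change is signed off and core apps escalate.
  approvals.push("team-owner");
  if (change.level === "bu-core") approvals.push("bu-duty-officer");
  if (change.level === "group-core") approvals.push("group-duty-officer");
  return approvals;
}
```

Keeping the time windows as data rather than code means the control calendar can be adjusted for each stage of the promotion without touching the rule itself.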

Prevention and Control of Assets Loss

The platform has always been weak in frontend asset loss prevention and control, and there are many cases of asset loss triggered by the front end. Previously, prevention relied entirely on developers’ experience and awareness, which is not systematic. Last year, centralized checks and manual previews were organized at the team level, which was costly in human resources and time. What’s more, it was difficult to ensure quality and accumulate experience. Therefore, to find a lower-cost and more efficient method of asset loss prevention and control, Taobao FED has focused on designing and implementing related products since the beginning of 2020. At the same time, Taobao FED has paid more attention to asset loss prevention and control in the back ends used by merchants and operators.

Taobao FED divides asset loss prevention and control into three stages: the R&D stage, the testing stage, and the operation stage. In the R&D stage, code repositories with asset loss risks are marked, and cases such as regular prices, discounts, and default copy are enumerated and checked through static scanning and UI test case scanning. The testing stage mainly refers to the stage when merchants and operators set up preferences, rights, and interests. In this stage, asset loss is prevented and controlled through unified expression, double check, boundary limiting, and low price warning. In the operation stage, snapshot comparison and server-side data reconciliation are available. However, prevention and control in the operation stage are relatively delayed, so by the time asset loss is detected, the actual impact has most likely already occurred.
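As an illustration of the configuration-time checks named above (double check, boundary limiting, and low price warning), the following sketch validates a single promotional price before it is published. The field names, the result shape, and both thresholds are assumptions chosen for the example, not values from Taobao’s systems.

```typescript
// Hypothetical pre-publish check for a promotion configured by a merchant
// or operator. Field names and thresholds are assumptions for illustration.
interface PromotionConfig {
  itemId: string;
  regularPriceCents: number;
  promoPriceCents: number;
  confirmedPriceCents: number | null; // second entry used for the double check
}

interface CheckResult {
  blocked: string[]; // hard failures: the configuration must not be published
  warnings: string[]; // soft failures: a human should confirm
}

const MIN_PRICE_RATIO = 0.1; // boundary limit: never below 10% of regular price
const WARN_PRICE_RATIO = 0.5; // low price warning: below 50% of regular price

function isPositiveInteger(n: number): boolean {
  return isFinite(n) && n > 0 && Math.floor(n) === n;
}

function checkPromotion(cfg: PromotionConfig): CheckResult {
  const result: CheckResult = { blocked: [], warnings: [] };
  if (!isPositiveInteger(cfg.regularPriceCents) || !isPositiveInteger(cfg.promoPriceCents)) {
    result.blocked.push("invalid-price");
    return result;
  }
  // Boundary limiting and low price warning.
  if (cfg.promoPriceCents > cfg.regularPriceCents) {
    result.blocked.push("promo-above-regular");
  } else if (cfg.promoPriceCents < cfg.regularPriceCents * MIN_PRICE_RATIO) {
    result.blocked.push("below-price-floor");
  } else if (cfg.promoPriceCents < cfg.regularPriceCents * WARN_PRICE_RATIO) {
    result.warnings.push("low-price");
  }
  // Double check: the price must be entered twice and both entries must match.
  if (cfg.confirmedPriceCents === null) {
    result.blocked.push("missing-confirmation");
  } else if (cfg.confirmedPriceCents !== cfg.promoPriceCents) {
    result.blocked.push("confirmation-mismatch");
  }
  return result;
}
```

Prices are kept as integer cents so that comparisons are not affected by floating-point rounding of decimal amounts. A typical mistake this catches is a missing digit: a regular price of 10000 cents with a promotional price of 100 cents falls below the assumed floor and is blocked.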

At present, prevention is still the focus, but it is impossible to fully guarantee that no asset loss will occur. Taobao FED is now considering prevention and control measures at the procedure level and in production environments, with the aim of developing alarm and automatic protection capabilities for the platform.

Benefits

In addition to doing its job well, Taobao FED also hopes to bring incremental value to the business. This chapter introduces the benefits for the business from four aspects: venue performance improvement, improvement through the new solution for basic procedures, accuracy improvement through customized calling strategies, and click-through rate (CTR) increase through smart UI.

The venue is one of the leading roles of Double 11 every year, so the user experience in the venue is also the most important aspect. With increasingly complex business demands, ensuring and improving the user experience has become a long-standing difficulty. This year, pre-rendering and Server Side Rendering (SSR) were used to optimize the user experience. The first step was to redefine the standard for second-level entering time, from the original frontend entering time to the time from the user’s click until the page is visible. This adds the time spent on client-side routing and WebView startup to the measurement. With these optimizations, the user experience improved across dozens of scenarios, including the main venue, industrial venues, and beyond-device deployment venues.
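The effect of the redefined standard can be sketched with a small calculation. The example below compares a frontend-only measurement with a click-to-visible measurement on the same samples. The sample fields and the 1000 ms threshold for “second-level entering” are assumptions for illustration; the article does not state the exact threshold or the event names it tracks.

```typescript
// Hypothetical timing sample for one page entry; all timestamps are in
// milliseconds on the same clock.
interface EntrySample {
  clickMs: number; // the user taps the entrance
  webviewReadyMs: number; // client-side routing and WebView startup are done
  firstVisibleMs: number; // the first screen is visible to the user
}

// Assumed threshold for counting an entry as "second-level".
const SECOND_LEVEL_THRESHOLD_MS = 1000;

// Old definition: only the time spent by the frontend page itself.
function frontendOnlyMs(s: EntrySample): number {
  return s.firstVisibleMs - s.webviewReadyMs;
}

// New definition: everything the user waits for, starting from the click.
function clickToVisibleMs(s: EntrySample): number {
  return s.firstVisibleMs - s.clickMs;
}

// Share of samples whose measured duration is within the threshold.
function secondLevelRate(samples: EntrySample[], measure: (s: EntrySample) => number): number {
  if (samples.length === 0) return 0;
  let hits = 0;
  for (const s of samples) {
    if (measure(s) <= SECOND_LEVEL_THRESHOLD_MS) hits += 1;
  }
  return hits / samples.length;
}
```

With the same samples, `secondLevelRate(samples, clickToVisibleMs)` can never be higher than `secondLevelRate(samples, frontendOnlyMs)` as long as the WebView is ready after the click, which is why the new standard is the stricter one.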

Pre-rendering is a technical solution used in this year’s Double 11 to improve the experience of entering venues. Time-consuming operations in the original H5 page rendering process, such as WebView initialization, page resource loading, and JS execution, are executed in advance, and the page “rendering” is completed in the off-screen state. When users enter the venue, the “pre-rendered” page is used, which greatly shortens the entering time. As a result, the average entering time was shortened by 200ms to 700ms, and the second-level entering rate increased by 10% to 14%. The optimization brings higher absolute benefits on mid- and low-end machines, where the smoother entry is most noticeable, and second-level entering was achieved on low-end machines. Subsequent articles will also cover practices and reflections on performance optimization, including pre-rendering, data snapshots, and parallel requests.

Comparison of pre-rendering effects on low-end and mid-end cellphones
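The pre-rendering idea described above can be reduced to a small sketch: do the expensive work before the user taps, hold the result off-screen, and hand it over on entry. The class below is a simplified, synchronous stand-in that assumes an injected `render` function; in a real client the work would involve a WebView and asynchronous resource loading, and none of these names come from the article.

```typescript
// A rendered page held in memory; a stand-in for an off-screen WebView.
interface Page {
  url: string;
  html: string;
}

// Hypothetical pool that renders pages ahead of time and serves them on entry.
class PrerenderPool {
  renderCount = 0; // how many times the expensive render actually ran
  private ready: { [url: string]: Page } = {};
  private render: (url: string) => Page;

  constructor(render: (url: string) => Page) {
    this.render = render;
  }

  // Called ahead of time, for example when the venue entrance is displayed.
  prerender(url: string): void {
    if (this.ready[url] !== undefined) return;
    this.ready[url] = this.doRender(url);
  }

  // Called when the user actually taps the entrance.
  open(url: string): { page: Page; prerendered: boolean } {
    const page = this.ready[url];
    if (page !== undefined) {
      delete this.ready[url]; // the off-screen page is consumed by this entry
      return { page: page, prerendered: true };
    }
    // Fallback: nothing was prepared, so pay the full cost now.
    return { page: this.doRender(url), prerendered: false };
  }

  private doRender(url: string): Page {
    this.renderCount += 1;
    return this.render(url);
  }
}
```

The fallback path matters in practice: pre-rendering is an optimization, so an entry must still succeed when nothing was prepared, for example on a device that skips pre-rendering to save memory.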

This year, SSR was used in the venue without changing the existing architecture or the business. This raised the second-level entering rate to a new level of 82.6%. As the user experience improved, business indicators such as CTR also increased significantly, bringing good business value. Subsequent articles will introduce in detail the specific frontend practices and methodologies for engineering and business effect evaluation, as well as reflections on and solutions for frontend module code execution, isolation, and performance optimization on servers.

SSR effect comparison between low-end and mid-end cellphones

The basic procedure is the core of e-commerce procedures. It includes the homepage, product details, micro details, transactions (such as order placement, orders, shopping cart, and successful payment), information flow, My Taobao, and other basic services in the Taobao App. In the existing technical solution, Mobile Taobao uses the Native basic procedure, which pursues the ultimate experience and stability. Out-of-site traffic and Alibaba apps, including Alipay, use the H5 basic procedure, which pursues flexibility and availability. With the improvement of Alipay’s containerization system and its cohesion in other apps, preparation for a new containerized basic procedure has been completed. In addition, some disadvantages of the H5 basic procedure, such as remote deployment of resources, and the use limits of the Native basic procedure can also be addressed.

The “DinamicX” solution mainly solves business customization problems and consistency across Android, iOS, and H5, to achieve “developing in one system and implementing in all three”. With the help of the previous “New Aochuang” and “DinamicX” solutions, the containerized basic procedure was developed rapidly, achieving consistency across four systems. In terms of performance, the containerized version loads 2s faster than the H5 version, basically achieving the goal of second-level entering. In terms of business, the UV conversion rate of the containerized version is over 70% higher than that of the H5 version.

The containerized version has covered many Alibaba apps, such as Alipay, Mobile Taobao (Special Offers), Youku, AMAP, Taobao Shop, and Yitao. It is also integrated into many external media apps through Alibaba’s Baichuan SDK, and is applied in businesses such as daily boutique, big-name preference, Taobao Special, Taobao Live, Baichuan Media, Youku, Xiaopu, light store, and Ant Credit Pay.

With traffic hitting a ceiling and e-commerce competition escalating further, all companies are looking for ways to grow their user base. User growth involves a wide range of issues. This year, Taobao FED focused on calling technology, which opens Mobile Taobao from external traffic. Calling can be very simple: it can be triggered by a URL scheme. It can also be very complex, because different channels, operating systems, and apps place many restrictions on the calling protocol, and there are various compatibility problems. What’s more, different services in the calling procedure may have their own customization requirements, for example, parameter pass-through. Because calling efficiency varies across scenarios and services, the calling effect needs to be monitored and compared. To solve these complex problems, Taobao FED upgraded the calling technology again, built customizable calling strategies, and created a detailed A/B testing procedure for calling. During the 2020 Double 11, calling efficiency in different scenarios showed a relative increase of 25% to 40%.

Diagram of calling policy
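A customizable calling strategy with parameter pass-through might look like the following sketch. Each channel gets its own strategy, and only whitelisted parameters are forwarded into the calling URL. The `exampleapp://` scheme, the channel names, and the parameter keys are hypothetical placeholders, not the real Mobile Taobao protocol.

```typescript
// Hypothetical per-channel calling strategy.
interface CallStrategy {
  scheme: string; // placeholder URL scheme or link prefix, not a real one
  passThrough: string[]; // query keys that may be forwarded to the app
}

// Used when a channel has no strategy of its own.
const DEFAULT_STRATEGY: CallStrategy = { scheme: "exampleapp://open", passThrough: [] };

// Build the URL used to call the app for a given channel and target page.
function buildCallUrl(
  channel: string,
  targetUrl: string,
  params: { [key: string]: string },
  strategies: { [channel: string]: CallStrategy }
): string {
  const found = strategies[channel];
  const strategy = found !== undefined ? found : DEFAULT_STRATEGY;
  const parts: string[] = ["target=" + encodeURIComponent(targetUrl)];
  for (const key of strategy.passThrough) {
    const value = params[key];
    // Forward only parameters that are both whitelisted and present.
    if (value !== undefined) {
      parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(value));
    }
  }
  return strategy.scheme + "?" + parts.join("&");
}
```

Because strategies are plain data keyed by channel, an A/B test can swap one channel’s strategy for a variant and compare the resulting calling efficiency without changing the builder.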

With the development of mobile Internet and recommendation systems, the accurate matching between people and commodities has greatly improved the business efficiency. More and more refined methods are gradually applied to personalized recommendation, such as scenario-based recommendation and consumer-targeted recommendation technology. At the same time, the information of commodities is richer than ever before, such as buyer show, brand endorsement, and worry-free shopping service. Different users have different demands for content UI expression. Therefore, the business will be significantly improved by providing the right UI expression for different users.

At the beginning of this project, Taobao FED established through direct quantitative tests that the same UI scheme performs differently in different scenarios, and that different UI schemes perform differently in the same scenario. In other words, it makes sense to use different schemes in different scenarios. During the 2020 Double 11, Taobao FED adopted smart UI on a large scale for the first time and applied it to multiple frontend modules, including the “guess you like it” module, the product module, and the store module. Smart UI covered the pre-sale and formal sale stages of Double 11, withstood the test of the traffic peak, and contributed to a steady increase in traffic. With the support of smart UI, the highest PV click-through rate increased by more than 10% across the 300 venues covered.
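The quantitative observation above, that the best UI scheme depends on the scenario, can be illustrated with a deliberately simple selection rule: for each scenario, pick the scheme with the highest observed click-through rate. This is only a sketch under that assumption. The article does not describe how smart UI actually chooses a scheme, and the field names and the minimum-impression cutoff are made up for the example.

```typescript
// Hypothetical aggregated test results for one UI scheme in one scenario.
interface SchemeStats {
  scenario: string;
  scheme: string;
  impressions: number;
  clicks: number;
}

// For each scenario, pick the scheme with the highest observed CTR.
// Rows with fewer than minImpressions are ignored as too noisy.
function pickSchemes(stats: SchemeStats[], minImpressions: number): { [scenario: string]: string } {
  const bestCtr: { [scenario: string]: number } = {};
  const bestScheme: { [scenario: string]: string } = {};
  for (const row of stats) {
    if (row.impressions <= 0 || row.impressions < minImpressions) continue;
    const ctr = row.clicks / row.impressions;
    const current = bestCtr[row.scenario];
    if (current === undefined || ctr > current) {
      bestCtr[row.scenario] = ctr;
      bestScheme[row.scenario] = row.scheme;
    }
  }
  return bestScheme;
}
```

A production system would more likely personalize per user and keep exploring alternatives, but even this static rule shows why a single global scheme leaves clicks on the table when scenarios disagree.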

Technology Upgrade

With the technological evolution and business development in the industry, Taobao FED has made new technological attempts and upgrades, especially in the in-depth use of FaaS, PHA progressive experience enhancement, and application of edge node rendering.

Serverless, like an iceberg, is gradually emerging from the deep ocean. Since last year’s Double 11, Serverless has been applied to all aspects of the frontend field. During the 2020 Double 11, FaaS was applied to venue and marketing business, which involve much greater business complexity. With further improved capabilities, FaaS is able to support business whose QPS increased from 2,000 to 50,000, and CPU usage is reduced by about 50%. In terms of R&D, a solution system was built with capabilities such as unit assurance, big promotion management, an expert system, and function inventory. O&M efficiency increased by about 50%. It also lowers the threshold for R&D and supports the fast entry of outsourcing.

PHA is short for Progressive Hybrid App, an application framework for improving the Hybrid experience. It is also a progressive Web application for improving page loading speed and interactive experience. Applications developed with PHA essentially stay within frontend development and W3C standards, yet they have the features and experience of native applications. It may remind you of PWA, but PHA is stronger in UI capabilities and faster in loading than PWA. Currently, it has been implemented in multiple apps, such as Mobile Taobao, Mobile Taobao (Special Offer), Lazada, and CBU, and has supported big promotions such as 618 and Double 11. The PHA team, together with the client, frontend, and data analysis teams, collaborated across stacks on performance optimization. They also sorted out performance event tracking across the entire procedure and defined a new performance standard, measured from the click to the moment the page is visible. Optimization methods such as preloading, pre-rendering, accelerated resource download, and offline resources are used in PHA.

Currently, rendering is mainly performed on clients or servers, using Client Side Rendering (CSR) and Server Side Rendering (SSR) respectively. Each has its own applicable scenarios, advantages, and disadvantages. Now, with the capabilities of Alibaba Cloud, rendering can be moved to CDN nodes, which is called Edge Side Rendering (ESR). It provides rendering capability for the frontend and, at the same time, can make use of the large amount of computing resources on CDN machines.

Alibaba Cloud launched EdgeRoutine, a lightweight programming environment on CDN, which points to a new direction for us. Rendering can be performed in advance on the CDN nodes. The access policy of CDN is to find the node closest to the user. Just like the last-mile policy in express delivery, parcels are always delivered to the distribution center closest to the customer. As a result, the network scheduling time of the page can be greatly reduced. With the resource sharing feature of CDN nodes, some data can also be cached on the CDN nodes to reduce remote data requests.

This solution, the combination of ESR and CDN cache, is applicable to pages with a low data refresh rate and high traffic. Take the Talent page as an example: the first page rendering time can be shortened by about 50%. At present, EdgeRoutine is just getting started, so its application scenarios are relatively limited, its capabilities are still insufficient, and its system needs to be continuously built. However, this new technology opens up more possibilities for frontend technology, which calls for constant exploration and improvement.
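The combination of edge rendering and node-level caching can be sketched as follows. The class renders a page on the node and reuses the result until a time-to-live expires, so repeated requests for a slowly changing page do not trigger remote data requests. This is a generic, synchronous illustration that does not use the EdgeRoutine API; the class, its parameters, and the injected clock are all assumptions.

```typescript
// A rendered page cached on the edge node until it expires.
interface CachedPage {
  html: string;
  expiresAtMs: number;
}

// Hypothetical edge-side renderer with a per-node cache.
class EdgeRenderer {
  originFetches = 0; // how many times the remote data source was hit
  private cache: { [path: string]: CachedPage } = {};
  private fetchData: (path: string) => string;
  private ttlMs: number;
  private now: () => number;

  constructor(fetchData: (path: string) => string, ttlMs: number, now: () => number) {
    this.fetchData = fetchData;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Serve a page: from the node cache when fresh, otherwise render it here.
  handle(path: string): string {
    const nowMs = this.now();
    const hit = this.cache[path];
    if (hit !== undefined && hit.expiresAtMs > nowMs) return hit.html;
    this.originFetches += 1;
    const html = "<main>" + this.fetchData(path) + "</main>";
    this.cache[path] = { html: html, expiresAtMs: nowMs + this.ttlMs };
    return html;
  }
}
```

The time-to-live is the main tuning knob: a longer value saves more remote requests but serves staler data, which is why the approach suits pages with a low data refresh rate and high traffic.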

As Taobao’s most important shopping festival of the year, Double 11 receives the biggest investment in every respect. Although I have participated in Double 11 eight times, this was my first time experiencing it as a frontend PM, and it felt different.

Complexity: In terms of business, Double 11 has its uniquely customized main venue, main interaction, and cat gala, as well as Taobao’s shopping guide, industry, marketing, live streaming, and many other services. Taobao also connects with multiple BUs of Alibaba Group, such as Alipay, Youku, Local Life, Alimama, and Cainiao, and cooperates with merchants, ISVs, logistics, and media. The complexity also shows in technology, for example, the development, setup, origin servers, and CDN of frontend pages, as well as the containers, middleware, capacity preparation, traffic allocation, and data center deployment of Node FaaS. The whole system calls for further exploration.

Process: Double 11 is a major challenge for Taobao, and Taobao FED has developed a set of mature process mechanisms for it, covering personnel composition, communication mechanisms, time scheduling, organizational assurance, and other aspects. Each of these is backed by detailed mechanisms.

Collaboration: Double 11 is a very good occasion for all teams, staff, and BUs to cooperate and further improve such a large system together. Many technology upgrades and breakthroughs are made and promoted during Double 11. The pre-rendering solution used during the 2020 Double 11 was implemented and verified within a short time through close collaboration between the client and frontend teams.

Multiple perspectives: As a frontend PM, I learned that issues can be viewed from multiple perspectives, such as different technical positions, the entire procedure, and the business. Take the review of a change as an example. Previously, I paid more attention to the code implementation of the change. Now I see that other issues also need to be weighed comprehensively, such as its impact on upstream and downstream systems, on stability, and on the business, the new risks it introduces, and its impact scope. Therefore, making a judgment is not simply making a choice; it often requires trade-offs.
