Developing and Deploying Frontend Code in Taobao: Eight-year Long Case Study Analysis
Written by Zhang Wei (Shangpo), and edited by Chengzijun, from Taobao Technology Department, Alibaba New Retail
In Alibaba’s Taobao frontend team, development and deployment patterns keep evolving as technology advances. As numerous capability modules inside and outside the system develop, underlying technologies such as the Language Server Protocol (LSP) and the Debug Adapter Protocol (DAP) are also reaching maturity. We are currently using an Integrated Development Environment (IDE) to integrate and update the development ecosystem that has grown up over time. We are reorganizing the old process, identifying breakthroughs from users’ pain points, and exploring the best combination of capabilities for current development scenarios in order to build a common underlying platform and upgrade the existing model.
I joined Taobao in 2014 as an intern and became a regular employee in 2015. As a frontend developer, I have experienced the development and deployment iterations of various scales and contributed to some of them. This article is based on my experience and content contributed by my colleagues.
I divide the whole story into four phases. In the stone age of 2013, code storage, release, and deployment moved to GitLab. In the silver age of 2014, engineering tools were built on the then-mature Node.js. In the golden age, starting in 2015, the online and offline frontend engineering system was built in a more systematic way. In the current age, which we are building right now, we draw on diverse technologies including clients, containers, and algorithms.
The Stone Age
Before 2013, frontend development worked much like backend development. In most cases, source code was managed with SVN. At the end of each day’s work, developers committed their code to the SVN server from the command line or with a tool such as TortoiseSVN. For deployment, the test code was uploaded to the test server manually or through FTP. After the test finished and the code version and content were manually verified, the code was uploaded to the production environment to complete the deployment process.
In addition to SVN, code management tools at that time included GitLab, which is built on Git. Git identifies every version by a SHA-1 hash, which made version detection easy and allowed flexible version control of code in distributed local repositories. Therefore, we gradually migrated the SVN development workflow within the department to GitLab. This change also opened the way toward transforming Taobao’s frontend development.
While we were enjoying the convenience the new tool brought to code version management, we thought about improving the deployment process, which was tedious and relied on manual assurance. With a new publishing system in place, we found that GitLab’s webhook mechanism could be wrapped and combined with rule-based operations to call the publishing system and automate publishing and deployment. Pushing a Git tag that followed a fixed rule and carried the publish and version information triggered a webhook event notification, which in turn invoked the publishing system and started the launch process. This was our first automated publish process for frontend development.
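The tag-triggered flow can be sketched as a small function that turns a webhook payload into a publish request. This is only an illustration: the field names `object_kind`, `ref`, and `checkout_sha` come from GitLab's tag-push webhook payload, while the `publish/<version>` tag rule, the `PublishRequest` shape, and the function name are assumptions made for this example.

```typescript
// Subset of a GitLab tag-push webhook payload (only the fields used here).
interface TagPushEvent {
  object_kind: string;         // "tag_push" for tag events
  ref: string;                 // for example "refs/tags/publish/1.2.3"
  checkout_sha: string | null; // commit the tag points at; null when a tag is deleted
}

// Hypothetical request handed to the publishing system.
interface PublishRequest {
  version: string;
  commit: string;
}

// Assumed tag rule: publish/<major>.<minor>.<patch>
const PUBLISH_TAG = /^refs\/tags\/publish\/(\d+\.\d+\.\d+)$/;

// Returns a publish request for matching tag pushes and null for everything else.
function toPublishRequest(event: TagPushEvent): PublishRequest | null {
  if (event.object_kind !== "tag_push") return null;
  if (event.checkout_sha === null) return null; // tag deletion, nothing to publish
  const match = PUBLISH_TAG.exec(event.ref);
  if (match === null) return null;
  const version = match[1];
  if (version === undefined) return null;
  return { version: version, commit: event.checkout_sha };
}
```

Tags that do not follow the rule are ignored, so ordinary release tags never start a deployment by accident.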
While this phase might look primitive (that’s why I call it the stone age), it provided a solid foundation for subsequent growth after the infrastructure for development had undergone a radical transformation. Meanwhile, with the advancement of frontend technologies, we entered a new phase based on this underlying system.
The Silver Age
Node.js, which had been under development for more than five years, matured around 2014. The team developed a local CLI tool called DEF based on Node.js. The tool featured a management mechanism for installing and calling Node modules. By encapsulating development modules as plug-ins in the DEF system, tool developers could call plug-in modules in different combinations to compile, debug, and deploy frontend projects.
Under the KISSY framework development system, we used Node.js to compile scripts through regular expression matching and by parsing the AST with UglifyJS. The engineering tools for frontend development that we built on Node.js gradually replaced those implemented with Ant in the Java system.
At that time, in addition to using tools such as Yeoman to complete the initialization process, we kept abstracting business compilation and building logic and developed basic concepts such as the builder.
In a traditional project organization, the building and compilation configuration is stored in the same code directory as the project’s code files. From the team’s perspective, the building and compilation logic is scattered, with no unified way to manage updates. If the building and compilation tools change for a development scenario within the team, updating them across the team can be very costly. From the user’s point of view, a large number of identical building dependencies are installed in every project of the same type. Each project needs its building dependencies installed before it can be built, which wastes space and time.
The builder converges and abstracts the compilation and building dependencies, and a project’s compilation dependencies are maintained in a single npm package. The relationship between compilation logic and projects therefore changes from many-to-many to one-to-many. This greatly reduces the space occupied by building logic and enables the reuse of installed building dependencies, thereby simplifying the building process and improving building efficiency. It also laid the groundwork for the subsequent online building system.
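A minimal sketch of the one-to-many idea, assuming each project only declares the builder it uses and a shared registry installs every builder once. The names `BuilderRegistry`, `builder-kissy@1.0.0`, and the project directories are hypothetical, and the `install` callback stands in for a real package installation.

```typescript
// A builder turns a source directory into a description of its build output.
type Builder = (sourceDir: string) => string;

// Shared registry: each builder package is installed at most once, then reused.
class BuilderRegistry {
  private readonly cache: { [builderId: string]: Builder | undefined } = {};
  private readonly install: (builderId: string) => Builder;
  installCount = 0;

  constructor(install: (builderId: string) => Builder) {
    this.install = install;
  }

  resolve(builderId: string): Builder {
    let builder = this.cache[builderId];
    if (builder === undefined) {
      builder = this.install(builderId); // stands in for installing the npm package
      this.installCount += 1;
      this.cache[builderId] = builder;
    }
    return builder;
  }
}

// Hypothetical projects of the same type that share one builder.
const projects = [
  { dir: "shop-home", builder: "builder-kissy@1.0.0" },
  { dir: "shop-detail", builder: "builder-kissy@1.0.0" },
  { dir: "shop-cart", builder: "builder-kissy@1.0.0" },
];

const registry = new BuilderRegistry(
  (builderId) => (sourceDir) => `${sourceDir} built with ${builderId}`,
);
const outputs = projects.map((p) => registry.resolve(p.builder)(p.dir));
```

Three projects are built, but the builder is installed a single time, and upgrading it means changing one package instead of three project configurations.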
In this phase, Node.js became another basic skill besides page development. With the creation and large-scale application of Node.js tools, frontend development and deployment entered the silver age. As people were getting familiar with the basic application of the Node.js system, we began to design a more sophisticated frontend engineering system.
The Golden Age
After the team completed the infrastructure construction of engineering tools and web service frameworks based on Node.js, more and more online and offline engineering systems developed in the following years. The services and tools designed and built during that phase have formed the infrastructure for the current frontend development and deployment processes.
With the development of local DEF tools, more and more tool plug-ins have emerged. Users can find a tool for almost any local development function in the plug-in ecosystem. However, in actual project development, users must know which plug-ins to use, how to use them, and how best to combine them. As the number of projects increases, users find it harder to memorize tool combinations and need to be careful when they switch plug-in combinations between projects.
In the face of the problems derived from diversification, we proposed the development suites concept on top of the original local DEF tool. Five functionality points of local development tools were extracted, namely init, dev, build, test, and publish. We classified the existing plug-in capabilities by project development type and came up with a set of standard local tools for each development type, which is a development suite.
By using the suites, users may start and use the corresponding tools and services via a unified command when they develop different projects. The new local suite system also provides each user with a more fine-grained and flexible method for specifying versions. It has an improved log monitoring system that monitors and resolves problems in real-time in order to ensure frontline developers are satisfied with the development tools.
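As a rough sketch of the suite idea, each project type can map the five unified commands to an ordered list of plug-ins. The five command names come from the text above; the project types and plug-in names are made up for illustration and are not DEF's actual plug-ins.

```typescript
// The five functionality points extracted from the local tools.
type SuiteCommand = "init" | "dev" | "build" | "test" | "publish";

// A suite maps each unified command to the plug-ins that implement it.
type Suite = Record<SuiteCommand, string[]>;

// Hypothetical suites, one per development type.
const suites: { [projectType: string]: Suite | undefined } = {
  "web-module": {
    init: ["scaffold-module"],
    dev: ["dev-server", "mock-data"],
    build: ["builder-module"],
    test: ["unit-test"],
    publish: ["publish-assets"],
  },
  "weex-app": {
    init: ["scaffold-weex"],
    dev: ["dev-server", "weex-preview"],
    build: ["builder-weex"],
    test: ["unit-test"],
    publish: ["publish-weex"],
  },
};

// Resolve which plug-ins a unified command runs for a given project type.
function plan(projectType: string, command: SuiteCommand): string[] {
  const suite = suites[projectType];
  if (suite === undefined) {
    throw new Error(`No suite registered for project type "${projectType}"`);
  }
  return suite[command];
}
```

The user types the same `dev` or `build` command in every project, and the suite decides which plug-in combination runs.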
As the process for publishing frontend resources through GitLab’s webhook capability came into wide use, developers asked for a better publishing experience, and our team also wanted more systematic and organized process governance and data statistics for the frontend development process.
In this context, we integrated the publishing capabilities and streamlined the old publish process to produce a deployment platform under the frontend system. In practice, the Git tag operation for publishing is simplified, and users can start a publishing task directly with a single command. Throughout the publishing process, the execution information and logs of each part are returned and displayed over persistent connections, giving the publish process a new way of operating. For the first time, users had greater control over the publishing process and higher information visibility, from submitting code for detection to deploying resources to the CDN. This has greatly improved the publishing experience.
Once the Node.js-based deployment platform had streamlined all the basic parts, it became the underlying structure on which we systematically remodeled the deployment and release processes. The parts of the original process were abstracted into three main steps: build, detect, and publish. As business development shifted from PC to wireless, the concept of a publish type was developed on the execution layer on top of these three steps. At that time, for example, the types included web application, frontend resource, and Weex application. For each publish type, the underlying system adopts a different deployment method in the details of the publish process. Together with the suite tools for local development, this links and unifies the online and offline steps, providing the best deployment experience for each development mode in a fine-grained manner.
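The three-step abstraction with per-type publish details might look like the sketch below. The step names and the three publish types come from the text; the step bodies are placeholders, and all identifiers are assumptions for this example.

```typescript
type StepName = "build" | "detect" | "publish";

interface StepResult {
  step: StepName;
  ok: boolean;
  detail: string;
}

type Step = (commit: string) => StepResult;
type Pipeline = Record<StepName, Step>;

const STEP_ORDER: readonly StepName[] = ["build", "detect", "publish"];

// Run the steps in order and stop at the first failure.
function runPipeline(pipeline: Pipeline, commit: string): StepResult[] {
  const results: StepResult[] = [];
  for (const name of STEP_ORDER) {
    const result = pipeline[name](commit);
    results.push(result);
    if (!result.ok) break; // a failed step blocks everything after it
  }
  return results;
}

// Placeholder build and detect steps shared by every publish type.
const build: Step = (commit) => ({ step: "build", ok: true, detail: `built ${commit}` });
const detect: Step = (commit) => ({ step: "detect", ok: true, detail: `checked ${commit}` });

// Each publish type only differs in how the final step deploys.
const frontendResource: Pipeline = {
  build,
  detect,
  publish: (commit) => ({ step: "publish", ok: true, detail: `uploaded ${commit} to CDN` }),
};
const webApplication: Pipeline = {
  build,
  detect,
  publish: (commit) => ({ step: "publish", ok: true, detail: `deployed ${commit} to app servers` }),
};
const weexApplication: Pipeline = {
  build,
  detect,
  publish: (commit) => ({ step: "publish", ok: true, detail: `released Weex bundle for ${commit}` }),
};
```

Because every type shares the same step order, adding a new publish type only requires a new `publish` implementation.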
Users no longer need to trigger the publish process through GitLab’s webhook. With one-click deployment enabled, the frontend engineering system entered yet another new phase.
In the previous phase, we centralized and encapsulated the compilation logic and formed the specification description for development, compilation, and building in the form of an npm package. In deployment, however, the builder is not connected to the online deployment process, and different local system environments can also lead to unstable building results.
As Docker developed, we considered using its ability to quickly start and stop a unified environment to give builders a place to run, with Docker containers simulating the local compilation and building process. The unified container environment, combined with the version identification information (for example, the commit) that GitLab has provided since the migration described previously, keeps the building results as stable and consistent as possible.
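To illustrate how a unified container plus a commit can pin down a build, the sketch below assembles the arguments for a `docker run` invocation. The flags `--rm`, `-v`, `-w`, and `-e` are standard Docker CLI options; the image name, mount path, environment variable, and build command are assumptions, not the Cloud Building system's real configuration.

```typescript
// Everything that identifies one build: same inputs should give the same output.
interface BuildJob {
  repo: string;         // repository path, for example "group/project"
  commit: string;       // commit identifier provided by GitLab
  builderImage: string; // container image holding the builder (hypothetical name)
  workspace: string;    // host directory with the checked-out source
}

// Arguments for `docker run`: a throwaway container with the source mounted in.
function dockerRunArgs(job: BuildJob): string[] {
  return [
    "run",
    "--rm",                              // remove the container when the build ends
    "-v", `${job.workspace}:/workspace`, // mount the checked-out source
    "-w", "/workspace",                  // run the build from the source directory
    "-e", `BUILD_COMMIT=${job.commit}`,  // assumed variable exposing the commit
    job.builderImage,
    "npm", "run", "build",               // assumed build entry point
  ];
}

// A stable key for one build; two jobs with the same key should produce the same result.
function buildKey(job: BuildJob): string {
  return `${job.repo}@${job.commit}#${job.builderImage}`;
}
```

The resulting array could be passed to a process launcher to start the container, and the key gives logs and results a stable identity per repository, commit, and builder image.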
We were able to build the Cloud Building system in the field of frontend continuous integration after a series of explorations, such as establishing frontend Docker container clusters, connecting network channels between containers and applications, formulating scheduling policy logic for building tasks, and storing Docker container runtime logs in one place with Redis.
When we built the online building capability through the Midway.js application framework of the Taobao system, we were also establishing friendlier and more robust business logic for online building tasks on the business side. By calling the Cloud Building service during online publish, the builder, repository, and user are connected during the execution of a building task. Based on this relationship, we established a comprehensive scheduling management mechanism for task execution, which can record specific information about how builder tasks are executed.
The builder view provides information about how the builder runs for different repositories, including the complete runtime logs, building time, and building error records, allowing users to easily troubleshoot and optimize the builder. For iterations and updates of builders, builder developers can set the phased release range for users to ensure stable coverage of deployment through a complete phased release process when versions are being updated. If anything goes wrong during a builder release, the phased release will be canceled immediately to resolve the problem. This mode minimizes the risk of version releases. A complete phased release mechanism plays a decisive role in scenarios with extensive horizontal coverage.
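One common way to implement a phased release range is to place every repository in a fixed bucket and compare the bucket with the rollout percentage. The sketch below assumes that approach; the source does not describe how the range is actually selected, and the hash function and all names here are illustrative.

```typescript
interface Rollout {
  stable: string;    // builder version everyone used before the release
  candidate: string; // new builder version being phased in
  percent: number;   // share of repositories on the candidate, 0 to 100
}

// Deterministic bucket in [0, 100) from a simple polynomial rolling hash.
function bucketOf(repo: string): number {
  let hash = 7;
  for (let i = 0; i < repo.length; i++) {
    hash = (hash * 31 + repo.charCodeAt(i)) % 1000003;
  }
  return hash % 100;
}

// Pick the builder version a repository should use during the rollout.
function builderVersionFor(repo: string, rollout: Rollout): string {
  return bucketOf(repo) < rollout.percent ? rollout.candidate : rollout.stable;
}

// Cancelling sends every repository back to the stable version.
function cancelRollout(rollout: Rollout): Rollout {
  return { ...rollout, percent: 0 };
}
```

Because the bucket depends only on the repository name, a repository that receives the new builder at a small percentage keeps it as the range widens, which makes problems reproducible while the release is in progress.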
Users no longer need to push locally built code to the repository. After source code is developed, it is committed directly to the repository. When a publish task is executed, the Cloud Building system automatically produces stable and unified building results.
After a project is compiled and compressed, the frontend resources that are about to be deployed may still hide many problems, given increasingly complicated business scenarios. Manual inspection can no longer ensure the reliability of the resources to be deployed. Therefore, we need a standardized method that systematically detects issues as the last defense before deployment and as part of the entire process.
Similar to Cloud Building, a runtime environment for online resource checking was built on Node.js, and the checking logic was further abstracted into npm packages, each forming a checker. After user resources are pre-processed, the Menshen system checks them for things like resource URLs, sensitive words, and code annotations, and tells users how each issue is handled according to the level of the checking result. As the last defense before resource deployment, the system is named Menshen, meaning that it safeguards every resource deployment for quality and security.
Menshen automates the checking process before deployment, removing the need for users to manually check for problems that cannot be detected by conventional compilation tools.
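A checker of this kind can be sketched as a function from a file to a list of issues, with the issue level deciding whether deployment is blocked. The three checks mirror the examples in the text (resource URLs, sensitive words, code annotations); the specific rules, level names, and identifiers are assumptions and not Menshen's real checkers.

```typescript
type Level = "warning" | "error";

interface Issue {
  checker: string;
  level: Level;
  file: string;
  message: string;
}

interface ResourceFile {
  name: string;
  content: string;
}

type Checker = (file: ResourceFile) => Issue[];

// Assumed rule: resources must not be loaded over plain HTTP.
const resourceUrlChecker: Checker = (file) =>
  file.content.indexOf("http://") !== -1
    ? [{ checker: "resource-url", level: "error", file: file.name, message: "non-HTTPS resource URL" }]
    : [];

// Flags every configured word that appears in the file.
function sensitiveWordChecker(words: string[]): Checker {
  return (file) =>
    words
      .filter((word) => file.content.indexOf(word) !== -1)
      .map((word): Issue => ({
        checker: "sensitive-word",
        level: "error",
        file: file.name,
        message: `contains "${word}"`,
      }));
}

// Assumed rule: leftover TODO comments are reported but do not block deployment.
const annotationChecker: Checker = (file) =>
  /\/\/\s*TODO/.test(file.content)
    ? [{ checker: "code-annotation", level: "warning", file: file.name, message: "leftover TODO comment" }]
    : [];

// Run every checker on every file; any error-level issue blocks deployment.
function inspect(files: ResourceFile[], checkers: Checker[]): { issues: Issue[]; blocked: boolean } {
  const issues: Issue[] = [];
  for (const file of files) {
    for (const check of checkers) {
      issues.push(...check(file));
    }
  }
  return { issues, blocked: issues.some((issue) => issue.level === "error") };
}
```

Packaging each checker separately lets new rules be added without touching the deployment flow itself.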
As frontend engineering within the system develops, more and more development solutions designed for different business scenarios emerge. The deployment platform, Cloud Building, and Menshen have become the de facto standards and infrastructure for the group’s frontend development scenarios. In this context, underlying capabilities were further opened, and frontend development processes specific to the upper-layer business systems can be built using the matured underlying engineering mid-end capabilities.
For example, across the Alibaba system, abstracting the underlying engineering capabilities gives developers the best development experience in their respective business scenarios, whatever the delivery client or the form of resource organization.
Let’s look at how the Pegasus Scaffolding Service, used across the Alibaba economy, was designed.
In addition to general development, in the e-commerce system, especially in marketing scenarios, there is a demand for quick page production. Therefore, a scaffolding platform for generating pages from modules was built. The ability to quickly build pages is achieved by dividing a page into a frame and several modules. Frontend developers develop and combine the page elements while streamlining data from different services.
After local modules are developed, frontend developers publish the module resources to the online environment, assemble the page, inject the context, and preview the page in the scaffolding system to finally publish the page. At the underlying layer, the origin server system implemented by Node.js delivers CDN content to provide previews for hundreds of millions of visitors. This system has become one of the core scenarios of frontend development after several rounds of iterations.
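The frame-plus-modules model can be sketched as a page description that is assembled into a list of module resources plus an injected context. All names below, including the CDN host and the URL layout, are placeholders for illustration and do not describe the scaffolding service's real schema.

```typescript
// A published module, identified by name and version.
interface ModuleRef {
  name: string;
  version: string;
}

// A page is a frame plus an ordered list of modules.
interface PageSchema {
  frame: string;
  modules: ModuleRef[];
}

// What the preview or the origin server would need to render the page.
interface AssembledPage {
  frame: string;
  scripts: string[];                  // module resource URLs, in page order
  context: { [key: string]: string }; // data injected into the page
}

const CDN_BASE = "https://cdn.example.com"; // placeholder host

function moduleUrl(ref: ModuleRef): string {
  return `${CDN_BASE}/${ref.name}/${ref.version}/index.js`;
}

// Combine the page schema with runtime context into a renderable description.
function assemblePage(schema: PageSchema, context: { [key: string]: string }): AssembledPage {
  if (schema.modules.length === 0) {
    throw new Error("A page needs at least one module");
  }
  return {
    frame: schema.frame,
    scripts: schema.modules.map(moduleUrl),
    context: { ...context },
  };
}

// Hypothetical marketing page built from two modules.
const promoPage = assemblePage(
  { frame: "default-frame", modules: [{ name: "banner", version: "1.0.2" }, { name: "item-list", version: "2.3.0" }] },
  { env: "preview", locale: "en-US" },
);
```

Because modules are referenced by name and version, publishing a new module version changes a page only when its schema is updated to point at that version.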
Since 2015, Taobao’s frontend development system has shifted its focus from local tools and deployment platform to online and offline system solutions. While ensuring continuous breakthroughs and upgrades for the basic experience of the frontend development, large-scale and enterprise-level frontend engineering construction transformed the scattered daily development mode to a more systematic, efficient, and scalable one. It also laid a solid foundation for breakthroughs and upgrades in development and deployment modes at higher levels in the future.
The Future Age
As the engineering system has matured, in recent years we have been thinking about groundbreaking, future-oriented solutions that make development more efficient. With many internal products constantly improving and iterating, new directions and patterns are emerging.
The imgcook project originated from visual building and reached its current form after continuous improvements and changes. From an external perspective, the project went through several phases:
- The First Phase: Users built pages by dragging and dropping page elements to generate the source code and then published the page.
- The Second Phase: An image scanning engine was built for pixel scanning, and visual design tools such as Sketch were integrated to convert design drafts. In this phase, frontend developers uploaded visual drafts to the platform for code conversion.
- The Third Phase (The Current Phase): Platforms are improving their image-to-code conversion capability through AI technologies like deep learning. The basic development process is also being streamlined on the platform side. The internal development mode is being migrated to the console for designing and coding, providing comprehensive development and deployment experience.
Currently, design-to-code (D2C) capabilities are gradually being verified in development scenarios that involve the intelligent restoration of components, forms, modules, and pages, and they are producing favorable results.
Since last year, we have been committing to a new direction of IDE, aiming to create a new and more efficient development mode based on the platform capabilities of IDE.
From an external perspective, two trends have appeared. First, star startups and projects are emerging in the IDE field, for example, Theia in the Eclipse system, Coder, which is implemented through compatibility with VSCode, and CodeSandbox, a rising star in the field of web development. Second, cloud vendors are joining in, including AWS, which acquired Cloud9, Tencent, which took over Coding, and Azure, which provides the Codespaces service.
With ever-evolving technologies such as editors and Docker, IDE-related service providers are working to seize opportunities for improving development efficiency and satisfaction with the ability to integrate development environments, so as to identify users’ pain points and expand the market.
Inside the System
The frontend development mode now involves more and more tools and services. They are no longer only command-line tools; they have become entry points to richly interactive tools and services. Meanwhile, a business development mode usually needs to connect to development tools and services provided by other systems and teams. For example, developing an Alipay mini program now requires services such as a simulator, a debugger, and physical device debugging in addition to the basic compilation and preview services.
The Alipay mini program is representative of many internal scenarios. The current solution for it is a local IDE development tool built on Electron. In this context, an underlying IDE system has been under construction inside the system since last year, with the aim that basic online and offline IDE solutions can be implemented through a single set of underlying IDE components.
By using the IDE underlying plug-in mechanism, Taobao has incubated its own IDE-integrated development tools. For example, D2C, Alipay mini program, serverless, and other scenarios are undergoing extensive internal development and usage improvement. With IDE and its plug-in system, we improve basic development experience by utilizing the VSCode ecosystem. We also connect all tools and services along a project’s pipeline with plug-in UI capabilities extended on top of VSCode.
Unlike in the previous process, developers now perform all development operations, including coding, debugging, preview, and deployment, on the development panel of a single unified IDE. This is a seed we planted for the future, and it is now budding.
This article shares my experience in frontend development and deployment in the Taobao system, and we are now working toward breakthroughs in the current development mode.