How TuSimple Built an Automated Data Processing Platform with Serverless Workflow
In April, Alibaba Cloud Serverless Workflow was officially launched for commercial use for customers in Mainland China. It is a fully managed serverless cloud service that coordinates the execution of multiple distributed tasks. The product aims to simplify the tedious work of developing and running workflows, including task coordination, state management, and error handling, so that users can focus on developing business logic.
Precise Construction of Automated Production Lines in the Cloud
Workflows are very common in scenarios such as internal approvals, purchase orders, extract-transform-load (ETL) processes, big data processing pipelines, and regular or custom automated O&M processes. The audio and video industry uses workflows for long-running tasks such as the transcoding of multimedia file slices, format conversion, review and verification, and face recognition. The e-commerce and travel industries use workflows for online customer orders, the artificial intelligence (AI) industry uses machine learning pipelines, and the bioinformatics industry often uses gene sequencing workflows.
These scenarios face many difficulties. They generally involve many asynchronous distributed tasks with the control logic and task logic intertwined, making the processes complex and lengthy. Distributed tasks may span across public clouds and on-premises data centers. Secure network connections are costly. It takes too long to complete the entire workflow, which wastes resources. When asynchronous and critical workflows are involved, data consistency must be ensured. These scenarios require visualized monitoring of complex execution steps.
Serverless Workflow can address each of these problems. It separates the control logic from the task logic, which clarifies responsibilities and makes workflows easier to manage and maintain. It centrally defines and controls workflows by using templates to simplify orchestration, and it supports several orchestration methods, such as serial and parallel execution. It supports various task types, such as function, queue, and cloud service, and connects public clouds with enterprise internal networks. It supports task executions that last up to one year and adopts a serverless, pay-as-you-go billing model. When it dynamically calls concurrent functions, it persists state and messages so that information is not lost and is eventually synchronized, which improves fault tolerance and enables automatic exception handling. It visualizes the workflow progress and allows the tracing of previous executions.
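The separation of control logic from task logic, together with serial and parallel orchestration, can be illustrated with a minimal in-process sketch. This is not the Serverless Workflow definition format or API. The FLOW structure, the step types, and the run_flow function are hypothetical names chosen for illustration, and the tasks are ordinary local functions rather than cloud functions.

```python
from concurrent.futures import ThreadPoolExecutor

# Task logic: plain functions that know nothing about ordering.
def extract(data):
    return {**data, "rows": [1, 2, 3]}

def total(data):
    return {"total": sum(data["rows"])}

def count(data):
    return {"count": len(data["rows"])}

def report(data):
    return {**data, "report": f"{data['count']} rows, total {data['total']}"}

# Control logic: a declarative description of the flow (hypothetical format).
FLOW = [
    {"type": "task", "run": extract},
    {"type": "parallel", "branches": [total, count]},
    {"type": "task", "run": report},
]

def run_flow(flow, data):
    """Run steps in order; 'parallel' steps fan out and merge branch outputs."""
    for step in flow:
        if step["type"] == "task":
            data = step["run"](data)
        elif step["type"] == "parallel":
            with ThreadPoolExecutor() as pool:
                # list() forces all branches to finish before data is reassigned.
                results = list(pool.map(lambda branch: branch(data), step["branches"]))
            for result in results:
                data = {**data, **result}
        else:
            raise ValueError(f"unknown step type: {step['type']}")
    return data

result = run_flow(FLOW, {"source": "road-test-batch"})
```

Because the ordering lives only in FLOW, a step can be added, removed, or moved into a parallel branch without touching any task function.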
Building a Simple and Reliable Platform for Automated Data Processing
TuSimple is an AI enterprise that focuses on the R&D and application of L4 self-driving truck technology. It provides large-scale commercial operation technology for self-driving trucks worldwide to empower the global logistics and transportation industry. TuSimple has completed its series D round of financing and is valued at over USD 1 billion. The development of self-driving vehicle technology requires the accumulation of a large amount of road test data. This means TuSimple requires efficient road testing and rapid road test data processing to guide the updates and iterations of its models.
Road tests generate large volumes of data that must be processed in complex and varied ways. For the same batch of data, different business teams may have different usage and processing methods. Therefore, effectively managing different data processing procedures and reducing the frequency of human intervention can significantly improve productivity.
Road tests are conducted irregularly, which results in high uncertainty in the start times and durations of workflow orchestration tasks. Consequently, it is difficult to maximize machine utilization by establishing an independent process management system in a local data center, and resources are wasted. TuSimple already has many self-contained business processing scripts and applications that run locally. Due to various limitations, the company cannot migrate all of them to the cloud, so it must find a way to use cloud services rationally.
Because the processing workflow involves many complicated steps, data sharing between tasks is very important, and because tasks depend on one another, system reliability is critical. Managing the inter-step states and data of complex workflows is also a challenge for businesses.
In this context, TuSimple began to explore automated data processing platforms. Alibaba Cloud Serverless Workflow is billed based on the number of scheduled jobs. It is easy to use and integrate and requires little O&M effort. It provides an effective solution to the preceding problems and is ideal for scenarios with irregular offline tasks. Serverless Workflow also supports task orchestration for on-premises and user-created data centers. TuSimple uses Serverless Workflow’s native support for Message Notification Service (MNS) to solve the data communication problem between the cloud and its on-premises data centers. This allows the company to properly orchestrate and manage local tasks.
In addition to scheduling, Serverless Workflow maintains task states and data generated during the execution process. TuSimple uses the task input and output mapping and status reporting mechanisms to efficiently manage the lifecycle of each task in the workflow and the data transfers between them.
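The queue-based pattern described above (the workflow hands a task to a queue, an on-premises worker picks it up, runs it, and reports its status back) can be sketched in memory as follows. This is only an illustration under stated assumptions: MiniWorkflow, local_worker, decode_logs, and the token scheme are hypothetical, a Python queue.Queue stands in for an MNS queue, and no real Alibaba Cloud API is called.

```python
import queue
import uuid

class MiniWorkflow:
    """In-memory stand-in for a workflow service plus a message queue."""

    def __init__(self):
        self.task_queue = queue.Queue()  # stands in for an MNS queue
        self.pending = {}                # task token -> step name
        self.status = {}                 # step name -> lifecycle state
        self.outputs = {}                # step name -> reported output

    def dispatch(self, step_name, task_input):
        """Publish a task message and wait for the worker's callback."""
        token = str(uuid.uuid4())
        self.pending[token] = step_name
        self.status[step_name] = "Running"
        self.task_queue.put({"token": token, "input": task_input})
        return token

    def report_succeeded(self, token, output):
        step = self.pending.pop(token)
        self.status[step] = "Succeeded"
        self.outputs[step] = output

    def report_failed(self, token, error):
        step = self.pending.pop(token)
        self.status[step] = "Failed"
        self.outputs[step] = {"error": error}

def local_worker(workflow, handler):
    """Runs in the on-premises data center: take one message, run it, report."""
    message = workflow.task_queue.get_nowait()
    try:
        output = handler(message["input"])
    except Exception as exc:
        workflow.report_failed(message["token"], str(exc))
    else:
        workflow.report_succeeded(message["token"], output)

def decode_logs(task_input):
    # Hypothetical local processing script.
    return {"frames": len(task_input["files"]) * 100}

wf = MiniWorkflow()
wf.dispatch("decode", {"files": ["a.bag", "b.bag"]})
local_worker(wf, decode_logs)

# Input mapping: the next step receives only the field it needs.
next_input = {"frame_count": wf.outputs["decode"]["frames"]}
```

In this arrangement the worker only needs outbound access to the queue and the reporting call, which is one reason a queue suits tasks that must stay in a local data center.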
As its business expands in the future, TuSimple will continue to optimize the operational efficiency and automation of offline big data processing. By exploring various methods, TuSimple will further improve the efficiency of its engineering team and invest more in business innovation.
More Serverless Workflow Scenarios
Many companies share common workflow scenarios. Below, we will look at three typical scenarios where Serverless Workflow can be used:
Order approval processes, with execution durations of up to one year
Orders in the e-commerce and travel industries and various routine applications within enterprises must go through multiple processes from issuance to approval. These multi-step distributed workflows must cross company office networks and multiple network environments in public clouds. They may also involve human intervention, and they must ensure eventual data consistency. Currently, Serverless Workflow supports the parallel triggering of 10,000 workflows with durations of up to one year.
Lower failure rate and increased throughput in multimedia file processing
Serverless Workflow also applies to multitask orchestration scenarios, such as transcoding, screenshot capturing, face recognition, voice recognition, review, and upload for multimedia files. You can orchestrate and submit Intelligent Media Management (IMM) tasks (or user-created processors) through Function Compute to generate output that meets your business needs. Tasks that encounter errors and exceptions can be reliably retried, significantly improving the processing throughput of multimedia tasks.
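The retry behavior mentioned above can be illustrated with a small sketch of retrying with exponential backoff. It does not reproduce how Serverless Workflow configures retries. TransientError, run_with_retry, and flaky_transcode are hypothetical names, and the delays are kept tiny so that the example runs quickly.

```python
import time

class TransientError(Exception):
    """Hypothetical error type marking failures that are worth retrying."""

def run_with_retry(task, max_attempts=3, base_delay=0.01, backoff=2.0, sleep=time.sleep):
    """Call task(); on TransientError wait and retry, up to max_attempts calls."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the error to the caller
            sleep(delay)
            delay *= backoff  # wait longer before each further attempt

attempts = []

def flaky_transcode():
    # Simulated transcoding task that fails twice before succeeding.
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("transcoder busy")
    return "video.mp4"

output = run_with_retry(flaky_transcode)
```

Only errors marked as transient are retried here. Any other exception propagates immediately, so genuine bugs are not hidden behind repeated attempts.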
If you use the serverless method to construct computing-intensive tasks such as video-on-demand and video transcoding, you can use Function Compute and Serverless Workflow together to launch complicated tasks in three days.
Automated O&M and visualized progress tracking
Automated O&M systems must overcome many challenges, such as differences in step complexity and duration, low reliability of single-host scripts, and complex dependencies. Workflow progress is also hard to visualize. With Serverless Workflow and Function Compute, you can cope with these challenges. For example, in automatic software deployment, you need to build a container image, upload the image, start and track the image pulls on each node, and start containers with the new image version. The logs generated by the functions at each step are stored in Log Service for query and sharing. Compared with single-host O&M scripts, automation tools based on workflows provide high availability, error handling mechanisms, and progress visualization.
“Serverless Workflow is a key part of Alibaba Cloud’s serverless product portfolio,” said Yang Haoran, the head of Alibaba Cloud Serverless products. “Serverless Workflow allows users to integrate multiple Alibaba Cloud services, such as Function Compute and our visual intelligence platform, or their own services. It also allows users to quickly build flexible and highly available cloud-native applications through simple and intuitive workflow orchestration.”
Alibaba Cloud launched Function Compute in 2017. By auto-scaling in real time based on application load changes, this service can scale out to thousands of instances within one minute while keeping latency variation low. Function Compute supports the key applications of major users, such as Weibo, Mango TV, BGI Group, TuSimple, and Shimo. Our support allows them to easily cope with business peaks.