How TuSimple Built an Automated Data Processing Platform with Serverless Workflow

By Alibaba Cloud Serverless

In April, Alibaba Cloud Serverless Workflow was officially launched for commercial use for customers in Mainland China. It is a fully managed serverless cloud service that coordinates the execution of multiple distributed tasks. The product aims to simplify the tedious work of developing and running workflows, including task coordination, state management, and error handling, so that users can focus on developing business logic.

Precise Construction of Automated Production Lines in the Cloud

Workflow scenarios of this kind face many difficulties. They generally involve many asynchronous distributed tasks in which the control logic and task logic are intertwined, making the processes complex and lengthy. Distributed tasks may span public clouds and on-premises data centers, and securing the network connections between them is costly. End-to-end workflows take a long time to complete, which wastes resources. When asynchronous, business-critical workflows are involved, data consistency must be ensured. Finally, complex execution steps require visualized monitoring.

Serverless Workflow can address each of these problems. It separates the control logic from the task logic, which clarifies responsibilities and makes workflows easier to manage and maintain. It defines and controls workflows centrally by using templates, which simplifies orchestration, and it supports several orchestration methods, such as serial and parallel execution. It supports various task types, such as function, queue, and cloud service, and connects public clouds with enterprise internal networks. It supports task executions that last up to one year and adopts a serverless, pay-as-you-go billing model. It can call functions concurrently and dynamically, and it persists state and messages so that information is not lost and is eventually consistent, which improves fault tolerance and enables automatic exception handling. It visualizes workflow progress and allows previous executions to be traced.
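To make the separation of control logic and task logic more concrete, here is a minimal Python sketch of a toy orchestrator. It is not the Serverless Workflow API or its flow definition language. The step format, the function names (extract, label, index), and the run_steps helper are all hypothetical and exist only to show serial and parallel orchestration driven by a declarative description.

```python
from concurrent.futures import ThreadPoolExecutor

# Task logic: plain functions that know nothing about ordering.
# All names and return values here are made up for illustration.
def extract(data):
    return {"frames": 3}

def label(data):
    return {"labels": data["frames"] * 2}

def index(data):
    return {"indexed": True}

# Control logic: a declarative flow description (hypothetical format).
FLOW = [
    {"type": "task", "name": "extract", "fn": extract},
    {"type": "parallel", "name": "fanout", "branches": [
        [{"type": "task", "name": "label", "fn": label}],
        [{"type": "task", "name": "index", "fn": index}],
    ]},
]

def run_steps(steps, data, history):
    """Run steps in order; a parallel step runs its branches concurrently."""
    for step in steps:
        if step["type"] == "task":
            # Merge the task's output into the data passed to later steps.
            data = {**data, **step["fn"](data)}
        elif step["type"] == "parallel":
            with ThreadPoolExecutor() as pool:
                futures = [
                    pool.submit(run_steps, branch, dict(data), [])
                    for branch in step["branches"]
                ]
                results = [f.result() for f in futures]
            # Merge branch outputs in branch order so the result is stable.
            for branch_data, branch_history in results:
                data = {**data, **branch_data}
                history.extend(branch_history)
        history.append(step["name"])
    return data, history

result, history = run_steps(FLOW, {"batch": "road-test-001"}, [])
print(result)
print(history)
```

In this sketch, changing the order of steps or adding a branch only touches FLOW, while the task functions stay unchanged. That is the property the paragraph above attributes to separating control logic from task logic.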

Building a Simple and Reliable Platform for Automated Data Processing

Road tests generate large volumes of data that must be processed in complex and varied ways. For the same batch of data, different business teams may have different usage and processing methods. Therefore, effectively managing different data processing procedures and reducing the frequency of human intervention can significantly improve productivity.

Road tests are conducted irregularly, so the start times and durations of workflow orchestration tasks are highly uncertain. Consequently, it is difficult to keep machines fully utilized with an independent process management system in a local data center, and resources end up wasted. TuSimple also has many local, modular business processing scripts and applications. Due to various limitations, the company cannot migrate all of them to the cloud, so it must find a way to use cloud services sensibly alongside them.

Because the processing workflow has many complicated steps, data sharing between tasks is very important. Because tasks depend on one another, system reliability is critical. Managing the state and data passed between the steps of complex workflows is also a challenge for businesses.

In this context, TuSimple began to explore automated data processing platforms. Alibaba Cloud Serverless Workflow is billed based on the number of scheduled jobs, is easy to use and integrate, and requires little O&M. It provides an effective solution to the preceding problems and is ideal for scenarios with irregular offline tasks. Serverless Workflow also supports task orchestration for on-premises and self-built data centers. TuSimple uses Serverless Workflow’s native support for Message Notification Service (MNS) to solve the data communication problem between the cloud and its on-premises data centers. This allows the company to properly orchestrate and manage local tasks.
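The queue-based pattern described above can be sketched in Python as follows. This is a local simulation under stated assumptions. It uses the standard library queue.Queue as a stand-in for a cloud message queue, and the dispatch, report, and local_worker functions, the task token, and the message fields are hypothetical names rather than the MNS or Serverless Workflow API.

```python
import queue
import threading
import uuid

task_queue = queue.Queue()  # stand-in for a cloud message queue
pending = {}                # task token -> completion record

def dispatch(payload):
    """Cloud side: enqueue a task and return a token to wait on."""
    token = str(uuid.uuid4())
    pending[token] = {"done": threading.Event(), "status": None, "output": None}
    task_queue.put({"token": token, "payload": payload})
    return token

def report(token, status, output):
    """Local side: report the outcome of a task back to the workflow."""
    entry = pending[token]
    entry["status"], entry["output"] = status, output
    entry["done"].set()

def local_worker():
    """On-premises worker: pull tasks, run a local script, report status."""
    while True:
        msg = task_queue.get()
        if msg is None:  # sentinel that stops the worker
            break
        try:
            # Placeholder for an existing local processing script.
            output = {"processed": msg["payload"]["file"].upper()}
            report(msg["token"], "succeeded", output)
        except Exception as exc:
            report(msg["token"], "failed", {"error": str(exc)})

worker = threading.Thread(target=local_worker)
worker.start()

token = dispatch({"file": "drive_log.bag"})
finished = pending[token]["done"].wait(timeout=5)  # workflow waits for the callback

task_queue.put(None)
worker.join()

status = pending[token]["status"]
output = pending[token]["output"]
print(status, output)
```

The point of the design is that the local worker only makes outbound calls to pull messages and report results, so the on-premises data center does not need to accept inbound connections from the cloud.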

In addition to scheduling, Serverless Workflow maintains task states and data generated during the execution process. TuSimple uses the task input and output mapping and status reporting mechanisms to efficiently manage the lifecycle of each task in the workflow and the data transfers between them.
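As a rough illustration of input and output mapping between tasks, the following Python sketch passes data through two steps. The mapping format (target and source with a $input or $local prefix) is an assumption made for this example and should not be read as the exact Serverless Workflow syntax. The step functions and field names are also hypothetical.

```python
def apply_mappings(mappings, sources):
    """Build a dict by copying values named by '$scope.key' sources."""
    result = {}
    for m in mappings:
        scope, key = m["source"].lstrip("$").split(".", 1)
        result[m["target"]] = sources[scope][key]
    return result

def run_step(step, step_input):
    """Map the step input to the task input, run the task, map the output."""
    task_input = apply_mappings(step["input_mappings"], {"input": step_input})
    local = step["fn"](task_input)
    return apply_mappings(
        step["output_mappings"], {"input": step_input, "local": local}
    )

# Hypothetical task functions.
def convert(task_input):
    return {"converted": task_input["path"] + ".parquet"}

def summarize(task_input):
    return {"summary": task_input["dataset"] + " ready"}

STEPS = [
    {
        "fn": convert,
        "input_mappings": [{"target": "path", "source": "$input.raw_path"}],
        "output_mappings": [
            {"target": "dataset", "source": "$local.converted"},
            {"target": "batch_id", "source": "$input.batch_id"},
        ],
    },
    {
        "fn": summarize,
        "input_mappings": [{"target": "dataset", "source": "$input.dataset"}],
        "output_mappings": [
            {"target": "report", "source": "$local.summary"},
            {"target": "batch_id", "source": "$input.batch_id"},
        ],
    },
]

data = {"raw_path": "/data/run42.bag", "batch_id": "run42"}
for step in STEPS:
    data = run_step(step, data)  # each step's output feeds the next step
print(data)
```

Each task sees only the fields its input mapping selects, and only the fields named in the output mapping flow on to later steps, which keeps the data contract between tasks explicit.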

As its business expands in the future, TuSimple will continue to optimize the operational efficiency and automation of offline big data processing. By exploring various methods, TuSimple will further improve the efficiency of its engineering team and invest more in business innovation.

More Serverless Workflow Scenarios

Many companies share common workflow scenarios. Below, we will look at three typical scenarios where Serverless Workflow can be used:

Order approval processes, with execution durations of up to one year

Lower failure rate and increased throughput in multimedia file processing

If you take a serverless approach to compute-intensive tasks such as video-on-demand processing and video transcoding, you can use Function Compute and Serverless Workflow together to launch complicated tasks in three days.

Automated O&M and visualized progress tracking

“Serverless Workflow is a key part of Alibaba Cloud’s serverless product portfolio,” said Yang Haoran, the head of Alibaba Cloud Serverless products. “Serverless Workflow allows users to integrate multiple Alibaba Cloud services, such as Function Compute and our visual intelligence platform, or their own services. It also allows users to quickly build flexible and highly available cloud-native applications through simple and intuitive workflow orchestration.”

Alibaba Cloud launched Function Compute in 2017. By auto-scaling in real time based on changes in application load, the service can scale out to thousands of instances within one minute while keeping latency variation low. Function Compute supports the key applications of major users, such as Weibo, Mango TV, BGI Group, TuSimple, and Shimo, helping them cope easily with business peaks.
