Serverless Practices for Large-Scale Data Processing

8 min readMar 4, 2021

By Xiliu, Alibaba Cloud Technical Expert

Preface

You may have noticed a new usage of Alibaba Cloud’s serverless series, which is not that obvious, during your first use. Compared with the traditional server-based approach, the serverless service platform enables rapid scale-out of your apps and more efficient parallel processing. With this platform, you do not need to pay for idle resources or worry that reserved resources may be insufficient. However, in a traditional usage paradigm, you must reserve hundreds or thousands of servers for highly paralleled and short-lived tasks. In addition, you must pay for each of the used servers, even if some of them are no longer in use.

Alibaba Cloud’s serverless product, Function Compute, can address all the preceding concerns:

In a scenario, your tasks feature a small computing budget but need to process massive concurrent requests in parallel, such as multimedia file processing and document conversion.
In another scenario, your tasks feature a great computing budget and require quick completion of each task and parallel processing of multiple tasks.

In these scenarios, whether your tasks can be divided-and-conquered and whether their subtasks can run in parallel may be your only concerns. A long task that takes one hour to complete can be divided into 360 independent subtasks, each of which takes ten seconds for parallel processing. This division shortens the processing time from the previous one hour to only ten seconds. Moreover, when using the pay-as-you-go billing method, the computing budget squares with the cost. In traditional billing methods, however, waste is inevitable because you always have to pay for all the reserved resources.

Next, let’s elaborate on serverless practices for large-scale data processing.

Supreme Elasticity to Cope with Computing Load Fluctuations

Before we move on to large-scale data processing examples, let’s take a brief look at Function Compute.

1. About Function Compute

The developer uses a programming language to write an app or service. The development language list shows the development languages supported by Function Compute.
The developer uploads the app to Function Compute.
Trigger Function Execution: You can use one of the following trigger methods to execute a function: Object Storage Service (OSS), API Gateway, Log Service, Tablestore, Function Compute APIs, and SDKs.
Dynamically Scale Out to Respond to Requests: Function Compute can automatically scale out based on the number of user requests. This process is transparent and imperceptible to you and your users.
Charged based on the actual execution time of the function: After the function is executed, you can view the cost in the bill. The billing granularity is accurate to 100 milliseconds.

For more information, please visit the Function Compute product page.

Thus far, you have seen how Function Compute works. Next, let’s explain its operation with a case that involves a large amount of parallel video transcoding. Let’s assume there is a homeschooling or entertainment enterprise that produces teaching videos or new videos in a centralized manner. The enterprise wants to transcode these videos quickly so the clients can play them back shortly. Amid the current pandemic, the number of online education courses has surged, and the peak hours of classes are generally at 10:00, 12:00, 16:00, and 18:00. In this case, it is common to process all newly uploaded videos within a specific period, such as half an hour.

2. Elastic and Highly Available Audio and Video Processing System

The OSS Trigger

As shown in the preceding figure, when a user uploads a video to OSS, the OSS trigger automatically triggers function execution, and Function Compute automatically scales out. The function logic in the execution environment calls FFmpeg to transcode the video and transmits the transcoded video back to OSS for storage.

The Message Trigger

As shown in the preceding figure, an app only needs to send a message to automatically trigger a function for executing audio and video processing tasks. Accordingly, Function Compute automatically scales out. The function logic in the execution environment calls FFmpeg to transcode the video and transmits the transcoded video back to OSS for storage.

Manually call the SDK to execute audio and video processing tasks

Take Python as an example. The procedure is shown below:

python
    # -*- coding: utf-8 -*-
    import fc2
    import json
    client = fc2.Client(endpoint="http://123456.cn-hangzhou.fc.aliyuncs.com",accessKeyID="xxxxxxxx",accessKeySecret="yyyyyy")
    # Select synchronous or asynchronous calling
    resp = client.invoke_function("FcOssFFmpeg", "transcode", payload=json.dumps(
    {
        "bucket_name" : "test-bucket",
        "object_key" : "video/inputs/a.flv",
        "output_dir" : "video/output/a_out.mp4"
    })).data
    print(resp)

As indicated in the preceding section, there are many methods to trigger function execution. With easy configuration on SLS logs, you can quickly implement an elastic, highly available, and pay-as-you-go audio and video processing system, with a dashboard that supports zero O&M, the visualization of specific business data, and powerful custom monitoring and alerting.

Currently, the implemented audio and video user cases include UC, Yuque, Tangping Shejijia, Hupu, and several leading online education clients. During peak hours, some of them use computing resources with more than 10,000 CPUs and can process more than 1,700 videos in parallel, sporting extremely high-cost-effectiveness.

For more information, check the links below:

Divide-and-Conquer Tasks for Parallel Acceleration

It is interesting to apply the divide-and-conquer idea for tasks to Function Compute. For example, you have a 20 GB 1080p HD video to be transcoded, it may take hours for you to complete the transcoding with a single computer. Even worse, if the transcoding is interrupted midway, you must restart from the beginning. If you adopt the divide-and-conquer idea together with Function Compute, the transcoding process turns into a sequence of sharding > parallel transcoding and sharding > shard merging, resolving the pain points mentioned above.

Sharding and shard merging are memory-level replication tasks and require a minimal amount of computing budget. Transcoding is the only large consumer of computing budget and therefore is split into many subtasks for parallel processing. In this model, the maximum transcoding time of a shard is almost the same as that of the entire large video.
Even if a shard encounters a transcoding exception, you only need to re-transcode this shard. You do not need to rerun the whole task.

By properly dividing the large task and using Function Compute, coupled with a small amount of coding effort, you can quickly build a large data processing system that features high elasticity and availability, parallel acceleration, and pay-as-you-go billing.

Before we introduce this solution, let’s take a brief look at Serverless Workflow. Serverless Workflow can orchestrate functions in an organized manner with other cloud services and your independently developed services.

1. Overview

Serverless Workflow is a fully managed cloud service that coordinates multiple distributed tasks. In Serverless Workflow, you can orchestrate distributed tasks by sequence, branch, or in parallel. Serverless Workflow coordinates task execution based on specified steps. It tracks the status change of each task and executes user-defined retry logic as necessary to ensure smooth completion of the workflow. Serverless Workflow streamlines complex task coordination, status management, error handling, and other efforts required for business workflow development and operation, allowing you to focus on business logic development.

For more information, please visit the Serverless Workflow product page.

Next, let’s use a quick transcoding case for a large video to describe Serverless Workflow’s orchestration function, which breaks down large computing tasks into smaller subtasks for parallel processing to quickly complete a single large task.

2. Quick Transcoding of Large Videos Into Multiple Formats

As shown in the preceding figure, let’s assume that you upload a .mov video file to OSS. After the upload is completed, the OSS trigger automatically triggers function execution, and the triggered function calls FnF for execution. At the same time, FnF transcodes the video into one or more formats depending on the DST_FORMATS parameter in the template.yml file. In this example, we want to transcode the video into the MP4 and FLV formats simultaneously.

A video file can be transcoded into various formats and receive custom processing, such as adding watermarks or updating information to a database in the after-process stage.
When multiple files are uploaded to OSS at the same time, Function Compute automatically scales in or out to process them in parallel. Transcoding files into multiple formats are also carried out in parallel.
With the help of Apsara File Storage NAS and video slicing, you can transcode ultra-large videos. For each video, first, slice the video, and carry out transcoding and slicing in parallel, and finally composite the slices into the final video. You can greatly accelerate the transcoding of large videos by setting a proper slicing time.
FnF tracks the execution status of each step and customizes the retry logic of each step to improve the robustness of the task system. For example, retry-example.

For more information, see fc-fnf-video-processing.

In the specific case study of task dividing-and-conquering and parallel acceleration, the previous section explained how to break down a CPU-intensive task. However, you can also break down I/O-intensive tasks. For example, a large 20 GB object is saved in an OSS bucket of the China (Shanghai) region and needs to be transmitted to an OSS bucket of the China (Hangzhou) region within seconds. In this case, the divide-and-conquer idea also works. After the Master function receives the transfer task, it allocates the sharding range of the ultra-large file to each Worker sub-function. The Worker sub-functions then store the shards they are responsible for in parallel. After all Worker sub-functions finish execution, the Master function sends a request to merge the shards to complete the entire transmission task.

Summary

This article explained how to scale your apps in or out on the serverless service platform to process tasks in parallel with practical cases. Whether you are handling a CPU-intensive scenario or an I/O-intensive one, the combination of Function Compute and Serverless Workflow can alleviate the following concerns.

No costs for idle resources
No worries about insufficient computing resources reserved
Quick completion of tasks with a high computing budget
Improved task process tracking
Comprehensive monitoring and alerting, zero O&M, business data visualization, and more

The serverless audio and video processing case described in this article is only an example. It showcases the capabilities and unique advantages of Function Compute in conjunction with Serverless Workflow in offline processing scenarios. We can employ a divergent approach to expand the boundaries of Serverless Workflow’s large-scale data processing practices, such as artificial intelligence (AI), gene computing, and scientific simulation. We hope you find this article interesting, and it inspires you to embark on a journey with Serverless Workflow.

Serverless Practices for Large-Scale Data Processing

Preface

Supreme Elasticity to Cope with Computing Load Fluctuations

1. About Function Compute

2. Elastic and Highly Available Audio and Video Processing System

Divide-and-Conquer Tasks for Parallel Acceleration

1. Overview

2. Quick Transcoding of Large Videos Into Multiple Formats

Summary

Original Source:

Serverless Practices for Large-Scale Data Processing

Alibaba Cloud Serverless November 25, 2020 621 By Xiliu, Alibaba Cloud Technical Expert You may have noticed a new…

Written by Alibaba Cloud

No responses yet