Serverless Practices for Large-Scale Data Processing

Preface

You may not notice it the first time you use Alibaba Cloud's serverless products, but they enable a new usage pattern. Compared with the traditional server-based approach, a serverless platform lets your apps scale out rapidly and process work in parallel more efficiently. You do not pay for idle resources, and you do not need to worry that reserved resources will be insufficient. In the traditional paradigm, by contrast, highly parallel, short-lived tasks require you to reserve hundreds or thousands of servers, and you pay for every one of them, even after some are no longer in use. The serverless approach is especially valuable in the following two scenarios:

  • Scenario 1: Each task needs little computing power, but a massive number of concurrent requests must be processed in parallel, as in multimedia file processing and document conversion.
  • Scenario 2: Each task needs a large amount of computing power and must complete quickly, and multiple tasks must be processed in parallel.

Supreme Elasticity to Cope with Computing Load Fluctuations

Before we move on to large-scale data processing examples, let’s take a brief look at Function Compute.

1. About Function Compute

  • Write Code: The developer writes an app or service in one of the programming languages that Function Compute supports.
  • Upload the App: The developer uploads the app to Function Compute.
  • Trigger Function Execution: A function can be triggered by Object Storage Service (OSS), API Gateway, Log Service, or Tablestore, or called directly through the Function Compute APIs and SDKs.
  • Dynamically Scale Out to Respond to Requests: Function Compute automatically scales out based on the number of requests. This process is transparent to you and your users.
  • Pay Based on Actual Execution Time: After the function runs, you can view the cost in your bill. Billing is calculated in 100-millisecond increments.
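To make this flow concrete, here is a minimal sketch of an event function, assuming the Python runtime's `handler(event, context)` entry point, where `event` arrives as raw bytes. The `name` field in the payload is a hypothetical example. The second helper only illustrates the 100 ms billing granularity stated above by rounding an execution time up to the next increment; it is arithmetic, not a pricing calculator.

```python
import json
import math


def handler(event, context):
    """Function Compute entry point: event is the raw request payload (bytes)."""
    # Hypothetical payload shape: {"name": "..."}
    body = json.loads(event)
    name = body.get("name", "world")
    return json.dumps({"message": f"hello, {name}"})


def billed_duration_ms(actual_ms: float, granularity_ms: int = 100) -> int:
    """Round an execution time up to the billing granularity (100 ms by default)."""
    if actual_ms <= 0:
        return 0
    return math.ceil(actual_ms / granularity_ms) * granularity_ms


if __name__ == "__main__":
    # Local smoke test: the context object is unused here, so None is passed.
    print(handler(b'{"name": "fc"}', None))
    print(billed_duration_ms(230))  # 300
```

Because billing rounds up per invocation, a 230 ms run is billed as 300 ms, while a 200 ms run is billed as exactly 200 ms.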

2. Elastic and Highly Available Audio and Video Processing System

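The audio and video processing functions can be invoked in three ways:

  • Through an OSS trigger
  • Through a message trigger
  • By manually calling the SDK to run audio and video processing tasks, as in the following example: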
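```python
# -*- coding: utf-8 -*-
import json

import fc2

# Replace the endpoint and AccessKey placeholders with your own values.
client = fc2.Client(
    endpoint="http://123456.cn-hangzhou.fc.aliyuncs.com",
    accessKeyID="xxxxxxxx",
    accessKeySecret="yyyyyy",
)

# Invoke the "transcode" function of the "FcOssFFmpeg" service.
# x-fc-invocation-type selects the call mode: "Sync" waits for the result, "Async" returns immediately.
payload = json.dumps({
    "bucket_name": "test-bucket",
    "object_key": "video/inputs/a.flv",
    "output_dir": "video/output/a_out.mp4",
})
resp = client.invoke_function(
    "FcOssFFmpeg", "transcode",
    payload=payload,
    headers={"x-fc-invocation-type": "Sync"},
).data
print(resp)
```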
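With an OSS trigger, the same transcoding request can be derived from the upload event instead of being written by hand. The sketch below assumes the usual OSS event layout (`events[].oss.bucket.name` and `events[].oss.object.key`); the function name, the `video/outputs/` prefix, and the `_out.mp4` naming rule are hypothetical conventions chosen to mirror the payload in the SDK example.

```python
import json
import posixpath


def oss_event_to_transcode_requests(event: bytes, output_prefix: str = "video/outputs/"):
    """Build one transcoding payload per object in an OSS trigger event.

    Assumes the events[].oss.bucket.name / events[].oss.object.key layout;
    output_prefix and the "_out.mp4" suffix are hypothetical conventions.
    """
    requests = []
    for record in json.loads(event)["events"]:
        bucket = record["oss"]["bucket"]["name"]
        key = record["oss"]["object"]["key"]
        # "video/inputs/a.flv" -> "a"
        stem, _ = posixpath.splitext(posixpath.basename(key))
        requests.append({
            "bucket_name": bucket,
            "object_key": key,
            "output_dir": posixpath.join(output_prefix, stem + "_out.mp4"),
        })
    return requests


if __name__ == "__main__":
    sample = {"events": [{
        "eventName": "ObjectCreated:PutObject",
        "oss": {"bucket": {"name": "test-bucket"},
                "object": {"key": "video/inputs/a.flv"}},
    }]}
    print(oss_event_to_transcode_requests(json.dumps(sample).encode()))
```

Each resulting dictionary has the same shape as the payload passed to `invoke_function` above, so one uploaded file maps to one transcoding request.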

Divide-and-Conquer Tasks for Parallel Acceleration

Applying the divide-and-conquer idea to tasks in Function Compute opens up interesting possibilities. For example, suppose you need to transcode a 20 GB 1080p HD video. On a single computer, the transcoding may take hours, and if it is interrupted midway, you must restart from the beginning. With divide-and-conquer and Function Compute, transcoding becomes a three-stage pipeline, sharding > parallel transcoding of the shards > shard merging, which resolves both of these pain points.
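The three stages map naturally onto FFmpeg's segment muxer (split) and concat demuxer (merge). The sketch below only builds the command lines and does not run FFmpeg; the shard naming pattern, the codecs, and the idea of running each command in its own function invocation are illustrative assumptions, not the implementation behind the original service.

```python
import math
import shlex


def shard_count(duration_seconds: float, segment_seconds: float) -> int:
    """Approximate number of shards for a given segment length."""
    return math.ceil(duration_seconds / segment_seconds)


def split_cmd(src: str, segment_seconds: int, pattern: str = "part_%03d.mp4") -> list:
    # Stream copy (-c copy) cuts the file without re-encoding, so this step is cheap.
    # With stream copy, cuts can only fall on keyframes, so shard lengths are approximate.
    return ["ffmpeg", "-i", src, "-c", "copy",
            "-f", "segment", "-segment_time", str(segment_seconds),
            "-reset_timestamps", "1", pattern]


def transcode_cmd(part: str, out: str, vcodec: str = "libx264", acodec: str = "aac") -> list:
    # The only CPU-heavy step; one of these would run per shard, in parallel.
    return ["ffmpeg", "-i", part, "-c:v", vcodec, "-c:a", acodec, out]


def concat_list(parts) -> str:
    # Body of the list file read by the concat demuxer: one "file '...'" line per shard.
    return "".join(f"file '{p}'\n" for p in parts)


def merge_cmd(list_file: str, dst: str) -> list:
    # Join the transcoded shards without re-encoding them a second time.
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", dst]


if __name__ == "__main__":
    print(shlex.join(split_cmd("a.flv", 60)))
    print(shlex.join(transcode_cmd("part_000.mp4", "part_000_out.mp4")))
    print(shlex.join(merge_cmd("parts.txt", "a_out.mp4")))
```

The design keeps the expensive work (`transcode_cmd`) independent per shard, while the split and merge steps use stream copy and stay lightweight.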

  • Sharding and shard merging are memory-level copy operations and require little computing power. Transcoding is the only step that consumes significant computing power, so it is split into many subtasks that run in parallel. In this model, transcoding the entire large video takes roughly as long as transcoding its slowest shard.
  • Even if a shard encounters a transcoding exception, you only need to re-transcode this shard. You do not need to rerun the whole task.
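Both properties can be illustrated locally, with a thread pool standing in for concurrent function invocations. In this sketch, `transcode_all` and `transcode_shard` are hypothetical names (in practice each shard might map to one Function Compute call). Only the shards that fail in a round are resubmitted in the next one, and each round lasts as long as its slowest shard.

```python
from concurrent.futures import ThreadPoolExecutor


def transcode_all(shards, transcode_shard, max_workers=8, max_rounds=3):
    """Transcode shards in parallel, resubmitting only the shards that failed.

    `transcode_shard(shard)` is a hypothetical callable that transcodes one shard
    and returns the output name. Results are returned in the original shard order,
    ready for the merge step.
    """
    results = {}
    pending = list(shards)
    for _ in range(max_rounds):
        if not pending:
            break
        # All pending shards run concurrently; a round lasts as long as its slowest shard.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {shard: pool.submit(transcode_shard, shard) for shard in pending}
        # Leaving the with-block waits for every future, so exception() does not block.
        failed = []
        for shard, future in futures.items():
            if future.exception() is None:
                results[shard] = future.result()
            else:
                failed.append(shard)
        pending = failed
    if pending:
        raise RuntimeError(f"shards still failing after {max_rounds} rounds: {pending}")
    return [results[shard] for shard in shards]


if __name__ == "__main__":
    attempts = {}

    def flaky_transcode(shard):
        # Simulated transcoder: part_001 fails once, then succeeds.
        attempts[shard] = attempts.get(shard, 0) + 1
        if shard == "part_001" and attempts[shard] == 1:
            raise IOError("simulated transient failure")
        return shard + "_out.mp4"

    print(transcode_all(["part_000", "part_001", "part_002"], flaky_transcode))
    print(attempts)  # only part_001 was run twice
```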

1. Overview

Serverless Workflow is a fully managed cloud service that coordinates multiple distributed tasks. In Serverless Workflow, you can orchestrate distributed tasks in sequence, in branches, or in parallel. Serverless Workflow coordinates task execution based on the steps you define, tracks the status changes of each task, and runs user-defined retry logic when necessary to ensure that the workflow completes smoothly. By taking over task coordination, status management, and error handling, which are otherwise required to develop and operate business workflows, Serverless Workflow lets you focus on business logic.
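To illustrate what coordinating tasks and tracking their status involves, here is a toy, in-process model: steps run in sequence, a parallel step fans out to several branches, and every status change is recorded. The `run_flow` function, the step names, and the status strings are hypothetical; this is not the Serverless Workflow API or its flow definition language.

```python
from concurrent.futures import ThreadPoolExecutor


def run_flow(steps, data):
    """Run hypothetical (name, kind, fn) steps in order, recording each status change.

    kind "task": fn(data) returns the input for the next step.
    kind "parallel": fn is a list of branch functions; each receives the same input,
    and their results are collected, in branch order, into a list.
    On the first failure the flow stops and returns (None, history).
    """
    history = []
    for name, kind, fn in steps:
        history.append((name, "Running"))
        try:
            if kind == "parallel":
                with ThreadPoolExecutor() as pool:
                    futures = [pool.submit(branch, data) for branch in fn]
                    data = [f.result() for f in futures]
            else:
                data = fn(data)
        except Exception as exc:
            history.append((name, f"Failed: {exc!r}"))
            return None, history
        history.append((name, "Succeeded"))
    return data, history


if __name__ == "__main__":
    steps = [
        ("prepare", "task", lambda key: {"key": key}),
        ("transcode", "parallel", [lambda d: d["key"] + ".mp4",
                                   lambda d: d["key"] + ".webm"]),
        ("collect", "task", sorted),
    ]
    result, history = run_flow(steps, "video/a")
    print(result)  # ['video/a.mp4', 'video/a.webm']
    for entry in history:
        print(entry)
```

A managed service adds what this toy lacks: durable state that survives crashes, per-step retry policies, and visibility into each execution.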

2. Quick Transcoding of Large Videos Into Multiple Formats

  • A video file can be transcoded into multiple formats and can receive custom processing, such as adding watermarks or writing its information to a database in a post-processing stage.
  • When multiple files are uploaded to OSS at the same time, Function Compute automatically scales out to process them in parallel and scales back in afterward. Transcoding into multiple formats is also carried out in parallel.
  • With the help of Apsara File Storage NAS and video slicing, you can transcode ultra-large videos. Each video is first split into slices, the slices are transcoded in parallel, and the transcoded slices are then merged into the final video. Choosing an appropriate slice duration can greatly accelerate the transcoding of large videos.
  • Serverless Workflow (FnF) tracks the execution status of each step and lets you customize the retry logic of each step, improving the robustness of the task system. For an example, see retry-example.
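Per-step retry logic of this kind usually combines an initial interval, a maximum number of attempts, and a backoff multiplier. The sketch below is a local illustration of that policy shape using hypothetical parameter names; it is not Serverless Workflow's configuration syntax, and here `max_attempts` counts the first call as well as the retries.

```python
import time


def retry_delays(interval_seconds, max_attempts, multiplier):
    """Seconds to wait before retry 1, 2, ...: interval, interval * multiplier, ..."""
    return [interval_seconds * multiplier ** i for i in range(max_attempts - 1)]


def call_with_retry(fn, retryable=(Exception,), interval_seconds=1.0,
                    max_attempts=3, multiplier=2.0, sleep=time.sleep):
    """Call fn(); on a retryable error, wait and try again, up to max_attempts calls in total.

    Errors not listed in `retryable` propagate immediately; after the last attempt,
    the final error is re-raised. `sleep` is injectable so the policy can be tested.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = retry_delays(interval_seconds, max_attempts, multiplier)
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise
            sleep(delays[attempt])


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_step():
        # Hypothetical step that times out twice before succeeding.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError("simulated throttling")
        return "done"

    print(call_with_retry(flaky_step, retryable=(TimeoutError,), interval_seconds=0.1))
```

Restricting `retryable` to transient errors (timeouts, throttling) keeps permanent failures, such as a corrupt input file, from wasting retries.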

Summary

Using practical cases, this article showed how apps on a serverless platform scale in and out to process tasks in parallel. Whether your scenario is CPU-intensive or I/O-intensive, combining Function Compute with Serverless Workflow delivers the following benefits:

  • No costs for idle resources
  • No worries about insufficient computing resources reserved
  • Quick completion of tasks with a high computing budget
  • Improved task process tracking
  • Comprehensive monitoring and alerting, zero O&M, business data visualization, and more

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com