Image Processing Service Tutorial Using Unique Alibaba Cloud Function Compute Features

Introducing Function Compute

Function Compute is an Alibaba Cloud serverless platform that allows engineers to develop an internet-scale service with just a few lines of code. It seamlessly handles resource management, auto scaling, and load balancing, so developers can focus on their business logic without worrying about managing the underlying infrastructure, making it easy to build applications that respond quickly to new information. Internally, we use container technology and proprietary distributed scheduling algorithms to run our users' code on elastically scaled resources. Since its inception a little over a year ago, we have developed many cutting-edge technologies aimed at providing our users with high scalability, reliability, and performance.

In this guide, we present a step-by-step tutorial that showcases some of these innovative features. You can also read it to familiarize yourself with basic serverless concepts if this is your first time using Function Compute.

Using Network File System

The first feature that we introduce allows developers to write functions that read from and write to a network-attached file system like Alibaba Cloud NAS.


The serverless nature of the platform means that user code can run on different instances each time it is invoked. This further implies that a function cannot rely on its local file system to store any intermediate results. Developers have to rely on another cloud service, such as object storage, to share processed results between functions or invocations. This is not ideal, as dealing with another distributed service adds extra development overhead and complexity in the code to handle various edge cases.

To solve this problem, we developed the Network Attached Storage (NAS) access feature. NAS is another Alibaba Cloud service that offers a highly scalable, reliable, and available distributed file system that supports standard file access protocols. We mount the remote NAS file system onto the resource on which the user code is running, which effectively creates a “local” file system for the function code to use.
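For example, once the NAS file system is mounted, a function can use ordinary file APIs on the local mount path as if it were a local directory. The following is a minimal illustrative sketch (not part of the demo code); '/mnt/crawler' stands in for whatever local mount path you configure for the service.

# Minimal sketch: the NAS mount behaves like a local directory, so state
# written by one invocation can be read back by later invocations on any instance.
import os

NAS_DIR = '/mnt/crawler'  # the local mount path configured for the service

def handler(event, context):
    state_file = os.path.join(NAS_DIR, 'state.txt')
    with open(state_file, 'a') as f:
        f.write('invoked\n')
    with open(state_file) as f:
        return 'total invocations recorded: %d' % len(f.readlines())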

Image Crawling Example

This demo section shows you how to create a serverless web crawler that downloads all the images starting from a seed webpage. This is quite a challenging problem to run on a serverless platform, as it is not possible to crawl all the websites in one function invocation given the time constraints. However, with the NAS access feature, it becomes straightforward, as one can use the NAS file system to share data between function runs. Below is a step-by-step tutorial. We assume that you understand the concept of VPC and know how to create a NAS file system in a VPC; otherwise, you can read the relevant NAS and VPC documentation before proceeding to the steps below.

Create a service with NAS configuration

  1. Log on to the Function Compute console.
  2. Select the target region in which your NAS is located.
  3. Create a service that uses a pre-created NAS file system. In this demo:
  1. Enter the Service Name and Description.
  2. Enable Advanced Settings.
  3. Complete the VPC Configs fields, making sure that you select the VPC in which the NAS mount point is located.

After the VPC Configs are complete, the NAS Config fields appear.

4. Complete the NAS Config fields as described below.

  1. The UserId and GroupId fields are the uid/gid under which the function runs. They determine the owner of all the files created on the NAS file system. You can pick any user/group ID for this demo, as they are shared among all functions in this service.
  2. The NAS Mount Point drop-down menu lists all the valid NAS mount points that are accessible from the chosen VPC.
  3. The Remote Path is a directory on the NAS file system; it does not need to be the root directory of the NAS file system. Choose the directory in which you want to store the images.
  4. The Local Mount Path is the local directory through which the function accesses the remote directory. Remember what you choose here; the function code refers to files under this path (see the sketch after this list).
  5. Complete the configuration with your desired logstore destination.
  6. Make sure that you configure your service role to grant Function Compute access to your VPC and logstore.
  7. Click OK.
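If you want to double-check the NAS configuration, a small throwaway Python function along the following lines can list the mounted directory and report the owner of a newly created file. This is only an illustrative sketch, not part of the demo code; '/mnt/crawler' stands in for whatever Local Mount Path you chose.

import os

MOUNT_DIR = '/mnt/crawler'  # replace with your Local Mount Path

def handler(event, context):
    # Create a file on the NAS mount and report its owner; the uid/gid
    # should match the UserId/GroupId configured above.
    probe = os.path.join(MOUNT_DIR, 'probe.txt')
    with open(probe, 'w') as f:
        f.write('hello from Function Compute\n')
    st = os.stat(probe)
    return 'files=%s uid=%d gid=%d' % (os.listdir(MOUNT_DIR), st.st_uid, st.st_gid)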

Create a function that starts every five minutes

Now that we have a service with NAS access, it's time to write the crawler. Since the crawler function has to run many times before it can finish, we use a time trigger to invoke it every 5 minutes.

  1. Log on to the Function Compute console and select the service you just created.
  2. Create a function for the service by clicking the plus sign.

3. Function Compute provides various function templates to help you quickly build an application. Select the empty function template for this demo and click Next; you can play with the other templates when you have time.

4. Select the time trigger in the drop-down menu on the next page. Fill in the trigger name, set the invocation interval to 5 minutes, leave the event empty for now, and click Next.

5. Fill in the function name and make sure to select java8 as the runtime. Also fill in the function handler, set the memory to 2048 MB and the timeout to 300 seconds, and click Next.

6. Click Next and make sure the preview looks good before clicking Create.

Write the crawler in Java

Now you should see the function code page, and it's time to write the crawler. The handler logic is straightforward, as shown below.

  1. Parse the time trigger event to get the crawler config.
  2. Create the image crawler based on the config. The crawler uses Jsoup to parse HTML pages to identify images and links.
  3. Read the already-visited and not-yet-visited web page lists from the NAS file system (only if the function is running in a new environment).
  4. Continue the depth-first traversal of the web pages and use the crawler to download any new pictures along the way.
  5. Save the newly found web pages to the NAS file system.

Here is an excerpt of the Java code; you can see that we read and write files on the NAS file system exactly the same way as on the local file system.

// Imports and some helper classes are omitted for brevity; see the repo for the full source.
public class ImageCrawlerHandler implements PojoRequestHandler<TimedCrawlerConfig, CrawlingResult> {

    // Return the next unvisited URL from the work list, or "" when the list is empty.
    private String nextUrl() {
        String nextUrl;
        do {
            nextUrl = pagesToVisit.isEmpty() ? "" : pagesToVisit.remove(0);
        } while (pagesVisited.contains(nextUrl));
        return nextUrl;
    }

    // Load the visited and to-be-visited page lists from the NAS file system,
    // but only when the function is running in a new environment
    // (the cached rootDir does not match the configured one).
    private void initializePages(String rootDir) throws IOException {
        if (!this.rootDir.equalsIgnoreCase(rootDir)) {
            try {
                new BufferedReader(new FileReader(rootDir + CRAWL_HISTORY)).lines()
                        .forEach(l -> pagesVisited.add(l));
                new BufferedReader(new FileReader(rootDir + CRAWL_WORKITEM)).lines()
                        .forEach(l -> pagesToVisit.add(l));
            } catch (FileNotFoundException e) {
                // First run: the history files do not exist on the NAS file system yet.
            }
            this.rootDir = rootDir;
        }
    }

    private void saveHistory(String rootDir, String justVisitedPage, HashSet<String> newPages)
            throws IOException {
        // Append crawl history to the end of the file.
        try (PrintWriter pvfw = new PrintWriter(
                new BufferedWriter(new FileWriter(rootDir + CRAWL_HISTORY, true)))) {
            pvfw.println(justVisitedPage);
        }
        // Append the to-be-crawled work items to the end of the file.
        try (PrintWriter ptfw = new PrintWriter(
                new BufferedWriter(new FileWriter(rootDir + CRAWL_WORKITEM, true)))) {
            newPages.forEach(p -> ptfw.println(p));
        }
    }

    public CrawlingResult handleRequest(TimedCrawlerConfig timedCrawlerConfig, Context context) {
        CrawlingResult crawlingResult = new CrawlingResult();
        this.logger = context.getLogger();
        CrawlerConfig crawlerConfig = null;
        try {
            // The time trigger event carries the crawler config as a JSON payload
            // (the exact field name is defined in the repo).
            crawlerConfig = JSON_MAPPER.readerFor(CrawlerConfig.class)
                    .readValue(timedCrawlerConfig.payload);
        } catch (IOException e) {
            // handle a malformed config (omitted)
        }
        ImageCrawler crawler = new ImageCrawler(
                crawlerConfig.rootDir, crawlerConfig.cutoffSize, crawlerConfig.debug, logger);
        int pagesCrawled = 0;
        try {
            initializePages(crawlerConfig.rootDir);
            if (pagesToVisit.isEmpty()) {
                // seed the work list from the config (omitted)
            }
            while (pagesCrawled < crawlerConfig.numberOfPages) {
                String currentUrl = nextUrl();
                if (currentUrl.isEmpty()) {
                    break;
                }
                HashSet<String> newPages = crawler.crawl(currentUrl);
                newPages.forEach(p -> {
                    if (!pagesVisited.contains(p)) {
                        pagesToVisit.add(p);
                    }
                });
                pagesVisited.add(currentUrl);
                saveHistory(crawlerConfig.rootDir, currentUrl, newPages);
                pagesCrawled++;
            }
            // calculate the total size of the images (omitted)
        } catch (Exception e) {
            crawlingResult.errorStack = e.toString();
        }
        crawlingResult.totalCrawlCount = pagesVisited.size();
        return crawlingResult;
    }
}

public class ImageCrawler {
    public HashSet<String> crawl(String url) {
        HashSet<String> links = new HashSet<String>();
        try {
            Connection connection = Jsoup.connect(url).userAgent(USER_AGENT);
            Document htmlDocument = connection.get();
            // Download every image element found on the page.
            Elements media ="[src]");
            for (Element src : media) {
                if (src.tagName().equals("img")) {
                    // download the image to the NAS directory (helper omitted)
                }
            }
            // Remember the outgoing links so they can be crawled later.
            Elements linksOnPage ="a[href]");
            for (Element link : linksOnPage) {
                logDebug("Plan to crawl `" + link.absUrl("href") + "`");
                links.add(link.absUrl("href"));
            }
        } catch (IOException ioe) {
            // network or parse errors are logged and skipped (omitted)
        }
        return links;
    }
}

For the sake of simplicity, we have omitted some details and other helper classes. You can get all the code from the repo if you would like to run it and collect images from your favorite websites.

Run the crawler

Now that we have written the code, we need to run it. Here are the steps.

  1. We use Maven for dependency and build management. After you sync with the repo (assuming you have Maven installed already), just type the following command to create the jar file ready to upload.
mvn clean package
  2. Select the Code tab on the function page. Upload the jar file created in the previous step (the one whose name ends with "dependencies") through the console.

3. Select the Triggers tab on the function page. Click the time trigger link to enter the event in JSON format. The JSON event will be deserialized into the crawler config and passed to the function (a sample event is shown after these steps). Click OK.

4. The time trigger invokes the crawler function every five minutes. Each time, the handler picks up the list of URLs that still need to be visited and starts from the first one.

5. You can select the Log tab to search the crawler execution logs.
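As mentioned above when configuring the time trigger, the event carries the crawler configuration. An event along the following lines would work for this demo; the field names mirror the CrawlerConfig fields used in the handler excerpt above, the values are only examples, and the complete schema is defined in the repo.

{
  "rootDir": "/mnt/crawler",
  "numberOfPages": 20,
  "cutoffSize": 10240,
  "debug": true
}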

Create a Serverless Service

The second feature that we introduce allows anyone to send an HTTP request to trigger a function execution directly.


Now that we have a file system filled with images downloaded from the web, we want a way to serve those images through a web service. The traditional way is to mount the NAS file system on a VM and start a web server on it. This wastes resources when the service is lightly used and does not scale when the traffic is heavy. Instead, you can write a serverless function that reads the images stored on the NAS file system and serves them through an HTTP endpoint. In this way, you enjoy the instant scalability that Function Compute provides while still paying only for actual usage.

Image Processing Service Example

This demo shows how to write an Image Processing Service.

Create a Function with HTTP Trigger

  1. Log on to the Function Compute console and select the same service as the crawler function.
  2. Create a function for the service by clicking the plus sign.
  3. Select the empty function template to create a python2.7 function and click Next.
  4. Select the HTTP trigger in the drop-down menu, make sure that it supports both the GET and POST invocation methods, and click Next.

5. Finish the rest of the steps and click OK.

6. Get the Python code from the same repo and upload the directory to the function.

Image Processing Using Python

Function Compute’s runtime comes with many built-in modules that one can use. In this example, we use both Wand and OpenCV to do the image transformations.

Use the HTTP trigger in Python

Even with an image processing function, we still need to set up a web endpoint to serve requests. Normally, one needs to use another service, such as an API gateway, to handle HTTP requests. In this demo, we are going to use the HTTP trigger feature of Function Compute to allow an HTTP request to trigger a function execution directly. With the HTTP trigger, the headers, path, and query of the HTTP request are all passed to the function handler directly, and the function can return HTML content dynamically.

With these two features, the handler code is surprisingly straightforward. Here is a high-level breakdown.

  1. Get the HTTP path and query string from the environ argument passed to the handler.
  2. Use the HTTP path to load the image on the NAS file system.
  3. Apply different image processing techniques based on the query action.
  4. Insert the transformed image into the pre-built HTML template and return it.

Here is an excerpt of the handler logic; we can see that Wand loads the image stored on NAS just like a normal file on the local file system.

# Some imports and error handling are trimmed; see the repo for the full source.
import base64
import logging
import urlparse

import cv2
from wand.image import Image

TEMPLATE = open('/code/index.html').read()
NASROOT = '/mnt/crawler'
face_cascade = cv2.CascadeClassifier('/usr/share/opencv/lbpcascades/lbpcascade_frontalface.xml')

def handler(environ, start_response):
    logger = logging.getLogger()
    context = environ['fc.context']
    # The HTTP path selects which image on the NAS file system to load.
    path = environ.get('PATH_INFO', "/")
    fileName = NASROOT + path
    try:
        query_string = environ['QUERY_STRING']
    except KeyError:
        query_string = " "
    # Turn the query string into a dict, e.g. {'action': 'rotate', 'angle': '90'}
    # (this parsing step is reconstructed; the repo may do it slightly differently).
    query_dist = dict(urlparse.parse_qsl(query_string))
    action = query_dist['action']

    if (action == "show"):
        with Image(filename=fileName) as fc_img:
            img_enc = base64.b64encode(fc_img.make_blob(format='png'))
    elif (action == "facedetect"):
        img = cv2.imread(fileName)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.03, 5)
        for (x, y, w, h) in faces:
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 1)
        cv2.imwrite("/tmp/dst.png", img)
        with open("/tmp/dst.png") as img_obj:
            with Image(file=img_obj) as fc_img:
                img_enc = base64.b64encode(fc_img.make_blob(format='png'))
    elif (action == "rotate"):
        assert len(query_dist) >= 2  # rotate needs both 'action' and 'angle'
        angle = query_dist['angle']"Rotate " + angle)
        with Image(filename=fileName) as fc_img:
            fc_img.rotate(float(angle))
            img_enc = base64.b64encode(fc_img.make_blob(format='png'))
    # more actions (e.g. mixed operations) are omitted from this excerpt

    status = '200 OK'
    response_headers = [('Content-type', 'text/html')]
    start_response(status, response_headers)
    return [TEMPLATE.replace('{fc-py}', img_enc)]

What’s Next?

Now that we have the function and the HTTP trigger ready, we can try a simple action like show or an advanced transformation like face detection.

  1. The URL is constructed based on your account ID, region, service, and function names.
  2. Use the path of any image relative to the local NAS mount directory (see the example request after this list). You can find all the files on your NAS file system through the crawler logs.
  3. You can edit the Python code directly in the Function Compute console, add many more image transformations, and play with it.
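As referenced above, here is a minimal sketch of how you might call the endpoint once you have constructed the URL. Everything in angle brackets is a placeholder for your own values; the image path and query parameters follow the handler excerpt above.

import requests

# Placeholder endpoint; build it from your account ID, region, service and
# function names as described in item 1 above.
url = 'https://<your-http-trigger-endpoint>'
# Rotate an image stored under the local NAS mount path by 90 degrees.
resp = requests.get(url + '/<path-to-image>.png',
                    params={'action': 'rotate', 'angle': '90'})
print(resp.status_code)
with open('result.html', 'w') as f:
    f.write(resp.text)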


  1. You can read the Function Compute documentation to get a more general idea of what it can do.
  2. You can also read the official NAS documentation and other Function Compute documentation to learn about more exciting new features.
  3. Please send us feedback or suggestions through the official Function Compute channel or the official Alibaba Cloud channel.



