Image Processing Service Tutorial using Unique Alibaba Function Compute Features

Introducing Function Compute

Function Compute is an Alibaba Cloud serverless platform that allows engineers to develop internet-scale services with just a few lines of code. It seamlessly handles resource management, auto scaling, and load balancing, so developers can focus on their business logic without worrying about the underlying infrastructure, which makes it easy to build applications that respond quickly to new information. Internally, we use container technology and proprietary distributed algorithms to schedule our users’ code on resources that scale elastically. Since its inception a little over a year ago, we have developed many cutting-edge technologies aimed at providing our users with high scalability, reliability, and performance.

In this guide, we walk you through a step-by-step tutorial that showcases some of its innovative features. If this is your first time using Function Compute, you can read the quick start guide to familiarize yourself with basic serverless concepts.

Using a Network File System

The first feature we introduce allows developers to write functions that read from and write to a network-attached file system such as Alibaba Cloud NAS.

Motivation

The serverless nature of the platform means that user code can run on a different instance each time it is invoked. As a result, functions cannot rely on their local file system to store intermediate results. Developers have to rely on another cloud service, such as Object Storage Service (OSS), to share processed results between functions or invocations. This is not ideal, because dealing with another distributed service adds development overhead and extra code to handle various edge cases.

To solve this problem, we developed the NAS access feature. Network Attached Storage (NAS) is another Alibaba Cloud service; it offers a highly scalable, reliable, and available distributed file system that supports standard file access protocols. We mount the remote NAS file system on the resource where the user code runs, which effectively gives the function code a “local” file system to use.
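Because the mounted NAS file system behaves like a local directory, sharing data between invocations needs nothing more than ordinary file I/O. The minimal Python sketch below illustrates the idea; it assumes the file system is mounted at a directory such as /mnt/crawler (the mount point used later in this tutorial), and the file name shared_log.txt is hypothetical.

```python
import os
import tempfile
from typing import List


def record_invocation(mount_dir: str, invocation_id: str) -> List[str]:
    """Append this invocation's ID to a shared file and return every ID recorded so far."""
    path = os.path.join(mount_dir, "shared_log.txt")  # hypothetical file name
    # Appending to a file on the NAS mount is ordinary file I/O.
    with open(path, "a", encoding="utf-8") as f:
        f.write(invocation_id + "\n")
    # Later invocations, possibly on other instances, read the same file.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    # A temporary directory stands in for the NAS mount in this local demo.
    with tempfile.TemporaryDirectory() as mount:
        record_invocation(mount, "run-1")
        print(record_invocation(mount, "run-2"))  # ['run-1', 'run-2']
```

Several instances may append to the same file at once, so code that needs strict ordering or exclusive access should add its own coordination on top of this pattern.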

Image Crawling Example

This demo shows you how to create a serverless web crawler that downloads images starting from a seed web page. This is quite a challenging problem to run on a serverless platform, because the time limit on a single function invocation makes it impossible to crawl all the websites in one run. With NAS access, however, it becomes straightforward: the NAS file system can share data between function runs. Below is a step-by-step tutorial. We assume that you understand the concept of a VPC and know how to create a NAS mount point in a VPC; otherwise, read the basic NAS tutorial before proceeding with the steps below.

Create a service with NAS configuration


After the VPC Configs are complete, the NAS Config fields appear.

4. Complete the NAS Config fields.


Create a function that starts every five minutes

Now that we have a service with NAS access, it’s time to write the crawler. Since the crawler function has to run many times before it can finish, we use a time trigger to invoke it every 5 minutes.
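Because every invocation is cut off by the function timeout (300 seconds in this tutorial), a long-running job has to be split into batches that each finish before the limit. The Java crawler here does this by capping the pages per run with numberOfPages; a time-based alternative is sketched below in Python. The callback process_item and the helper name run_batch are hypothetical, and the budget you pass should leave a safety margin below the timeout.

```python
import time
from typing import Callable, List


def run_batch(
    pending: List[str],
    process_item: Callable[[str], None],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Process items until the time budget is spent and return the unprocessed rest.

    budget_seconds should stay comfortably below the function timeout so the
    invocation still has time to save its progress before it is stopped.
    """
    deadline = clock() + budget_seconds
    remaining = list(pending)
    while remaining and clock() < deadline:
        process_item(remaining.pop(0))
    # The next timer-triggered invocation resumes from these items.
    return remaining
```

The injectable clock parameter exists only to make the function easy to test; in a real handler the default time.monotonic is used.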


3. Function Compute provides various function templates to help you quickly build an application. For this demo, select the empty function template and click Next; you can try the other templates when you have time.

4. On the next page, select the time trigger from the drop-down menu. Enter a trigger name, set the invocation interval to 5 minutes, leave the event empty for now, and click Next.


5. Enter the function name and make sure to select java8 as the runtime. Also fill in the function handler, set the memory to 2048 MB and the timeout to 300 seconds, and click Next.


6. Click Next, and make sure the preview looks correct before clicking Create.

Write the crawler in Java

Now you should see the function code page, and it’s time to write the crawler. The handler logic is straightforward.

Here is an excerpt of the Java code. Note that we read and write files on the NAS file system exactly the same way as on the local file system.

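// ImageCrawlerHandler.java (excerpt: fields such as pagesToVisit, pagesVisited, rootDir,
// logger, CRAWL_HISTORY, CRAWL_WORKITEM and JSON_MAPPER are omitted)
import com.aliyun.fc.runtime.Context;
import com.aliyun.fc.runtime.PojoRequestHandler;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashSet;

public class ImageCrawlerHandler implements PojoRequestHandler<TimedCrawlerConfig, CrawlingResult> {
    // Returns the next URL that has not been visited yet, or "" when the queue is empty
    private String nextUrl() {
        String nextUrl;
        do {
            nextUrl = pagesToVisit.isEmpty() ? "" : pagesToVisit.remove(0);
        } while (pagesVisited.contains(nextUrl));
        return nextUrl;
    }

    // Restores the crawl state from NAS, unless this instance has already loaded it
    private void initializePages(String rootDir) throws IOException {
        // Compare against the argument so a null this.rootDir on a fresh instance is safe
        if (rootDir.equalsIgnoreCase(this.rootDir)) {
            return;
        }
        try {
            try (BufferedReader history = new BufferedReader(new FileReader(rootDir + CRAWL_HISTORY))) {
                history.lines().forEach(l -> pagesVisited.add(l));
            }
            try (BufferedReader workItems = new BufferedReader(new FileReader(rootDir + CRAWL_WORKITEM))) {
                workItems.lines().forEach(l -> pagesToVisit.add(l));
            }
        } catch (FileNotFoundException e) {
            // First run: no crawl state has been written to NAS yet
            logger.info(e.toString());
        }
        this.rootDir = rootDir;
    }

    // Appends the visited page and the newly discovered links to the NAS files
    private void saveHistory(String rootDir, String justVisitedPage, HashSet<String> newPages)
            throws IOException {
        // append crawl history to the end of the file
        try (PrintWriter pvfw = new PrintWriter(
                new BufferedWriter(new FileWriter(rootDir + CRAWL_HISTORY, true)))) {
            pvfw.println(justVisitedPage);
        }
        // append to-be-crawled work items to the end of the file
        try (PrintWriter ptfw = new PrintWriter(
                new BufferedWriter(new FileWriter(rootDir + CRAWL_WORKITEM, true)))) {
            newPages.forEach(p -> ptfw.println(p));
        }
    }

    @Override
    public CrawlingResult handleRequest(TimedCrawlerConfig timedCrawlerConfig, Context context) {
        CrawlingResult crawlingResult = new CrawlingResult();
        this.logger = context.getLogger();
        CrawlerConfig crawlerConfig = null;
        try {
            // The trigger payload is a JSON string that maps onto CrawlerConfig
            crawlerConfig = JSON_MAPPER.readerFor(CrawlerConfig.class)
                .readValue(timedCrawlerConfig.payload);
        } catch (IOException e) {
            ....
        }
        ImageCrawler crawler = new ImageCrawler(
            crawlerConfig.rootDir, crawlerConfig.cutoffSize, crawlerConfig.debug, logger);
        int pagesCrawled = 0;
        try {
            initializePages(crawlerConfig.rootDir);
            if (pagesToVisit.isEmpty()) {
                pagesToVisit.add(crawlerConfig.url);
            }
            while (pagesCrawled < crawlerConfig.numberOfPages) {
                String currentUrl = nextUrl();
                if (currentUrl.isEmpty()) {
                    break;
                }
                HashSet<String> newPages = crawler.crawl(currentUrl);
                // Queue only the links that have not been visited yet
                newPages.forEach(p -> {
                    if (!pagesVisited.contains(p)) {
                        pagesToVisit.add(p);
                    }
                });
                pagesCrawled++;
                pagesVisited.add(currentUrl);
                // Persist progress so the next invocation can resume from NAS
                saveHistory(crawlerConfig.rootDir, currentUrl, newPages);
            }
            // calculate the total size of the images
            .....
        } catch (Exception e) {
            crawlingResult.errorStack = e.toString();
        }
        crawlingResult.totalCrawlCount = pagesVisited.size();
        return crawlingResult;
    }
}

// ImageCrawler.java (excerpt)
import java.io.IOException;
import java.util.HashSet;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ImageCrawler {
    ...
    // Downloads every <img> on the page and returns the absolute links found on it
    public HashSet<String> crawl(String url) {
        links.clear();
        try {
            Connection connection = Jsoup.connect(url).userAgent(USER_AGENT);
            Document htmlDocument = connection.get();
            Elements media = htmlDocument.select("[src]");
            for (Element src : media) {
                if (src.tagName().equals("img")) {
                    downloadImage(src.attr("abs:src"));
                }
            }
            Elements linksOnPage = htmlDocument.select("a[href]");
            for (Element link : linksOnPage) {
                String href = link.absUrl("href");
                // absUrl returns "" for links it cannot resolve; an empty entry would end the crawl early
                if (!href.isEmpty()) {
                    logDebug("Plan to crawl `" + href + "`");
                    this.links.add(href);
                }
            }
        } catch (IOException ioe) {
            ...
        }
        return links;
    }
}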
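The resume-from-NAS logic in the handler does not depend on Java or Jsoup, so it can be summarized in a few lines. The Python sketch below mirrors initializePages, nextUrl and saveHistory under stated assumptions: the file names visited.txt and todo.txt are hypothetical stand-ins for CRAWL_HISTORY and CRAWL_WORKITEM, and fetch_links is an assumed callback that returns the links found on a page (the role crawler.crawl plays in the Java code).

```python
import os
from typing import Callable, Iterable, List, Set

HISTORY = "visited.txt"  # hypothetical stand-in for CRAWL_HISTORY
WORKITEMS = "todo.txt"   # hypothetical stand-in for CRAWL_WORKITEM


def _read_lines(path: str) -> List[str]:
    # A missing file simply means this is the first run.
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def _append_lines(path: str, lines: Iterable[str]) -> None:
    with open(path, "a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def crawl_step(root: str, seed: str, max_pages: int,
               fetch_links: Callable[[str], Set[str]]) -> List[str]:
    """Crawl up to max_pages pages, resuming from the state files under root."""
    visited = set(_read_lines(os.path.join(root, HISTORY)))
    to_visit = _read_lines(os.path.join(root, WORKITEMS)) or [seed]
    crawled: List[str] = []
    while len(crawled) < max_pages:
        # Skip URLs visited in this or any earlier invocation.
        while to_visit and to_visit[0] in visited:
            to_visit.pop(0)
        if not to_visit:
            break
        url = to_visit.pop(0)
        new_links = sorted(fetch_links(url))
        to_visit.extend(link for link in new_links if link not in visited)
        visited.add(url)
        crawled.append(url)
        # Persist after every page so a timeout loses at most the page in progress.
        _append_lines(os.path.join(root, HISTORY), [url])
        _append_lines(os.path.join(root, WORKITEMS), new_links)
    return crawled
```

Calling crawl_step repeatedly against the same root directory plays the role of successive timer invocations: each call reloads both files, skips everything already visited, and continues where the previous call stopped.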

For the sake of simplicity, we have omitted some details and helper classes. You can get the complete code from the awesome-fc GitHub repository if you would like to run it and download images from your favorite websites.

Run the crawler

Now that we have written the code, we need to run it. Here are the steps.

1. Build the project with Maven:

mvn clean package

2. Select the Triggers tab on the function page. Click the time trigger link and enter the event in JSON format. The JSON event is deserialized into the crawler config and passed to the function. Click OK.


3. The time trigger invokes the crawler function every five minutes. Each time, the handler picks up the list of URLs that still need to be visited and starts from the first one.

4. Select the Log tab to search the crawler execution logs.
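On each run, the time trigger passes an event whose payload field carries the JSON entered in step 2; the Java handler reads it through timedCrawlerConfig.payload and maps it onto CrawlerConfig. The Python sketch below shows the same two-stage decoding. The config keys mirror the CrawlerConfig fields used in the handler (url, rootDir, numberOfPages, cutoffSize, debug), but the defaults and every event field other than payload are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass
class CrawlerConfig:
    # camelCase names deliberately mirror the Java CrawlerConfig fields.
    url: str
    rootDir: str
    numberOfPages: int
    cutoffSize: int = 0   # assumed default
    debug: bool = False   # assumed default


def parse_timer_event(raw_event: bytes) -> CrawlerConfig:
    """Decode a timer event whose 'payload' field is itself a JSON string."""
    event = json.loads(raw_event)
    # Stage 1 decoded the trigger event; stage 2 decodes the user-supplied payload.
    payload = json.loads(event["payload"])
    return CrawlerConfig(**payload)
```

Because the payload arrives as a string inside the event, it has to be decoded twice; unknown keys in the payload make the CrawlerConfig constructor raise a TypeError, which surfaces configuration typos early.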

Create a Serverless Service

The second feature that we introduce allows anyone to send an HTTP request to trigger a function execution directly.

Motivation

Now that we have a file system filled with images downloaded from the web, we want to serve those images through a web service. The traditional way is to mount the NAS file system on a VM and run a web server on it. This wastes resources when the service is lightly used and does not scale when traffic is heavy. Instead, you can write a serverless function that reads the images stored on the NAS file system and serves them through an HTTP endpoint. This way, you enjoy the instant scalability that Function Compute provides while paying only for actual usage.

Image Processing Service Example

This demo shows how to write an Image Processing Service.

Create a Function with HTTP Trigger


5. Finish the remaining steps and click OK.

6. Get the files from the same GitHub repo and upload the directory to the function.


Image Processing Using Python

Function Compute’s Python runtime comes with many built-in modules. In this example, we use both OpenCV and Wand to perform image transformations.

Use the HTTP trigger in Python

Even with an image processing function, we still need a website to serve the requests. Normally, you would use another service, such as API Gateway, to handle HTTP requests. In this demo, we instead use the Function Compute HTTP trigger, which lets an HTTP request trigger a function execution directly. With the HTTP trigger, the headers, path, and query string of the HTTP request are passed to the function handler directly, and the function can return HTML content dynamically.
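The handler excerpt in this section follows a WSGI-style contract: it receives an environ dictionary (with PATH_INFO and QUERY_STRING) and a start_response callable, and returns the response body. The standard-library-only sketch below isolates that request/response cycle from the image logic; the default action of "show" and the HTML body are illustrative assumptions, and the real handler additionally reads environ['fc.context'].

```python
import html
from urllib.parse import parse_qs


def handler(environ, start_response):
    """Minimal WSGI-style handler showing how the request reaches the function."""
    path = environ.get("PATH_INFO", "/")
    # parse_qs maps each key to a list of values, e.g. {'action': ['rotate']}
    query = parse_qs(environ.get("QUERY_STRING", ""))
    action = query.get("action", ["show"])[0]  # "show" default is an assumption
    # Escape request data before placing it in HTML.
    body = "<html><body><p>{} requested for {}</p></body></html>".format(
        html.escape(action), html.escape(path))
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    # WSGI response bodies are iterables of bytes.
    return [body.encode("utf-8")]
```

Because the handler is a plain function, it can be exercised locally with a hand-built environ dictionary and a stub start_response before it is deployed behind the HTTP trigger.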

With these two features, the handler code is surprisingly straightforward.

Here is an excerpt of the handler logic. Note that Wand loads the image stored on NAS just like a normal file on the local file system.

import base64
import logging
from urllib.parse import parse_qs

import cv2
from wand.image import Image

TEMPLATE = open('/code/index.html').read()
NASROOT = '/mnt/crawler'
face_cascade = cv2.CascadeClassifier('/usr/share/opencv/lbpcascades/lbpcascade_frontalface.xml')


def handler(environ, start_response):
    logger = logging.getLogger()
    context = environ['fc.context']
    # The HTTP trigger passes the request path and query string in environ
    path = environ.get('PATH_INFO', "/")
    fileName = NASROOT + path
    query_string = environ.get('QUERY_STRING', '')
    logger.info(query_string)
    # parse_qs returns lists of values, e.g. {'action': ['rotate'], 'angle': ['90']}
    query_dict = {k: v[0] for k, v in parse_qs(query_string).items()}
    action = query_dict.get('action', 'show')

    if action == "show":
        # Wand reads the image from NAS like any local file
        with Image(filename=fileName) as fc_img:
            img_enc = base64.b64encode(fc_img.make_blob(format='png'))
    elif action == "facedetect":
        img = cv2.imread(fileName)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.03, 5)
        # Draw a red box (BGR order) around each detected face
        for (x, y, w, h) in faces:
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 1)
        cv2.imwrite("/tmp/dst.png", img)
        with Image(filename="/tmp/dst.png") as fc_img:
            img_enc = base64.b64encode(fc_img.make_blob(format='png'))
    elif action == "rotate":
        angle = query_dict.get('angle')
        assert angle is not None, "rotate requires an angle parameter"
        logger.info("Rotate " + angle)
        with Image(filename=fileName) as fc_img:
            fc_img.rotate(float(angle))
            img_enc = base64.b64encode(fc_img.make_blob(format='png'))
    else:
        # demo, mixed operation
        ...

    status = '200 OK'
    response_headers = [('Content-type', 'text/html')]
    start_response(status, response_headers)
    # b64encode returns bytes: decode for the template, then encode the page for WSGI
    return [TEMPLATE.replace('{fc-py}', img_enc.decode('ascii')).encode('utf-8')]
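One detail the excerpt glosses over: fileName = NASROOT + path trusts the request path, so a crafted request such as /../etc/passwd could point outside /mnt/crawler. A minimal hardening sketch in Python follows; the helper name resolve_nas_path is hypothetical, and it resolves the path and checks that the result stays under the mount point.

```python
import os


def resolve_nas_path(nas_root: str, request_path: str) -> str:
    """Map a request path onto the NAS mount, refusing paths that escape it."""
    root = os.path.realpath(nas_root)
    # Strip the leading "/" so os.path.join keeps the root prefix.
    candidate = os.path.realpath(os.path.join(root, request_path.lstrip("/")))
    # commonpath equals root only when candidate lies inside the mount point.
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError("path escapes the NAS root: " + request_path)
    return candidate
```

In the handler, fileName = resolve_nas_path(NASROOT, path) would replace the string concatenation, with the ValueError turned into a 400 or 404 response.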

What’s Next?

Now that the function and the HTTP trigger are ready, we can try image rotation or a more advanced transformation such as face detection, for example by adding ?action=rotate&angle=90 or ?action=facedetect to the request URL.

Conclusions

Reference: https://www.alibabacloud.com/blog/image-processing-service-tutorial-using-unique-alibaba-function-compute-features_594003?spm=a2c41.12092722.0.0
