Building a Serverless PDF Text Recognition Using Function Compute with Node.js in 10 Minutes

By Johnson Chiang, Solutions Architect

Alibaba Cloud Function Compute (FC) is a serverless FaaS with an event-driven programming model. This tutorial demonstrates how to develop a PDF-to-text conversion function with Function Compute, and shows how FC's simple yet powerful paradigm suits this kind of helper service.

What You Will Learn

This tutorial is organized into the following sections. Each section represents a specific task when developing a Function Compute service:

Prerequisites

Preparing OSS:

Preparing FC:

Write Function Code

FC currently supports several runtimes, including Java, Python, PHP, and Node.js. We will write the function in Node.js and use the pdfreader module from npm to read text from PDF files.

    // Required modules
    var OSS = require('ali-oss').Wrapper;           // FC built-in module
    var PdfReader = require("pdfreader").PdfReader; // packaged 3rd-party PDF parser module

    console.log('Loading function');

    module.exports.handler = function (eventBuf, ctx, callback) {
        console.log('Received event:', eventBuf.toString());
        let eventObj = JSON.parse(eventBuf);
        let ossEvent = eventObj.events[0];
        let ossRegion = "oss-" + ossEvent.region;

        // Init oss client instance where credentials can be retrieved from context.
        let ossClient = new OSS({
            region: ossRegion,
            accessKeyId: ctx.credentials.accessKeyId,
            accessKeySecret: ctx.credentials.accessKeySecret,
            stsToken: ctx.credentials.securityToken
        });
        ossClient.useBucket(ossEvent.oss.bucket.name); // Bucket name is from OSS event

        // Source PDF from "in/<filename>.pdf", processed to "out/<filename>.txt"
        let newKey = ossEvent.oss.object.key.replace("in/", "out/").replace(".pdf", ".txt");

        // Parse PDF to text
        console.log("Getting object: " + ossEvent.oss.object.key);
        ossClient.get(ossEvent.oss.object.key).then(function (val) {
            let pdfBuf = val.content;
            let convertedTxt = "";
            console.log("Start parsing PDF buffer.");
            new PdfReader().parseBuffer(pdfBuf, function (err, item) {
                if (err) {
                    console.error("Failed to read PDF binary");
                    callback(err);
                    return;
                }
                if (!item) {
                    // No more items: parsing is complete
                    console.log("Done parsing text.");
                    const outBuf = Buffer.from(convertedTxt, "utf8");
                    // Upload converted text as buffer to "out" directory
                    ossClient.put(newKey, outBuf).then(function (val) {
                        console.log("Put object: ", val);
                        callback(null, val);
                        return;
                    }).catch(function (err) {
                        console.error("Failed to put object: %j", err);
                        callback(err);
                        return;
                    });
                    return;
                }
                if (item.text) {
                    console.log("Continue parsed text: " + item.text);
                    convertedTxt += item.text;
                }
            });
        }).catch(function (err) {
            console.error("Failed to get object: %j", err);
            callback(err);
            return;
        });
    };
    $ ls -l; du -hs .
    total 8
    -rw-r--r--@ 1 owner staff 2600 Jan 21 20:00 index.js
    drwxr-xr-x 5 owner staff 170 Jan 21 20:00 node_modules
    180M .

    $ du -h -d3 | sort -nr | head -n8
    660K ./node_modules/pdf2json/node_modules
    180M ./node_modules
    180M .
    178M ./node_modules/pdf2json
    176M ./node_modules/pdf2json/test
    108K ./node_modules/pdf2json/lib
    88K ./node_modules/pdf2json/.idea
    28K ./node_modules/pdfreader/lib

    # Note: as run here, without -r, zip stores only the node_modules/ directory
    # entry (hence the 1.3K archive); add -r to include the directory contents.
    $ zip pdf-to-text.zip index.js node_modules/
    adding: index.js (deflated 63%)
    adding: node_modules/ (stored 0%)

    $ ls -lh pdf-to-text.zip
    -rw-r--r-- 1 owner staff 1.3K Jan 21 20:10 pdf-to-text.z
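The handler appends each `item.text` fragment directly to `convertedTxt`, so words and lines from the PDF can run together in the output. The following is a minimal sketch of one way to rebuild line breaks, not part of the original function. It assumes that text items carry `x`, `y`, and `text` fields and that page markers carry a `page` field; `itemsToText` is a hypothetical helper name, and the sample items are made up for illustration.

```javascript
// Sketch: rebuild readable lines from pdfreader-style items.
// Assumption: text items look like { x, y, text }; page markers like { page: n }.
function itemsToText(items) {
  const pages = [];     // one Map (y -> items on that row) per page
  let rows = new Map();
  pages.push(rows);

  for (const item of items) {
    if (item.page) {
      // Start a new page unless the current one is still empty
      if (rows.size > 0) {
        rows = new Map();
        pages.push(rows);
      }
      continue;
    }
    if (!item.text) continue; // skip items that carry no text
    if (!rows.has(item.y)) rows.set(item.y, []);
    rows.get(item.y).push(item);
  }

  return pages
    .filter((page) => page.size > 0)
    .map((page) =>
      [...page.keys()]
        .sort((a, b) => a - b)               // rows from top to bottom
        .map((y) =>
          page.get(y)
            .sort((a, b) => a.x - b.x)       // fragments from left to right
            .map((i) => i.text)
            .join(" "))
        .join("\n"))
    .join("\n\n");                           // blank line between pages
}

// Made-up items for illustration only
const demoItems = [
  { page: 1 },
  { x: 1, y: 2, text: "Hello" },
  { x: 5, y: 2, text: "world" },
  { x: 1, y: 1, text: "Title" },
  { page: 2 },
  { x: 0, y: 0, text: "Next" }
];
console.log(itemsToText(demoItems));
```

To use something like this in the handler, you would collect items into an array in the `item.text` branch and call the helper once parsing finishes, instead of concatenating strings as you go.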

You can download the working ZIP deployment package to proceed to the next step.

Configure Service and Function

We will primarily be using the Alibaba Cloud Console to complete this task. In our case, all Alibaba Cloud resources are in the same region, ap-southeast-1.

By completing the above configuration, you have created the PDF-to-Text function with an OSS event trigger.

Invoke Function

Next, to test the conversion function, upload the sample PDF file to <YOUR_BUCKET>/in on OSS. The upload invokes the FC function.
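It can help to see which parts of the OSS event the handler actually reads when the upload fires the trigger. The sketch below builds a minimal event buffer containing only those fields and parses it the same way the handler does. A real OSS event carries more fields than this. `buildOssEvent` and `readOssEvent` are hypothetical helper names, `my-bucket` is a placeholder, and the file name `pdf-sample.pdf` is an assumption inferred from the output name used in this tutorial.

```javascript
// Hypothetical helper: build an event buffer with only the fields the handler reads.
function buildOssEvent(region, bucket, key) {
  const event = {
    events: [
      {
        region: region,
        oss: {
          bucket: { name: bucket },
          object: { key: key }
        }
      }
    ]
  };
  return Buffer.from(JSON.stringify(event), "utf8");
}

// Mirror the handler's parsing: first event, region prefixed with "oss-".
function readOssEvent(eventBuf) {
  const ossEvent = JSON.parse(eventBuf.toString()).events[0];
  return {
    ossRegion: "oss-" + ossEvent.region,
    bucket: ossEvent.oss.bucket.name,
    key: ossEvent.oss.object.key
  };
}

// Placeholder bucket name and assumed sample file name
const sampleEvent = readOssEvent(
  buildOssEvent("ap-southeast-1", "my-bucket", "in/pdf-sample.pdf")
);
console.log(sampleEvent);
```

A buffer like this could also be passed to the handler in a local test, provided the OSS client calls are stubbed out.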

Then check <YOUR_BUCKET>/out. You should see that pdf-sample.txt has been created, and you can open it to view the text recognized from the PDF file. That's it.
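The output location follows from the key mapping in the handler, which replaces the first `in/` with `out/` and the first `.pdf` with `.txt`. The sketch below reproduces that mapping and, for comparison, a stricter variant anchored to the start and end of the key. `toOutputKey` and `toOutputKeyStrict` are hypothetical names, and the stricter variant is an illustration, not part of the original function.

```javascript
// Same mapping as in the handler: first "in/" and first ".pdf" are replaced.
function toOutputKey(key) {
  return key.replace("in/", "out/").replace(".pdf", ".txt");
}

// Hypothetical stricter variant: only accepts keys of the form "in/<name>.pdf"
// and returns null for anything else.
function toOutputKeyStrict(key) {
  const match = /^in\/(.+)\.pdf$/i.exec(key);
  return match ? "out/" + match[1] + ".txt" : null;
}

console.log(toOutputKey("in/pdf-sample.pdf"));
console.log(toOutputKey("in/report.pdf.backup.pdf"));
console.log(toOutputKeyStrict("in/report.pdf.backup.pdf"));
console.log(toOutputKeyStrict("other/file.pdf"));
```

For ordinary names the two agree. They differ only on edge cases such as a name containing `.pdf` more than once, where the simple mapping rewrites the first occurrence rather than the extension.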

Troubleshooting

When you implement your own FC function, you will typically go through a testing and debugging cycle. Listed here are two common errors you may encounter, with the corresponding troubleshooting tips:

What’s Next?

In this tutorial, you have built a quick and powerful file conversion service using FC with an OSS trigger. Here are some suggestions for where to find more information next:

Reference: https://www.alibabacloud.com/blog/building-a-serverless-pdf-text-recognition-using-function-compute-with-node-js-in-10-minutes_594429?spm=a2c41.12548475.0.0
