By Lyu Chengfei, nicknamed Lyu Xing, representing the Taobao Technology Department, part of Alibaba New Retail.
In this article, we are going to look at Taobao’s Edge AI Application. The first part of this article will look at some current trends in edge AI and its application on Taobao Mobile. The second part of this article looks at some of the problems and challenges that the Taobao team faced when adapting edge AI onto their systems and the solution they proposed, which was to build a complete end-to-end technological system. And last, the third part of this article considers the latest updates to Taobao’s Mobile Neural Network (MNN), an open-source inference engine.
Edge AI: Current trends and applications
Edge AI brings the inference operations and upper-layer applications of machine learning models, including deep learning models and other neural networks, to edge clients, or, to put it more simply, end devices. Edge AI has several advantages over cloud-based AI, including lower latency, better privacy protection, and lower consumption of cloud computing resources. It is already widely deployed in applications that we use on a daily basis: AI-powered camera features such as Google’s Night Sight, Apple’s Deep Fusion, and Sony’s Eye AF; on-device security systems such as Apple’s Face ID; and the diverse AR effects found in many apps nowadays.
Where does edge AI come from, and what is the underlying logic behind it? One clear trend is that, in the last two years, deep learning has been moving from the development phase to real industry and customer-facing applications, with massively popular end devices, such as smart phones, tablets, and other mobile devices, becoming the carrier for many deep learning applications. Smart phones are, without a doubt, the most widely used end devices nowadays, so combining deep learning with them makes it easy to apply deep learning on a large scale. Besides this trend, we believe that three other major factors were critical to the birth and evolution of edge AI: advances in computing power, algorithms, and application scenarios.
- Computing power: The computing power of smart phones, and of some other mobile devices including wearables, has been growing rapidly year on year, with clear improvements in CPU and GPU performance. In addition, the neural processing unit (NPU), also referred to as a neural processor or AI accelerator, has become a standard component of many smart phones, improving performance by several orders of magnitude.
- Algorithms: Model compression techniques have matured considerably in recent years, especially quantization, which can reduce model size by three-quarters with little loss of accuracy. Network models for mobile devices continue to emerge as the architecture design of small models develops. Algorithm model design for resource-constrained mobile devices has also matured over time.
- Scenarios: As far as smart phones are concerned, AI has become a major selling point and year-to-year highlight, with AI-powered camera features and on-device security found on an increasing number of devices. In terms of apps, many of those that have gone viral lately, both in China and abroad, use AI, for example, FaceApp and ZAO. Currently, some of these programs rely primarily on cloud AI capabilities. They upload data, such as an image of your face, to their servers to create the final output, which raises privacy concerns. Because it does not need to upload your data, edge AI naturally avoids these kinds of privacy issues. The only obstacle, then, is that the current computing power of edge devices may not be sufficient to support these new AI applications.
However, we can expect more interesting and innovative applications to be powered and driven by AI as computing power and algorithms develop to even greater heights.
Edge AI has been adopted by increasing numbers of apps in the Alibaba ecosystem family. For instance, edge AI has become the core infrastructural capability of Taobao Mobile, which has driven us to several new business innovations and breakthroughs. In fact, today over 20 Alibaba apps use edge AI. On Taobao Mobile, for instance, more than 10 scenarios and 25 models rely on edge AI, which together run around 50 trillion inferences every day. Helping to produce results that are more accurate for important scenarios such as user search recommendations, edge AI will be extensively used in this year’s Double 11 Shopping Festival.
The Construction of Taobao’s Edge AI System
Having implemented edge AI in so many scenarios, what problems and challenges did we encounter along the way?
Before we get into the specific challenges we faced, let’s first discuss the transition in general. The transition to edge AI mainly involves the following steps: data collection, data cleansing or tagging, algorithm model design, model training on the server, model compression and conversion, running inferences on the edge device, and implementing the edge AI application itself. This long process involves both the cloud and client devices, and requires algorithm engineers and mobile developers to work together to implement the related applications.
Speaking in a general sense, I see three major challenges in applying edge AI. The first challenge is that the overall process is quite long, which means that an error at any one node can cause the implementation to fail. In 2017, the deployment and operation of algorithm models on edge devices became a bottleneck for us, which prompted us to develop the Mobile Neural Network (MNN) to solve this problem. The second challenge is that edge AI requires algorithm engineers and mobile engineers to work together closely to bridge the inherent gaps between them. The third challenge is that the edge computing environment is complex, involving fragmented hardware and systems, which bring about a host of compatibility issues. To address all of these challenges, we built a complete, end-to-end deployment system, which is shown below:
This diagram shows the overall technology framework of our system, which mainly consists of three parts:
- Client-side engines and frameworks, which include the machine learning algorithm library, deep learning inference engine MNN, Python Virtual Machine (VM) containers with fast algorithm iteration, upper-layer algorithm collections, and industry solutions.
- Offline tools, which include tools for model conversion, model compression, performance testing, and debugging.
- Our cloud platform, which provides the model conversion and compression services, as well as various systems for model management, deployment, O&M, and monitoring.
Let’s take a closer look at the inference engine, default algorithm sets, and industry solutions.
The top technical challenge for the inference engine MNN is fragmentation: network models, training frameworks, and end devices are all fragmented. The second challenge is resource limitation. Unlike a server, an end device has limited memory and computing power. The third challenge is high performance. Functions such as face detection must run in real time, so achieving high performance in resource-constrained environments is also a major challenge.
The purpose of the inference engine is to find a solution that allows different models to run on different devices with the maximum possible efficiency. There are three different design ideas involved here.
The first is an automatic search solution similar to TVM. It takes into consideration the characteristics of the model (such as the convolution kernel size) and of the hardware of the edge device to be deployed (such as memory capacity and computing power), and finds the most efficient way of running the model. This approach achieves a higher level of performance but also entails high costs, because offline traversal and fine-tuning are required for each type of device to find the most efficient configuration. For instance, it would be costly for Taobao Mobile, which currently runs on countless smart phone models.
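To make the idea of automatic search concrete, here is a toy sketch in Python. It is not how TVM or MNN works internally: it simply times a few interchangeable implementations of a 1-D convolution on the current machine and keeps the fastest one. The function names (conv_direct, conv_numpy, conv_fft, pick_fastest) are hypothetical and chosen only for illustration.

```python
import time
import numpy as np

def conv_direct(x, k):
    # Sliding dot product with the flipped kernel ("valid" convolution).
    flipped = k[::-1]
    n_out = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], flipped) for i in range(n_out)])

def conv_numpy(x, k):
    # NumPy's built-in convolution.
    return np.convolve(x, k, mode="valid")

def conv_fft(x, k):
    # Convolution through the frequency domain, cropped to the "valid" part.
    n = len(x) + len(k) - 1
    full = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)
    return full[len(k) - 1:len(x)]

def pick_fastest(candidates, x, k, repeats=3):
    # Measure every candidate on this machine and return the fastest name.
    timings = {}
    for name, fn in candidates.items():
        start = time.perf_counter()
        for _ in range(repeats):
            fn(x, k)
        timings[name] = (time.perf_counter() - start) / repeats
    return min(timings, key=timings.get), timings

candidates = {"direct": conv_direct, "numpy": conv_numpy, "fft": conv_fft}
rng = np.random.default_rng(0)
x = rng.normal(size=4096)
k = rng.normal(size=9)
best, timings = pick_fastest(candidates, x, k)
print("fastest implementation on this machine:", best)
```

The winner depends on the machine and on the input sizes, which is exactly why this kind of search has to be repeated for every device type and becomes expensive at Taobao Mobile's scale.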
The second approach is manual optimization, which targets specific convolution kernel sizes, such as a 3x3 convolution. With this approach, network models that have been specifically optimized perform better than those that have not. However, because hardware features are not taken into account, the overall performance is lower, and costs are high because a lot of traversal is required.
The third approach is the one adopted by MNN. I call it “semi-automatic search optimization”. Different convolution kernels are aligned through the NC4HW4 memory layout so that matrix computing can be optimized in a unified way. We then choose the optimal operation mode based on hardware features. For now we rely mostly on rules, and we will consider online optimization in the future. The overall performance meets our general requirements, and does so at low cost.
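As an illustration of the NC4HW4 layout mentioned above, the following NumPy sketch packs a regular NCHW tensor into groups of four channels, giving the shape (N, ceil(C/4), H, W, 4), so that four channel values sit next to each other in memory and fit a 4-lane SIMD register. The helper names are hypothetical, and this is only a sketch of the layout itself, not MNN's implementation.

```python
import numpy as np

def nchw_to_nc4hw4(x):
    """Pack an NCHW tensor into NC4HW4 with shape (N, ceil(C/4), H, W, 4)."""
    n, c, h, w = x.shape
    c4 = (c + 3) // 4  # number of 4-channel groups
    # Zero-pad the channel axis up to a multiple of 4.
    padded = np.zeros((n, c4 * 4, h, w), dtype=x.dtype)
    padded[:, :c] = x
    # Split channels into (group, lane) and move the lane axis innermost.
    return padded.reshape(n, c4, 4, h, w).transpose(0, 1, 3, 4, 2).copy()

def nc4hw4_to_nchw(y, channels):
    """Undo the packing and drop the zero-padded channels."""
    n, c4, h, w, _ = y.shape
    return y.transpose(0, 1, 4, 2, 3).reshape(n, c4 * 4, h, w)[:, :channels].copy()

# A tensor with 6 channels needs 2 groups; the last 2 lanes of group 1 are padding.
x = np.arange(1 * 6 * 2 * 2, dtype=np.float32).reshape(1, 6, 2, 2)
packed = nchw_to_nc4hw4(x)
restored = nc4hw4_to_nchw(packed, channels=6)
print(packed.shape)  # (1, 2, 2, 2, 4)
```

Because every tensor ends up with the same 4-wide inner dimension regardless of its channel count, the same vectorized kernels can be reused across different convolution shapes.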
This is the MNN framework. The offline section on the left covers model conversion and compression, which convert models from various training frameworks into MNN models. The section on the right illustrates on-device inference. Currently, MNN supports CPU back ends and GPU back ends based on OpenCL and Vulkan.
Take the preceding figure as an example. Google’s Pixel 2 supports the ARM CPU and Vulkan back ends; Xiaomi 6 supports the ARM CPU, OpenCL, and Vulkan back ends; and Huawei’s Mate 20 supports a CPU back end on the ARMv8.2 (ARM82) architecture, as well as the OpenCL and Vulkan back ends.
In the pre-inference step, we identify the fastest operation mode according to model structure information and device hardware information. For example, MNN selects CPU and Vulkan for Pixel 2, CPU and OpenCL for Xiaomi 6, and ARM82 instructions and OpenCL for Mate 20. This is a coarse-grained way to select the CPU and GPU running modes. A more fine-grained approach, for example when still running on the CPU, is to use the Winograd and Strassen matrix computing algorithms and choose different block sizes based on the model and hardware features. In this way, the fastest operation mode is selected.
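To show why an algorithm such as Winograd can be faster, here is a minimal NumPy sketch of the textbook 1-D Winograd transform F(2,3). It produces two outputs of a 3-tap filter with four multiplications instead of six. The transform matrices are the standard ones from the literature; the block sizes and variants MNN actually selects are not shown here, and Strassen is not covered.

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices.
BT = np.array([[1, 0, -1, 0],
               [0, 1, 1, 0],
               [0, -1, 1, 0],
               [0, 1, 0, -1]], dtype=np.float64)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
AT = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=np.float64)

def winograd_f23(d, g):
    """Two outputs of a 3-tap filter over a 4-element tile, 4 multiplications."""
    d = np.asarray(d, dtype=np.float64)
    g = np.asarray(g, dtype=np.float64)
    # Transform filter and input, multiply element-wise, transform back.
    return AT @ ((G @ g) * (BT @ d))

def direct_f23(d, g):
    """Reference sliding-window computation, 6 multiplications."""
    d = np.asarray(d, dtype=np.float64)
    g = np.asarray(g, dtype=np.float64)
    return np.array([d[0:3] @ g, d[1:4] @ g])

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, -1.0, 2.0])
print(winograd_f23(d, g))  # [4.5 6. ]
print(direct_f23(d, g))    # [4.5 6. ]
```

In a real engine the filter transform is computed once ahead of time, so only the input and output transforms are paid per tile. Whether that pays off depends on the tile size and the hardware, which is why the choice is made in the pre-inference step.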
MNN has long supported model compression based on the iDST quantization algorithm. However, the quantization tool was not well productized. Therefore, over the past few months, we optimized the tool to make it as simple and easy to use as possible. We have now released a quantization tool that requires no training. You can complete quantization with a single command, with an accuracy loss of less than 1%, a file size reduction of 75%, and an inference speed increase of 30% to 70%. At the same time, we continue to develop a quantization solution with training capabilities. Its implementation differs from current industry solutions: to solve the problem of fragmented training frameworks, we are adding training capabilities directly to MNN, so that an MNN model can be used for fine-tuning.
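The 75% size reduction follows directly from storing each weight as an 8-bit integer instead of a 32-bit float. The sketch below uses a generic symmetric per-tensor int8 scheme to show the arithmetic. It is an assumption made for illustration, not the iDST algorithm or the MNN quantization tool, and the helper names are hypothetical.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> int8 plus one scale."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale

# Random weights stand in for a trained layer (illustrative data only).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

size_reduction = 1.0 - q.nbytes / weights.nbytes        # 1 byte vs 4 bytes per weight
max_error = float(np.max(np.abs(weights - restored)))   # bounded by about scale / 2
print(f"size reduction: {size_reduction:.0%}, max abs error: {max_error:.6f}")
```

The per-weight rounding error here says nothing by itself about end-to-end model accuracy. The accuracy and speed figures quoted above are the article's results for the MNN tool, not something this sketch reproduces.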
The inference engine MNN mainly solves the problem of running algorithm models deployed at the edge. However, mobile developers are usually not familiar with algorithm models. Take facial recognition as an example: mobile developers would prefer to simply call an API. Therefore, we developed an easy-to-use algorithm collection for face, human posture, and object recognition.
We later found that even with face attributes available, it is still costly to develop a feature like AR face effects, so we further developed an AR effects solution, which allows users to make an AR sticker effect with ease. To be specific, a WYSIWYG (what-you-see-is-what-you-get) IDE editor is used to edit effects and export a resource file package. Then, a rendering SDK on the end device parses the resource file to restore the effect. However, Taobao Mobile is not, after all, a short-video app, and AR stickers offer limited business value for us. Therefore, the AR solution is more frequently used in the beauty and furniture industries to drive product-related AR applications that improve the shopping experience.
Now, let’s look at a big data solution. In the traditional big data system, the client is responsible for data collection, and the server is responsible for data calculation and mining. Applications such as personalized recommendation are built on top of this.
However, as edge computing power increases, machine learning and deep learning models can be deployed on end devices. We developed a calculation framework for edge data features to support data feature extraction and model calculation. One of its applications is context computing. Relying on multi-dimensional data from end devices, we can accurately describe a user’s status, for example, whether they are walking, riding in a vehicle, or lying in bed.
In addition, based on how long you stay on a page and your browsing path, edge AI can identify your preference for a product and make more accurate recommendations. This technique has been widely applied in Taobao Mobile’s search recommendation function and has achieved satisfactory results.
Let’s take a look at some typical application scenarios in Taobao Mobile. Pailitao is Taobao’s quick image search and recognition platform. Originally, photos taken by users were uploaded to the cloud for recognition, which was time-consuming and put a heavy load on servers. Now, a large part of the task, including object detection and segmentation, is performed on edge devices, and the results are uploaded to the cloud for recognition recall. This new approach makes for a better user experience and also saves costs on the server side.
The figure shows more typical applications. The first one is interactive search recommendation based on users’ real-time intent, the second AR make-up try-out, the third an intelligent publishing function on Xianyu, and the fourth some applications on intelligent hardware.
Open Source Development and Applications of MNN
Now, I would like to talk about MNN. The history of MNN is as follows. MNN was officially launched in October 2017. Having withstood the test of the 2018 Double Eleven, MNN will see larger-scale application in this year’s Double Eleven.
Here are several figures. MNN runs more than 200 million inferences on Taobao Mobile every day, with a crash rate of less than 0.1%. It has been used in more than 20 industry applications, and 260 issues have been fixed. These figures show that MNN is a stable and reliable inference engine that has been tested against massive, multi-scenario computation tasks.
MNN is universal, lightweight, high-performance, and easy to use.
We are leading the industry in terms of performance, device support, training framework support, and OP support.
Ease of use is something we care about. We built a Python tool chain to make it easy for algorithm engineers to test and validate models, because Python is an algorithm engineer-friendly language. Currently, we provide tools for model conversion, model compression, model structure visualization, and OP list querying. The installation command is as follows: pip install MNN. After installation, you can select the applicable tool. The diagram on the right shows an example of a model structure.
The open-source MNN is now one of Alibaba’s recommended projects. We are glad and grateful to see the birth of rich MNN applications, such as vehicle license plate recognition, object detection, and vehicle detection. We also want to thank those who have written documentation and instructions, and we hope more people will join us.
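Once the package is installed, a model can also be run from Python. The sketch below follows the session-style Python API used in MNN's early demos (MNN.Interpreter, createSession, and so on). The model path, the input shape, and the wrapper function name are placeholders, and the API may differ in other MNN versions, so treat it as an assumption to check against the documentation of the version you install.

```python
import numpy as np

def run_mnn_inference(model_path, image):
    """Run one forward pass of an MNN model (hypothetical helper).

    `image` is assumed to be a float32 array in NCHW order, for example
    with shape (1, 3, 224, 224) for a typical classification model.
    """
    import MNN  # imported lazily; requires `pip install MNN`

    interpreter = MNN.Interpreter(model_path)
    session = interpreter.createSession()
    input_tensor = interpreter.getSessionInput(session)

    # Wrap the NumPy data in a host tensor (NCHW) and copy it into the session input.
    host_input = MNN.Tensor(image.shape, MNN.Halide_Type_Float, image,
                            MNN.Tensor_DimensionType_Caffe)
    input_tensor.copyFrom(host_input)

    interpreter.runSession(session)
    output_tensor = interpreter.getSessionOutput(session)
    return np.asarray(output_tensor.getData())
```

For a classification model, np.argmax applied to the returned array would give the predicted class index.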
Open source URL: https://github.com/alibaba/MNN
I would like to share three points about the future of MNN.
- App-based. This is in our genes. MNN was born to serve super apps like Taobao Mobile, which runs on iOS and Android and supports many different makes of smart phones and other mobile devices, including older models running older systems. Therefore, we must tackle the issues of device fragmentation and system compatibility. Few engines like MNN can run on over 200 types of devices and support application at scale.
- High-performance. Taobao Mobile has several real-time business scenarios, such as face recognition and search recommendation, and needs to be compatible with many smart phone and mobile device models, including mid-range and low-end models. Therefore, we need to optimize performance as far as possible to better support our businesses.
- Open-source. MNN runs on hundreds of types of devices and is compatible with a wide range of smart phones, other mobile devices, and systems, supporting business at scale. We have open-sourced MNN so that the rich experience behind it can help the community progress. Last but not least, our vision is to work with the community to build a versatile, high-performance, and easy-to-use edge inference engine that is widely used in the industry.