Use Case Analysis — Image Recognition and Similar Feature Retrieval with PostgreSQL

Application Scenarios

This article walks through a use case covering image recognition, cashless payment, biometric clock-in, and feature recognition. Such scenarios are common in almost all industries, including the Internet, new retail, transportation, smart buildings, education, gaming, medical care, social networks, and public security. Let’s take a look at some examples:

  • Smart hotel applications use image recognition to automatically check in customers and connect them with exclusive services based on their membership level.
  • E-commerce applications use image recognition to search for similar products.
  • Education applications monitor students’ attentiveness in class (for example, dozing off, daydreaming, fidgeting, or raising hands) based on their facial expressions.
  • Transportation applications automatically identify the driver in case of a traffic violation.
  • New retail applications use image recognition to match customers to records in the back-end membership system. This lets new retail businesses set customer arrival reminders, provide tailored shopping guidance, and customize their operations.
  • The public transportation industry uses image recognition for cashless payment.
  • The gaming industry uses image recognition in virtual reality (VR) games.

Challenges and Pain Points in the Scenarios

Business Features

  • Businesses require highly efficient image search with high precision.
  • Businesses require image search combined with filtering on other attributes in the same query.

Business Challenges and Pain Points

  • Common relational databases, such as MySQL, do not support vector retrieval. Such databases need to traverse all records for a search and return all results to the application layer for computing, resulting in poor performance and high network bandwidth consumption.
  • Some relational databases support vector retrieval operators but not vector indexes. Every query therefore still traverses all records, which results in poor performance and an inability to handle highly concurrent queries.
  • When image vector computing is moved to the application layer, all data must be loaded from the database. Loading is slow, and updated images cannot be reloaded in real time. This results in low efficiency.
  • When image vector computing is moved to the application layer, the database can no longer filter by image similarity and other attributes in a single query. This results in low efficiency.

Technical Solution

Solution 1

  • The database only stores the image vectors and does not perform vector computing.
  • Image vector computing is moved up to the application layer.
  • Applications must load all data from the database. Loading is slow, and applications cannot pick up updated images in real time. This results in low efficiency.
  • The database cannot filter records by image-recognition conditions combined with other attribute conditions, so this filtering must be done at the business layer. A large volume of records is transmitted over the network, which results in low efficiency and an inability to handle high-concurrency scenarios.
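The Solution 1 steps above can be sketched as follows. This is a minimal Python sketch of the work the application layer ends up doing once every row has been loaded from the database: it filters by an ordinary attribute and ranks by vector distance with a full scan. The `ImageRow` type, its `store_id` attribute, and the use of Euclidean (L2) distance are assumptions for illustration. The article does not specify a schema or a distance metric.

```python
import heapq
import math
import random
from dataclasses import dataclass


@dataclass
class ImageRow:
    """One row as the application receives it (hypothetical schema)."""
    image_id: int
    store_id: int          # an ordinary attribute used for filtering (assumed)
    features: list[float]  # the image feature vector


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def app_layer_search(rows: list[ImageRow], query: list[float],
                     store_id: int, k: int) -> list[tuple[int, float]]:
    """Solution 1: every row has already crossed the network, so the
    attribute filter and the similarity ranking both run in application
    memory as a full scan. Returns (image_id, distance) pairs, nearest first."""
    scored = (
        (l2_distance(r.features, query), r.image_id)
        for r in rows
        if r.store_id == store_id
    )
    return [(image_id, dist) for dist, image_id in heapq.nsmallest(k, scored)]


if __name__ == "__main__":
    rng = random.Random(42)
    dim = 8  # small assumed dimension, just to keep the demo fast
    rows = [ImageRow(i, rng.randrange(4), [rng.random() for _ in range(dim)])
            for i in range(10_000)]
    query = [rng.random() for _ in range(dim)]
    print(app_layer_search(rows, query, store_id=2, k=3))
```

Every query touches every row, and all rows must first be transferred out of the database. That transfer is the bandwidth and latency cost described in the list above.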

Solution 2

  • The PASE plug-in of ApsaraDB RDS for PostgreSQL is used to create vector indexes for the image feature vectors.
  • The application inputs feature vectors for searching. The database searches for similar images by using vector indexes. The returned results contain vector distances and are sorted by distance.
  • When multiple filtering conditions exist, the database uses multiple indexes to filter records by a combination of conditions.
  • ApsaraDB RDS for PostgreSQL supports filtering by combined indexes. It filters records by image-search conditions and other attribute conditions at the same time. The indexes narrow the result set as much as possible, which greatly improves performance and reduces transmission volume. A single query completes in milliseconds.
  • Read-only ApsaraDB RDS for PostgreSQL instances further improve overall query throughput.

Figure: The IVFFlat algorithm
Figure: The HNSW algorithm
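The IVFFlat algorithm named above partitions vectors into clusters and scans only the clusters nearest to the query. The pure-Python sketch below illustrates that idea and is not PASE's implementation. It makes the following assumptions:

- Euclidean distance.
- Plain k-means for clustering.
- The common IVF parameter names `nlist` (number of clusters) and `nprobe` (clusters scanned per query).

None of these are claimed to match PASE's actual options.

```python
import heapq
import math
import random


def l2(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, nlist, iters, rng):
    """Plain Lloyd's k-means; returns nlist centroids."""
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            nearest = min(range(nlist), key=lambda c: l2(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class IVFFlat:
    """Inverted file with flat (uncompressed) vectors in each list."""

    def __init__(self, vectors, nlist, iters=10, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, nlist, iters, random.Random(seed))
        # Each vector ID goes into the inverted list of its nearest centroid.
        self.lists = [[] for _ in range(nlist)]
        for vid, v in enumerate(vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append(vid)

    def _nearest_centroids(self, q, n):
        return heapq.nsmallest(
            n, range(len(self.centroids)), key=lambda c: l2(q, self.centroids[c])
        )

    def search(self, q, k, nprobe):
        """Scan only the nprobe closest lists; return (distance, id) pairs, nearest first."""
        probed = self._nearest_centroids(q, nprobe)
        candidates = (vid for c in probed for vid in self.lists[c])
        return heapq.nsmallest(k, ((l2(q, self.vectors[vid]), vid) for vid in candidates))


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random() for _ in range(8)] for _ in range(1000)]
    index = IVFFlat(data, nlist=16)
    query = [rng.random() for _ in range(8)]
    print(index.search(query, k=5, nprobe=4))
```

Raising `nprobe` trades speed for recall. When `nprobe` equals `nlist`, the search becomes an exact full scan.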

Benefits of ApsaraDB RDS for PostgreSQL

1) ApsaraDB RDS for PostgreSQL supports index retrieval for high-dimensional vectors (by using the PASE plug-in). This enables highly efficient similarity matching for image vectors. A single request takes only milliseconds to complete.
2) High-dimensional vector retrieval can be used not only for image search but also for any feature that can be digitized, such as user profiling and similar-people selection in marketing systems.
3) ApsaraDB RDS for PostgreSQL supports searches through a combination of indexes. Therefore, filtering by vector conditions and other common query conditions can be done at the same time, which substantially improves performance.
4) ApsaraDB RDS for PostgreSQL meets the high-concurrency requirements of image recognition and image search applications in industries such as the Internet, new retail, transportation, smart buildings, education, gaming, medical care, social networks, and public security. It also meets the high-concurrency requirements of similar-people selection in marketing systems. Compared with the common MySQL-based solution, the ApsaraDB RDS for PostgreSQL solution accelerated by the PASE vector index plug-in is far more efficient. It is a cost-effective and highly efficient solution for image recognition, image search, and similar-people selection.
5) With this solution, the performance is improved by 2,457,900% on average, and the response time is reduced to milliseconds.

Demo Introduction

Prerequisites include the following operations:

Demo for Solution 1

Step 1) Create a test table.
Step 2) Create a function to generate random vectors for simulating image feature values. In real-life scenarios, use actual image feature values.
Step 3) Write one million random vectors into the table.
Step 4) Query all one million records and return them to the client.
Step 5) Run a concurrency test.
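Steps 2 and 4 above can be sketched in Python under stated assumptions:

- `random_feature_vector` is a hypothetical stand-in for the random-vector function, filling vectors with uniform values in [0, 1).
- `raw_payload_bytes` estimates the minimum volume a client receives when all one million vectors are returned. It assumes 4-byte float components and ignores IDs, row headers, text encoding, and protocol overhead.
- The 256-dimension figure is also an assumption, not a value from the demo.

```python
import random
import struct


def random_feature_vector(dim: int, rng: random.Random) -> list[float]:
    """Hypothetical stand-in for Step 2: uniform values in [0, 1) in place of
    real image features."""
    return [rng.random() for _ in range(dim)]


def pack_float4(vec: list[float]) -> bytes:
    """Serialize a vector as consecutive 4-byte IEEE floats, the same
    per-component size as PostgreSQL's float4."""
    return struct.pack(f"<{len(vec)}f", *vec)


def raw_payload_bytes(num_rows: int, dim: int, bytes_per_component: int = 4) -> int:
    """Lower bound for Step 4: vector components only, ignoring IDs,
    row headers, text encoding, and protocol overhead."""
    return num_rows * dim * bytes_per_component


if __name__ == "__main__":
    rng = random.Random(7)
    dim = 256  # assumed feature dimension; use your model's real value
    vec = random_feature_vector(dim, rng)
    print(len(pack_float4(vec)))              # 1024 bytes per vector
    print(raw_payload_bytes(1_000_000, dim))  # 1024000000 bytes, about 1 GB
```

Under these assumptions, even before any distance computation, Step 4 moves on the order of a gigabyte per full query.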

Demo for Solution 2

Step 1) Create the PASE vector index plug-in.
Step 2) Create a test table.
Step 3) Create a function to generate random vectors for simulating image feature values. In real-life scenarios, use actual image feature values.
Step 4) Write one million random vectors to the table.
Step 5) Create a vector index by using the HNSW algorithm. The PASE plug-in currently supports two index types: IVFFlat and HNSW. For usage details, see the PASE plug-in topic in the official ApsaraDB RDS for PostgreSQL documentation. Set the index parameters correctly; in particular, make sure the dimension specified for the index matches the actual dimension of the vectors.
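Step 5 relies on HNSW. The Python sketch below roughly illustrates how an HNSW query proceeds; it does not reproduce PASE's implementation. It works as follows:

- Each vector is assigned a random top layer.
- On each layer, nodes are linked to their nearest neighbors.
- A query descends greedily through the upper layers, then runs a beam search of width `ef` on the bottom layer.

To stay short, the sketch builds each layer by brute-force nearest-neighbor linking (quadratic time) instead of HNSW's incremental insertion and neighbor-selection heuristic. `TinyHNSW`, `m`, and `ef` are illustrative names, not PASE index parameters.

```python
import heapq
import math
import random


def l2(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TinyHNSW:
    """Illustrative layered graph: brute-force construction, HNSW-style search."""

    def __init__(self, vectors, m=8, seed=0):
        rng = random.Random(seed)
        self.vectors = vectors
        ml = 1 / math.log(m)  # level multiplier 1/ln(m) from the HNSW paper; needs m >= 2
        # Each node's top layer follows an exponentially decaying distribution.
        self.levels = [int(-math.log(1.0 - rng.random()) * ml) for _ in vectors]
        self.top = max(self.levels)
        self.entry = self.levels.index(self.top)
        # layers[l] maps each node present on layer l to its neighbor set.
        self.layers = []
        for layer in range(self.top + 1):
            nodes = [i for i, lv in enumerate(self.levels) if lv >= layer]
            adj = {i: set() for i in nodes}
            for i in nodes:
                others = (j for j in nodes if j != i)
                for j in heapq.nsmallest(m, others, key=lambda j: l2(vectors[i], vectors[j])):
                    adj[i].add(j)
                    adj[j].add(i)  # keep links bidirectional
            self.layers.append(adj)

    def _search_layer(self, q, entries, ef, layer):
        """Beam search on one layer; returns up to ef (distance, id) pairs, nearest first."""
        adj = self.layers[layer]
        visited = set(entries)
        candidates = [(l2(q, self.vectors[e]), e) for e in entries]  # min-heap
        heapq.heapify(candidates)
        results = [(-d, e) for d, e in candidates]  # max-heap via negated distance
        heapq.heapify(results)
        while candidates:
            d, c = heapq.heappop(candidates)
            if d > -results[0][0]:
                break  # nearest unexplored node is worse than the worst kept result
            for n in adj[c]:
                if n in visited:
                    continue
                visited.add(n)
                dn = l2(q, self.vectors[n])
                if len(results) < ef or dn < -results[0][0]:
                    heapq.heappush(candidates, (dn, n))
                    heapq.heappush(results, (-dn, n))
                    if len(results) > ef:
                        heapq.heappop(results)
        return sorted((-nd, n) for nd, n in results)

    def search(self, q, k, ef):
        """Greedy descent (beam width 1) through upper layers, then width max(ef, k) on layer 0."""
        entry = [self.entry]
        for layer in range(self.top, 0, -1):
            entry = [self._search_layer(q, entry, 1, layer)[0][1]]
        return self._search_layer(q, entry, max(ef, k), 0)[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.random() for _ in range(8)] for _ in range(300)]
    index = TinyHNSW(data, m=8)
    print(index.search([rng.random() for _ in range(8)], k=3, ef=32))
```

A larger `ef` explores more of the bottom layer, improving recall at the cost of more distance computations per query.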

  • Space consumed by the index:

Solution Comparison

The following table represents the case environment:

