Use Case Analysis — Image Recognition and Similar Feature Retrieval with PostgreSQL

Application Scenarios

This article demonstrates image recognition and similar-feature retrieval, which underpin applications such as cashless payment, biometric clock-in, and feature recognition. Such applications are common in almost every industry, including the Internet, new retail, transportation, smart buildings, education, gaming, medical care, social networks, and public security. Let's look at an example:

  • Smart building applications use image recognition to identify whether a person works in the building, automatically record the employee's check-in, and automatically send the elevator to the floor where that person works.

Challenges and Pain Points in the Scenarios

Business Features

  • Businesses require image search that is both fast and highly precise.

Business Challenges and Pain Points

  • Common relational databases, such as MySQL, do not support vector retrieval. A search must read every record and return all of them to the application layer, where the similarity is computed. This results in poor performance and high network bandwidth consumption.

Technical Solution

Solution 1

  • The database only stores the image vectors and does not perform vector computing.

This solution has the following disadvantages:

  • Common databases do not support vector indexes, so vectors cannot be filtered inside the database. Every record must be sent to the application for comparison.

Solution 2

  • The ApsaraDB RDS for PostgreSQL database stores the image feature vectors and performs the vector similarity search itself.

This solution has the following advantages:

  • ApsaraDB RDS for PostgreSQL supports vector indexes. Image search is highly efficient because the vectors are filtered inside the database and only the matching results are returned to the application.

Note: This solution accelerates vector search in the database. It does not cover the extraction of image feature values (that is, converting images into high-dimensional vectors), which can be done at the application layer.

Currently, the PASE plug-in of ApsaraDB RDS for PostgreSQL supports two popular vector index algorithms: IVFFlat and HNSW. In the future, it will continue to integrate cutting-edge vector index algorithms from the industry.

Figure: The IVFFlat algorithm
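
IVFFlat, the first of the two algorithms named above, clusters the vectors with k-means and stores each vector uncompressed ("flat") in the inverted list of its nearest centroid; a query then scans only the few lists whose centroids are closest to it. The following is a minimal pure-Python sketch of that idea for illustration only. It is not PASE's implementation, and the class name `ToyIVFFlat` and the parameters `nlist` and `nprobe` are assumptions of this sketch that follow common IVF terminology.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyIVFFlat:
    """Hypothetical toy IVFFlat index: k-means centroids plus flat inverted lists."""

    def __init__(self, nlist, iters=5, seed=0):
        self.nlist = nlist            # number of clusters (inverted lists)
        self.iters = iters            # k-means (Lloyd) iterations used for training
        self.rng = random.Random(seed)
        self.centroids = []
        self.lists = []

    def _nearest(self, v):
        return min(range(self.nlist), key=lambda i: l2(v, self.centroids[i]))

    def train(self, training_vectors):
        # Seed centroids with randomly chosen vectors, then refine them with k-means.
        self.centroids = [list(v) for v in self.rng.sample(training_vectors, self.nlist)]
        for _ in range(self.iters):
            buckets = [[] for _ in range(self.nlist)]
            for v in training_vectors:
                buckets[self._nearest(v)].append(v)
            for i, members in enumerate(buckets):
                if members:  # an empty cluster keeps its previous centroid
                    self.centroids[i] = [sum(col) / len(members) for col in zip(*members)]
        self.lists = [[] for _ in range(self.nlist)]

    def add(self, row_id, v):
        # A new vector goes into its nearest centroid's list; no retraining is needed.
        self.lists[self._nearest(v)].append((row_id, v))

    def search(self, q, k=5, nprobe=2):
        # Scan only the nprobe lists whose centroids are closest to the query.
        probe = sorted(range(self.nlist), key=lambda i: l2(q, self.centroids[i]))[:nprobe]
        candidates = [item for i in probe for item in self.lists[i]]
        candidates.sort(key=lambda item: l2(q, item[1]))
        return [row_id for row_id, _ in candidates[:k]]


rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(1000)]
index = ToyIVFFlat(nlist=10)
index.train(vectors)
for row_id, v in enumerate(vectors):
    index.add(row_id, v)
result = index.search(vectors[7], k=5, nprobe=3)
print(result)  # row 7 itself comes first (distance 0)
```

Raising `nprobe` scans more lists, trading speed for recall; the `add` method also shows why new vectors can be indexed without rebuilding the cluster structure.
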
Figure: The HNSW algorithm
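
HNSW builds a multi-layer proximity graph and answers a query by walking greedily from an entry point toward ever-closer neighbors. The sketch below keeps only the core of that idea: a single-layer navigable graph plus a best-first search bounded by a candidate-list size `ef`. It omits HNSW's layer hierarchy and neighbor-pruning heuristics, so treat it as an illustration under those simplifying assumptions, not as PASE's implementation; the names `ToyGraphIndex`, `m`, and `ef` are assumptions of this sketch.

```python
import heapq
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyGraphIndex:
    """Hypothetical single-layer proximity graph (HNSW without the upper layers)."""

    def __init__(self, m=8, ef=32):
        self.m = m          # links created for each newly inserted vector
        self.ef = ef        # size of the candidate list kept during search
        self.vectors = []
        self.links = []     # links[i] = set of neighbor ids of node i

    def _search(self, q, ef):
        entry = 0  # fixed entry point; real HNSW enters from the top layer
        d = l2(q, self.vectors[entry])
        visited = {entry}
        frontier = [(d, entry)]    # min-heap: closest unexpanded node first
        results = [(-d, entry)]    # max-heap via negation: best ef nodes so far
        while frontier:
            d, node = heapq.heappop(frontier)
            if len(results) >= ef and d > -results[0][0]:
                break  # the closest unexpanded node cannot improve the results
            for nb in self.links[node]:
                if nb in visited:
                    continue
                visited.add(nb)
                dn = l2(q, self.vectors[nb])
                if len(results) < ef or dn < -results[0][0]:
                    heapq.heappush(frontier, (dn, nb))
                    heapq.heappush(results, (-dn, nb))
                    if len(results) > ef:
                        heapq.heappop(results)  # drop the current farthest node
        return [n for _, n in sorted((-neg, n) for neg, n in results)]

    def add(self, v):
        node = len(self.vectors)
        nearest = self._search(v, max(self.ef, self.m))[: self.m] if node else []
        self.vectors.append(v)
        self.links.append(set(nearest))
        for nb in nearest:
            self.links[nb].add(node)  # bidirectional links keep the graph navigable
        return node

    def search(self, q, k=5):
        return self._search(q, max(self.ef, k))[:k]


rng = random.Random(7)
graph = ToyGraphIndex()
for _ in range(500):
    graph.add([rng.random() for _ in range(8)])
result = graph.search(graph.vectors[0])
print(result)  # node 0 (the query vector itself) comes first
```

A larger `ef` explores more of the graph per query, which raises recall at the cost of latency; that is the same trade-off the real HNSW search parameter controls.
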

For more information about the PASE plug-in, refer to the official documentation for ApsaraDB RDS for PostgreSQL.

Benefits of ApsaraDB RDS for PostgreSQL

1) ApsaraDB RDS for PostgreSQL supports indexed retrieval of high-dimensional vectors through the PASE plug-in. This enables highly efficient similarity searches over image vectors; a single request takes only milliseconds.
2) High-dimensional vector retrieval is useful not only for image search but for any search over features that can be represented as vectors, such as user profiling and selecting similar people in marketing systems.
3) ApsaraDB RDS for PostgreSQL can combine multiple indexes in a single search. A vector similarity condition can therefore be applied together with ordinary query conditions, which substantially improves performance.
4) ApsaraDB RDS for PostgreSQL meets the high-concurrency requirements of image recognition and image search applications in industries such as the Internet, new retail, transportation, smart buildings, education, gaming, medical care, social networks, and public security, as well as those of similar-people selection in marketing systems. Compared with the general MySQL solution, the ApsaraDB RDS for PostgreSQL solution accelerated by the PASE vector index plug-in is far more efficient, making it a cost-effective choice for image recognition, image search, and similar-people selection.
5) With this solution, performance improved by 2,457,900% on average, and the response time dropped to milliseconds.
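
Point 3 above describes applying a vector similarity condition and an ordinary query condition in the same search. In the database this is handled with a combination of indexes; the plain-Python sketch below only illustrates the semantics of such a query (keep the rows that satisfy a scalar condition, then rank them by vector distance) and does not model how PostgreSQL or PASE executes it. The function `filtered_top_k` and the row fields `building` and `vec` are hypothetical.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_top_k(rows, query, predicate, k=5):
    """Return the k rows closest to `query` among the rows that satisfy `predicate`."""
    matching = (row for row in rows if predicate(row))  # ordinary query condition
    # Vector condition: rank the surviving rows by distance and keep the k nearest.
    return heapq.nsmallest(k, matching, key=lambda row: l2(query, row["vec"]))


rows = [
    {"id": 1, "building": "A", "vec": [0.0, 0.0]},
    {"id": 2, "building": "B", "vec": [0.1, 0.0]},
    {"id": 3, "building": "A", "vec": [1.0, 1.0]},
    {"id": 4, "building": "A", "vec": [0.2, 0.1]},
]
hits = filtered_top_k(rows, [0.0, 0.0], lambda r: r["building"] == "A", k=2)
print([r["id"] for r in hits])  # [1, 4]
```

Row 2 is close to the query but is excluded by the scalar condition, which is exactly the kind of result a vector-only search would get wrong.
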

The preceding comparison data comes from actual tests on one million images in an RDS instance with 4 CPU cores and 8 GB of memory.
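
To put the 2,457,900% figure in perspective: under the usual convention that an improvement of p% means the new performance is (1 + p/100) times the old, it corresponds to a speedup factor of about 24,580.

```latex
\text{speedup} = 1 + \frac{p}{100} = 1 + \frac{2\,457\,900}{100} = 1 + 24\,579 = 24\,580
```

If the percentage was instead computed as a plain ratio of new to old performance, the factor would be 24,579. Either way, the PASE solution is more than four orders of magnitude faster in this test.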

Currently, this function is supported only by ApsaraDB RDS for PostgreSQL V11.

In the future, support will be extended to ApsaraDB RDS for PostgreSQL V10 and later versions.

For more information about this function, see this guide.

Demo Introduction

Complete the following prerequisites:

1) Purchase an ApsaraDB RDS for PostgreSQL V11 instance.
2) Set up a whitelist.
3) Create a user.
4) Create a database.

Demo for Solution 1

Step 1) Create a test table.
Step 2) Create a function to generate random vectors for simulating image feature values. In real-life scenarios, use actual image feature values.
Step 3) Write one million random vectors into the table.
Step 4) Query all one million records and return them to the client.
Step 5) Run a concurrency test.
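
The Solution 1 workflow above (generate random stand-in feature vectors, ship every row to the client, and rank them there) can be sketched in plain Python as follows. The helpers `random_vector`, `client_side_top_k`, and `payload_bytes` are hypothetical, and the 256-dimension, 4-bytes-per-value figures in the last line are illustrative assumptions, not values from the original test.

```python
import heapq
import math
import random


def random_vector(dim, rng):
    """Stand-in for a real image feature vector (assumption: uniform [0, 1) values)."""
    return [rng.random() for _ in range(dim)]


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def client_side_top_k(all_rows, query, k=5):
    """Solution 1: every (id, vector) row reaches the client, which scores them all."""
    return heapq.nsmallest(k, all_rows, key=lambda row: l2(query, row[1]))


def payload_bytes(n_rows, dim, bytes_per_value=4):
    """Raw vector bytes shipped per query, ignoring protocol and encoding overhead."""
    return n_rows * dim * bytes_per_value


rng = random.Random(1)
rows = [(row_id, random_vector(16, rng)) for row_id in range(10_000)]
top5 = client_side_top_k(rows, rows[3][1])
print(top5[0][0])                      # 3: the query row itself
print(payload_bytes(1_000_000, 256))   # 1024000000 bytes (about 1 GB) per query
```

Even before any distance is computed, a full scan of one million 256-dimension vectors would move roughly a gigabyte per query under these assumptions, which is why the pain points above list network bandwidth alongside CPU cost.
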

Demo for Solution 2

Step 1) Create the PASE vector index plug-in.
Step 2) Create a test table.
Step 3) Create a function to generate random vectors for simulating image feature values. In real-life scenarios, use actual image feature values.
Step 4) Write one million random vectors to the table.
Step 5) Create a vector index by using the HNSW algorithm. (The PASE plug-in currently supports two index types: IVFFlat and HNSW. For usage details, see the PASE plug-in topic in the official documentation for ApsaraDB RDS for PostgreSQL.) Set the index parameters correctly; in particular, make sure the dimension specified for the index matches the actual dimension of the stored vectors.

  • Time spent creating the index:

Once the index exists, it is maintained automatically when image feature values are updated or new ones are added; you do not need to create it again.

Step 6) Generate a random query vector, find the five stored vectors most similar to it, and return them ordered by vector distance.
Step 7) Run a concurrency test.
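
A concurrency test like the one in Step 7 can be approximated with a small client-side harness. The sketch below drives a query function from several threads and reports throughput (queries per second) and mean latency. The function `run_concurrency_test` and the brute-force `query_fn` are assumptions for illustration; in a real test, `query_fn` would send the top-5 similarity query from Step 6 to the database through a PostgreSQL driver.

```python
import heapq
import math
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrency_test(query_fn, queries, workers=8):
    """Run query_fn once per query on `workers` threads; return (QPS, mean latency in ms)."""

    def timed(q):
        start = time.perf_counter()
        query_fn(q)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed, queries))
    wall = time.perf_counter() - wall_start
    return len(queries) / wall, statistics.mean(latencies) * 1000


# Hypothetical stand-in workload: brute-force top-5 over 2,000 random 16-dim vectors.
rng = random.Random(0)
data = [[rng.random() for _ in range(16)] for _ in range(2_000)]


def query_fn(q):
    return heapq.nsmallest(5, data, key=lambda v: math.dist(q, v))


queries = [[rng.random() for _ in range(16)] for _ in range(40)]
qps, mean_ms = run_concurrency_test(query_fn, queries, workers=4)
print(f"{qps:.1f} queries/s, mean latency {mean_ms:.2f} ms")
```

Threads suit this job because a database client spends most of its time waiting on the network; with the CPU-bound stand-in above, Python's global interpreter lock keeps the threads from running truly in parallel, so its numbers say nothing about database performance.
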

Sample simulation query:

The test result:

Solution Comparison

The following table describes the test environment:

The following table shows the performance comparison:

Original Source:
