PostgreSQL Similarity Search Distributed Architecture: DBLink Asynchronous Call and Multi-machine Parallelism

Background

When the data size for a similarity search exceeds the processing capability of a single machine, you need to horizontally split the data to improve the search capability. PostgreSQL supports such similarity search scenarios.

Alternatively, you can use Alibaba Cloud’s POLARDB for PostgreSQL. Similar to Oracle RAC, POLARDB for PostgreSQL allows you to add computing nodes. This solution is superior to horizontal database sharing because it allows to share and store data without splitting.

In the case of a horizontal database sharing scenario, the critical question is how to perform a parallel query when a database is divided into multiple sub-databases.

You can perform a parallel query by using DBLink asynchronous calls. The architecture is designed as follows.

Image for post
Image for post

In fact, many examples of running a parallel query by using DBLink asynchronous calls are available.

The following section demonstrates how to implement parallel similarity search in multiple databases using asynchronous calls.

Demo

In this example, four local databases are used as remote databases. These databases can be installed on remote computers. The approach in this example is just for test convenience.

Following shows the name of the local database.

Name of the remote databases is as follows.

Step 1. First, create a user and test databases.

Step 2. Create a DBLink plug-in a local database.

Step 3. Create a server to connect to the remote databases.

Step 4. Configure the user password used for connecting to the remote databases. The user password can only be used to connect to the remote databases but not the local database.

Remote Database Operations

Create a test table in each remote database, write test data to the table, and create a similarity search function.

Step 1. Create the required plug-in. (Note that the database name in the following script must be replaced with the actual database name.)

Step 2. Create the index on which the primary table and similarity search depends. (Note that the database name in the following script must be replaced with the actual database name.)

Step 3. Now, Create partitions. The following is only for the test.

Step 4. Create a connection function as shown below.

Step 5. Next, create a function that generates a random function.

Step 6. Write the test data and random text. (Note that the database name in the following script must be replaced with the actual database name.)

Step 7. Create a UDF for the similarity search.

Local Database Operations

Create a function that sets up a remote connection.

Return the Cursor

Define UDF1 to return the cursor. If a large number of records are returned, it is recommended to use the cursor because PLPGSQL returns the records only after all records are retrieved. The response time (RT) is long when the records are returned. Refer the following example.

The following example illustrates the usage.

Return Record

Define UDF2 to return the record. It is recommended that you specify the maximum number of returned records because PLPGSQL returns the records only after all records are retrieved. Refer the following example.

The following example illustrates the usage.

Conclusion

Applying the method discussed in this article, you can use multiple PostgreSQL databases as one PostgreSQL database to linearly scale-out the parallel similarity search performance. Also, DBLink asynchronous calls can be used to linearly scale out the search performance of similar texts, without degrading the performance.

Build your own PostgreSQL solution on Alibaba Cloud with ApsaraDB for RDS PostgreSQL.

Original Source:

Written by

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store