Similarity Analysis for PostgreSQL Text Databases

By Digoal.

When it comes to searches and search filters, we often find ourselves needing to search by many different kinds of parameters. To name a few: similar appearance, geographical closeness, or even similar personality traits.

These aren’t just technical questions, of course. They affect us all the time. For instance, let’s say you know exactly what kind of item you want and you have a picture of it, but you find it hard to describe. In that case, it’s a lot easier to search by image.

Luckily for us, PostgreSQL, which is arguably the world’s most advanced open-source database, is powerful enough to serve as the back end for all of these kinds of search and filtering scenarios. The limit isn’t the technology, but rather our imagination!

In this article, we will briefly go over some of the more common scenarios in which you can use PostgreSQL. Specifically, we will cover:

  • sorting by image or facial similarity
  • sorting by overlapping hobbies
  • sorting by age approximation
  • sorting by distance
  • sorting by textual similarity
  • sorting by word-break similarities

Of course, there are many more scenarios in which you can use PostgreSQL to filter and sort search results. The applications are nearly endless! We hope that this blog can inspire you to do more with PostgreSQL.

Sorting by Image or Facial Similarity

Sorting by image similarity and by facial similarity are both very relevant nowadays, especially given the prominence of images and video in our daily lives. Image search is now common on most online search engines. I’m sure nearly everyone has used the reverse image search option on Google before. And on many e-commerce platforms, such as Alibaba’s own Taobao (淘宝), you can now search for items by image, powered by AI.

Well, you can use PostgreSQL to power image similarity searches, too. Consider the example given in this blog.

Sorting by Overlapping Hobbies

You can also collect hobby data from a group of people, then group people by hobby or sort them by hobby overlap to find a target group. As more hobbies are collected, and more information is gathered about them, you can potentially do even more with this data. As this is relatively simple, we won’t be discussing it in more detail here.
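Still, a minimal sketch may help make the idea concrete. It assumes hobbies are stored as a `text[]` column. The `members` table, its columns, and the sample rows are all hypothetical and are not taken from the original article.

```sql
-- Hypothetical table: one row per person, hobbies stored as a text array.
CREATE TABLE members (
    id      serial PRIMARY KEY,
    name    text   NOT NULL,
    hobbies text[] NOT NULL
);

INSERT INTO members (name, hobbies) VALUES
    ('alice', ARRAY['chess', 'hiking', 'cooking']),
    ('bob',   ARRAY['hiking', 'cooking', 'cycling']),
    ('carol', ARRAY['painting', 'chess']),
    ('dave',  ARRAY['gaming']);

-- A GIN index can serve the && (overlaps) filter on large tables.
CREATE INDEX members_hobbies_gin ON members USING gin (hobbies);

-- Rank members by how many hobbies they share with the target list.
-- Assumes each member's array contains no duplicate hobbies.
SELECT name,
       (SELECT count(*)
          FROM unnest(hobbies) AS h
         WHERE h = ANY (ARRAY['hiking', 'cooking', 'chess'])) AS shared
FROM members
WHERE hobbies && ARRAY['hiking', 'cooking', 'chess']
ORDER BY shared DESC, name
LIMIT 10;
```

The `&&` filter drops people with no hobby in common, and the subquery counts the shared hobbies used for ordering.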

Sorting by Age Approximation

Alternatively, you could sort by age, or more precisely by how close each age is to a target value. This is relatively simple, too. For example, you can enter the age 23 and have the system return the results that are closest to this age.

Consider this example in PostgreSQL’s official documentation. The example below is based on it, with age approximation as the main search filter. For this particular example, the system returns the 10 rows that are closest to the age 100:
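A minimal sketch of such a query, assuming the `btree_gist` extension, which provides the `<->` distance operator and GiST index support for integer columns. The `people` table and its randomly generated contents are hypothetical, so this is not the article's original example.

```sql
-- btree_gist supplies the <-> distance operator and GiST support for integers.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical table filled with randomly generated ages between 0 and 120.
CREATE TABLE people (
    id  serial PRIMARY KEY,
    age int4 NOT NULL
);

INSERT INTO people (age)
SELECT (random() * 120)::int4
FROM generate_series(1, 100000);

-- The GiST index lets ORDER BY ... <-> ... LIMIT n run as a nearest-neighbor scan.
CREATE INDEX people_age_gist ON people USING gist (age);

-- The 10 rows whose age is closest to 100, nearest first.
SELECT id, age, age <-> 100 AS dist
FROM people
ORDER BY age <-> 100
LIMIT 10;
```

Running the query under `EXPLAIN` should show whether the planner picks the GiST index for the ordering.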

Sorting by Distance

You can also sort based on distance. You can read more about the functions and operators involved in this tutorial provided in PostgreSQL’s official documentation. Anyway, before we get too distracted, let’s look at an example. In the example below, our search retrieves the 10 points that are geographically nearest to the specific point we entered.
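A minimal sketch of a nearest-point query, assuming the built-in `point` type with a GiST index. The `places` table, its random coordinates, and the query point are hypothetical. Note that `<->` on `point` values measures planar (Euclidean) distance, so true geographic distance on the Earth's surface would typically call for something like PostGIS instead.

```sql
-- Hypothetical table of random points in a 1000 x 1000 plane.
CREATE TABLE places (
    id  serial PRIMARY KEY,
    pos point NOT NULL
);

INSERT INTO places (pos)
SELECT point(random() * 1000, random() * 1000)
FROM generate_series(1, 100000);

-- GiST on a point column supports ordering by the <-> distance operator.
CREATE INDEX places_pos_gist ON places USING gist (pos);

-- The 10 points nearest to (500, 500), closest first.
SELECT id, pos, pos <-> point '(500,500)' AS dist
FROM places
ORDER BY pos <-> point '(500,500)'
LIMIT 10;
```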

Sorting by Textual Similarity

Next, you can sort by similar features in a text, or more specifically by words and the repetition of words. Important to our discussion here is, of course, the code discussed at this page. Let me show you an example. The example shows how ‘Hello Digoal’ and the Mandarin sentence ‘Nihao Digoal’ (你好德哥) are markedly similar, which they ought to be, as they mean the same thing:
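One way to sketch this is with the `pg_trgm` extension. That is an assumption here, since the original example may have used a different tool. `pg_trgm` compares character trigrams, so its score reflects shared text (here, the common word "digoal") rather than shared meaning. The `phrases` table and its rows are hypothetical.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- similarity() returns a score between 0 (nothing shared) and 1 (identical).
SELECT similarity('hello digoal', 'nihao digoal') AS sim;

-- Hypothetical table of phrases to rank against a search string.
CREATE TABLE phrases (
    id   serial PRIMARY KEY,
    body text NOT NULL
);

INSERT INTO phrases (body) VALUES
    ('hello digoal'),
    ('nihao digoal'),
    ('hello world'),
    ('postgresql');

-- gist_trgm_ops lets the index serve ORDER BY body <-> '...' queries.
CREATE INDEX phrases_body_trgm ON phrases USING gist (body gist_trgm_ops);

-- <-> is the trigram distance (1 - similarity), so the most similar rows come first.
SELECT body, similarity(body, 'hello digoal') AS sim
FROM phrases
ORDER BY body <-> 'hello digoal'
LIMIT 10;
```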

Sorting by Word-Break Similarity

Lastly, let’s look at a more complex scenario: sorting by word-break similarity. Unlike sorting by textual similarity, this scenario measures word-break similarity rather than similarity in the words themselves.

The rum plug-in can be used for this kind of sorting scenario. For this, we will be using this resource. Now consider this example:
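A minimal sketch, assuming that "word-break similarity" here means ranking on the tokens produced by full-text parsing, and that the third-party `rum` extension is installed on the server. The `docs` table and its sample rows are hypothetical.

```sql
-- rum is a separate extension and must be installed on the server first.
CREATE EXTENSION IF NOT EXISTS rum;

-- Hypothetical table: raw text plus its parsed tsvector form.
CREATE TABLE docs (
    id   serial PRIMARY KEY,
    body text     NOT NULL,
    tsv  tsvector NOT NULL
);

INSERT INTO docs (body, tsv)
SELECT s, to_tsvector('english', s)
FROM (VALUES
    ('The situation is most beautiful'),
    ('It is a beautiful place'),
    ('It looks like a beautiful place'),
    ('PostgreSQL is an open-source database')
) AS v(s);

-- rum_tsvector_ops lets the index both filter with @@ and order by <=>.
CREATE INDEX docs_tsv_rum ON docs USING rum (tsv rum_tsvector_ops);

-- <=> returns a distance between a tsvector and a tsquery: smaller means more relevant.
SELECT body,
       tsv <=> to_tsquery('english', 'beautiful | place') AS dist
FROM docs
WHERE tsv @@ to_tsquery('english', 'beautiful | place')
ORDER BY tsv <=> to_tsquery('english', 'beautiful | place')
LIMIT 10;
```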
