Data De-duplication in Image Search Services

PostgreSQL’s Image Search Plug-in Background Technology

Steps to Install PostgreSQL Image Search Plug-in

  • $ git clone https://github.com/postgrespro/imgsmlr
    $ cd imgsmlr
    $ export PGHOME=/home/digoal/pgsql9.5
    $ export PATH=$PGHOME/bin:$PATH:.
    $ make USE_PGXS=1
    $ make USE_PGXS=1 install
  • $ psql
    psql (9.5.3)
    Type "help" for help.
    postgres=# create extension imgsmlr;
    CREATE EXTENSION

Steps to Test the PostgreSQL Image Search Plug-in

  1. Collect the images to import (the more the better).
  2. Create the image table: image (id serial, data bytea).
  3. Import the images into the database by reading each file with pg_read_binary_file:
  • INSERT INTO image (data) SELECT pg_read_binary_file(...);
  4. Convert the images to the pattern and signature types.
  • CREATE TABLE pat AS (
      SELECT id,
             shuffle_pattern(pattern) AS pattern,
             pattern2signature(pattern) AS signature
      FROM (
        SELECT id, jpeg2pattern(data) AS pattern
        FROM image
      ) x
    );
  5. Create an index.
  • ALTER TABLE pat ADD PRIMARY KEY (id);
    CREATE INDEX pat_signature_idx ON pat USING gist (signature);
  6. Run a similarity query, for example one that finds the images most similar to the image with id = :id and returns the top 10 by similarity ranking. The inner query picks 100 candidates by signature distance, and the outer query re-ranks them by pattern distance.
  • SELECT id, smlr
    FROM (
      SELECT id,
             pattern <-> (SELECT pattern FROM pat WHERE id = :id) AS smlr
      FROM pat
      WHERE id <> :id
      ORDER BY signature <-> (SELECT signature FROM pat WHERE id = :id)
      LIMIT 100
    ) x
    ORDER BY x.smlr ASC
    LIMIT 10;

The ORDER BY signature <-> ... clause can use a K-nearest-neighbour (KNN) scan on the GiST index, so the ranked results are returned quickly.
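The two-stage shape of the similarity query above (a coarse candidate pick by signature, then a re-rank by pattern) can be sketched outside the database. The following Python is only an illustration under stated assumptions: patterns and signatures are modelled as plain lists of floats, the distance is assumed to be Euclidean, and the function names and toy data are hypothetical. It does not reproduce imgsmlr's internal representation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_images(rows, query_id, candidates=100, top=10):
    """Return up to `top` (id, pattern_distance) pairs, closest first.

    rows maps id -> (pattern, signature); both are lists of floats
    (an assumption made for this sketch).
    Stage 1: keep the `candidates` rows nearest by signature distance.
             In PostgreSQL this is the part the GiST index speeds up.
    Stage 2: re-rank those candidates by pattern distance.
    """
    q_pattern, q_signature = rows[query_id]
    others = [(i, p, s) for i, (p, s) in rows.items() if i != query_id]
    coarse = sorted(others, key=lambda r: euclidean(r[2], q_signature))
    coarse = coarse[:candidates]
    ranked = sorted(
        ((i, euclidean(p, q_pattern)) for i, p, _ in coarse),
        key=lambda r: r[1],
    )
    return ranked[:top]


# Hypothetical toy data: 4-value patterns and 2-value signatures.
rows = {
    1: ([0.0, 0.0, 0.0, 0.0], [0.0, 0.0]),
    2: ([0.1, 0.0, 0.0, 0.0], [0.1, 0.0]),
    3: ([0.0, 0.3, 0.0, 0.0], [0.0, 0.2]),
    4: ([5.0, 5.0, 5.0, 5.0], [5.0, 5.0]),
}
result = similar_images(rows, query_id=1, candidates=3, top=2)
print([image_id for image_id, _ in result])
```

The candidate limit trades accuracy for speed: a larger first stage makes it less likely that a true match is dropped before the more precise pattern comparison runs.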

Testing Our Image Search Engine

Video De-duplication Service

  1. Create the image table, image (id serial8 primary key, movie_id int, data bytea), to hold the key frames of all videos.
  2. Import the key-frame images (assumed to be in JPEG format).
  3. Insert the rows (skipped here).
  4. Generate the pattern and signature types.
  • CREATE TABLE pat AS (
      SELECT id,
             movie_id,
             shuffle_pattern(pattern) AS pattern,
             pattern2signature(pattern) AS signature
      FROM (
        SELECT id, movie_id, jpeg2pattern(data) AS pattern
        FROM image
      ) x
    );
  5. Calculate the similarity between key frames of different videos.
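  • SELECT t1.movie_id, t1.id, t2.movie_id, t2.id,
           t1.signature <-> t2.signature AS distance
    FROM pat t1
    JOIN pat t2 ON (t1.movie_id <> t2.movie_id)
    ORDER BY distance;

    or, to keep only the close matches:

  • SELECT t1.movie_id, t1.id, t2.movie_id, t2.id,
           t1.signature <-> t2.signature AS distance
    FROM pat t1
    JOIN pat t2 ON (t1.movie_id <> t2.movie_id)
    WHERE t1.signature <-> t2.signature < :max_distance
    ORDER BY distance;

The <-> operator returns a distance, so smaller values mean more similar frames and the likeliest duplicates sort first. :max_distance is a cutoff that has to be tuned for the data set.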
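The cross-video comparison can also be sketched in plain Python to make the pairing logic explicit. This is a minimal illustration, assuming that signatures can be treated as lists of floats and that <-> behaves as a Euclidean distance (smaller means more similar). The function name, the toy frames, and the 0.5 cutoff are all hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def duplicate_candidates(frames, max_distance):
    """Find key-frame pairs from different movies that are close together.

    frames: list of (frame_id, movie_id, signature) tuples.
    Returns (distance, movie_a, frame_a, movie_b, frame_b) tuples,
    closest pair first. Each unordered pair is reported once.
    """
    pairs = []
    for (id_a, movie_a, sig_a), (id_b, movie_b, sig_b) in combinations(frames, 2):
        if movie_a == movie_b:
            continue  # only compare frames across different movies
        distance = euclidean(sig_a, sig_b)
        if distance <= max_distance:
            pairs.append((distance, movie_a, id_a, movie_b, id_b))
    return sorted(pairs)


# Hypothetical key frames: (frame id, movie id, signature).
frames = [
    (1, 10, [0.0, 0.0]),
    (2, 10, [1.0, 1.0]),
    (3, 20, [0.0, 0.05]),
    (4, 30, [9.0, 9.0]),
]
dupes = duplicate_candidates(frames, max_distance=0.5)  # 0.5 is an arbitrary cutoff
for distance, movie_a, frame_a, movie_b, frame_b in dupes:
    print(movie_a, frame_a, movie_b, frame_b, round(distance, 3))
```

Comparing every frame with every other frame grows quadratically with the number of key frames, so on a large table it is usually worth narrowing the candidates first, for example with an index-assisted nearest-neighbour lookup per frame.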


Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com
