Data De-duplication in Image Search Services

PostgreSQL’s Image Search Plug-in Background Technology

Steps to Install PostgreSQL Image Search Plug-in

  • $ git clone https://github.com/postgrespro/imgsmlr
    $ cd imgsmlr
    $ export PGHOME=/home/digoal/pgsql9.5
    $ export PATH=$PGHOME/bin:$PATH:.
    $ make USE_PGXS=1
    $ make USE_PGXS=1 install
  • $ psql
    psql (9.5.3)
    Type "help" for help.
    postgres=# create extension imgsmlr;
    CREATE EXTENSION

Steps to Perform PostgreSQL Image Search Plug-in Test:

Start with a set of images to import (the more, the better).

  1. Create the image table: image (id serial, data bytea).
  2. Import the images into the database.
  3. Insert each file into the table with pg_read_binary_file, which reads a file on the database server (the path below is a placeholder):
  • INSERT INTO image (data) SELECT pg_read_binary_file('<path to image file>');
  4. Convert each image to the pattern and signature types.
  • CREATE TABLE pat AS (
        SELECT
            id,
            shuffle_pattern(pattern) AS pattern,
            pattern2signature(pattern) AS signature
        FROM (
            SELECT id, jpeg2pattern(data) AS pattern
            FROM image
        ) x
    );
  5. Create an index.
  • ALTER TABLE pat ADD PRIMARY KEY (id);
    CREATE INDEX pat_signature_idx ON pat USING gist (signature);
  6. Run an approximate query, for example one that finds the images most similar to the image with id = :id and returns the top 10 by similarity.
  • SELECT id, smlr
    FROM (
        SELECT
            id,
            pattern <-> (SELECT pattern FROM pat WHERE id = :id) AS smlr
        FROM pat
        WHERE id <> :id
        ORDER BY signature <-> (SELECT signature FROM pat WHERE id = :id)
        LIMIT 100
    ) x
    ORDER BY x.smlr ASC
    LIMIT 10;

The inner ORDER BY signature <-> ... LIMIT 100 clause can use the GiST index for a K-Nearest Neighbour (KNN) search, so the candidates come back quickly and already ordered by similarity.
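
The similarity query above works in two stages: it first ranks all rows by the short signature and keeps 100 candidates, then re-ranks only those candidates by the full pattern. The following plain-Python sketch illustrates that logic outside the database. It is not the plug-in's implementation: it assumes patterns and signatures can be treated as plain lists of floats and that <-> behaves like a Euclidean distance, and the names two_stage_search, coarse_limit, and top_k are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_search(rows, query_id, coarse_limit=100, top_k=10):
    """Return up to top_k (id, pattern_distance) pairs, closest first.

    rows maps an image id to a (pattern, signature) tuple.
    This mirrors the SQL query only in structure; the data layout is assumed.
    """
    query_pattern, query_signature = rows[query_id]

    # Stage 1: cheap ranking on the short signature (the part the GiST
    # index can serve in the database), keeping only coarse_limit rows.
    candidates = sorted(
        (image_id for image_id in rows if image_id != query_id),
        key=lambda image_id: euclidean(rows[image_id][1], query_signature),
    )[:coarse_limit]

    # Stage 2: more expensive re-ranking of the candidates on the full pattern.
    reranked = sorted(
        ((image_id, euclidean(rows[image_id][0], query_pattern))
         for image_id in candidates),
        key=lambda pair: pair[1],
    )
    return reranked[:top_k]


# Tiny made-up data set: id -> (pattern, signature).
demo_rows = {
    1: ([0.0, 0.0, 0.0, 0.0], [0.0, 0.0]),
    2: ([0.1, 0.0, 0.0, 0.0], [0.1, 0.0]),
    3: ([1.0, 1.0, 1.0, 1.0], [1.0, 1.0]),
    4: ([0.0, 0.3, 0.0, 0.0], [0.0, 0.2]),
}
print(two_stage_search(demo_rows, 1, coarse_limit=2, top_k=2))
```

The coarse limit trades recall for speed: a larger value makes it less likely that a truly similar image is dropped in stage 1, at the cost of more full-pattern comparisons in stage 2.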

Testing Our Image Search Engine

Video De-duplication Service

  1. Create the image table and import the key frames of all videos into it: image (id serial8 primary key, movie_id int, data bytea).
  2. Import the images (assumed to be in JPEG format).
  3. Skipped; the insert is the same as in step 3 of the image search test.
  4. Generate the pattern and signature types.
  • CREATE TABLE pat AS (
        SELECT
            id,
            movie_id,
            shuffle_pattern(pattern) AS pattern,
            pattern2signature(pattern) AS signature
        FROM (
            SELECT id, movie_id, jpeg2pattern(data) AS pattern
            FROM image
        ) x
    );
  5. Calculate the similarity between key frames of different videos.
  • SELECT t1.movie_id, t1.id, t1.signature <-> t2.signature
    FROM pat t1
    JOIN pat t2 ON (t1.movie_id <> t2.movie_id)
    ORDER BY t1.signature <-> t2.signature DESC;

    or

    SELECT t1.movie_id, t1.id, t1.signature <-> t2.signature
    FROM pat t1
    JOIN pat t2 ON (t1.movie_id <> t2.movie_id)
    WHERE t1.signature <-> t2.signature > 0.9
    ORDER BY t1.signature <-> t2.signature DESC;

Note that <-> returns a distance, so smaller values mean more similar frames. To list the closest cross-video matches first, sort in ascending order and filter with an upper bound on the distance rather than a lower bound.
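
The join above produces frame-level results, while de-duplication needs a verdict per pair of videos. One way to get there is to count, for each pair of videos, how many key-frame pairs are close enough to be considered matches. The sketch below shows that roll-up in plain Python. It is an illustration under assumptions, not the article's method: signatures are treated as lists of floats, <-> is assumed to be a Euclidean distance where smaller means more similar, and duplicate_video_candidates, max_distance, and min_matches are hypothetical names with illustrative values.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def duplicate_video_candidates(frames, max_distance, min_matches=1):
    """Count matching key-frame pairs for every pair of distinct videos.

    frames is a list of (frame_id, movie_id, signature) tuples.
    Returns {(movie_a, movie_b): match_count} with movie_a < movie_b,
    keeping only pairs that have at least min_matches matching frames.
    """
    matches = defaultdict(int)
    for index, (_, movie_a, signature_a) in enumerate(frames):
        # Compare each frame only with later frames so every pair is seen once.
        for _, movie_b, signature_b in frames[index + 1:]:
            if movie_a == movie_b:
                continue  # frames of the same video are not compared
            if euclidean(signature_a, signature_b) <= max_distance:
                pair = (min(movie_a, movie_b), max(movie_a, movie_b))
                matches[pair] += 1
    return {pair: count for pair, count in matches.items()
            if count >= min_matches}


# Made-up key frames: videos 10 and 20 share two near-identical frames.
demo_frames = [
    (1, 10, [0.0, 0.0]),
    (2, 10, [1.0, 1.0]),
    (3, 20, [0.0, 0.05]),
    (4, 20, [1.0, 1.02]),
    (5, 30, [5.0, 5.0]),
]
print(duplicate_video_candidates(demo_frames, max_distance=0.1, min_matches=2))
```

Requiring more than one matching frame reduces false positives from shared intro or logo frames; a suitable threshold depends on the data and would need to be tuned. This brute-force loop compares every pair of frames, so it is only meant to show the logic, not to scale.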

Follow me to keep abreast of the latest technology news, industry insights, and developer trends. Alibaba Cloud website: https://www.alibabacloud.com
