PostgreSQL Application in Image Search and Video and Image Deduplication

Image for post
Image for post

Background

Multiple image processing business scenarios, such as image search, video deduplication, face recognition, image beautification, and image deduplication, encounter the common challenge of managing the humongous volume of duplicate images and videos stored on the server.

PostgreSQL Image Search Plug-in: Background Technology

The PostgreSQL image search plug-in uses the mainstream Haar wavelet technology to store images after conversion. Take a look at its Wikipedia page to learn more about this technology.

Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post

Introduction to the PostgreSQL Image Search Plug-in

Follow the steps below to install the PostgreSQL Image Search Plug-in:

yum install -y gd-devel
$ git clone https://github.com/postgrespro/imgsmlr
$ cd imgsmlr
$ export PGHOME=/home/digoal/pgsql9.5
$ export PATH=$PGHOME/bin:$PATH:.
$ make USE_PGXS=1
$ make USE_PGXS=1 install
$ psql
psql (9.5.3)
Type "help" for help.
postgres=# create extension imgsmlr;
CREATE EXTENSION
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post

PostgreSQL Image Search Plug-in: Testing

Follow the steps listed below to test the PostgreSQL Image Search Plug-in:

Image for post
Image for post
create table image (id serial, data bytea);
insert into image(data) select pg_read_binary_file('文件路径');
CREATE TABLE pat AS (
SELECT
id,
shuffle_pattern(pattern) AS pattern,
pattern2signature(pattern) AS signature
FROM (
SELECT
id,
jpeg2pattern(data) AS pattern
FROM
image
) x
);
ALTER TABLE pat ADD PRIMARY KEY (id);CREATE INDEX pat_signature_idx ON pat USING gist (signature);
SELECT
id,
smlr
FROM
(
SELECT
id,
pattern <-> (SELECT pattern FROM pat WHERE id = :id) AS smlr
FROM pat
WHERE id <> :id
ORDER BY
signature <-> (SELECT signature FROM pat WHERE id = :id)
LIMIT 100
) x
ORDER BY x.smlr ASC
LIMIT 10
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post

Video Deduplication

For video deduplication, extract keyframes in a video to generate the Cartesian product through self-correlation. Calculate the similarity between any two images of different videos, and when the similarity reaches a certain threshold value, the two videos are deemed the same.

创建图片表,并将所有视频的关键帧导入表中
create table image (id serial8 primary key, movie_id int, data bytea);
导入图片,假设为jpeg格式
... 略 ...
生成patten 和 signature
CREATE TABLE pat AS (
SELECT
id, movie_id,
shuffle_pattern(pattern) AS pattern,
pattern2signature(pattern) AS signature
FROM (
SELECT
id, movie_id,
jpeg2pattern(data) AS pattern
FROM
image
) x
);
计算不同视频的相似度
select t1.movie_id, t1.id, t1.signature<->t2.signature from
pat t1 join pat t2 on (t1.movie_id<>t2.movie_id)
order by t1.signature<->t2.signature desc
or select t1.movie_id, t1.id, t1.signature<->t2.signature from
pat t1 join pat t2 on (t1.movie_id<>t2.movie_id)
where t1.signature<->t2.signature > 0.9
order by t1.signature<->t2.signature desc

Conclusion

In short, PostgreSQL is a very powerful database with highly customizable functions and does not require you to make any changes to the PostgreSQL kernel. Therefore, it is safe and reliable. The image search technology is an example of PostgreSQL feature extension and offers extremely high speed.

Original Source:

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store