Image Search Explained
Here are the questions we hear most. How does it work? How does the computer know that two images are similar?
The principle is easy to understand and is reliant on what Dr. Neal Krawetz calls a fast algorithm.
Or more specifically, the key technology involved here is called “perceptual hash algorithm.” Its role is to generate a “fingerprint” character string for each image and then compare the fingerprints. The closer the comparison result, the more similar the two images are.
Below is a simple implementation:
Step 1: Downsize the image.
Downsize the image to 8x8 pixels, i.e. 64 pixels in total. This step removes the details in the image, and only retains the structure, light, and shade among other essential information to discard the image difference caused by difference in sizes and proportions.
Step 2: Simplify the color.
Convert the downsized image into 64-level grayscale. In other words, there are 64 colors for all the pixels.
Step 3: Calculate the average value.
Calculate the average grayscale of all the 64 pixels.
Step 4: Compare the pixel grayscale.
Compare the grayscale of each pixel with the mean value. If the grayscale is larger than or equal to the average value, record the value as 1; otherwise, record the value as 0.
Step 5: Calculate the hash value.
Combine the comparison results in the previous step to get a 64-bit integer, which is the “fingerprint” of this image. The order of the combination is not critical, as long as you ensure all the images follow the same order.
After you have the fingerprint, you can compare different images to check how many bits in the 64 bits are different. In theory, this is same as calculating the “Hamming distance.” If the number of different bits is less than 5, the two images are similar; if the number of different bits exceeds 10, it means the two images are different.
For specific code implementation, see imgHash.py written by Wote in Python. The code is short (only 52 lines). In usage, the first parameter refers to the benchmark image and the second parameter indicates the directory of other images for comparison. The returned result is the number of different bits of the two images (Hamming distance).
This algorithm is advantageous for being easy and quick, irrespective of the size of the image, but its disadvantage is that the image’s content cannot change. If you add several texts on the image, the algorithm will not recognize it. It locates the original picture based on a thumbnail.
In practical application, pHash and SIFT use more robust algorithms as they can recognize the variations in images. They can match the original image as long as the deformation is less than 25 percent. These algorithms are more complicated, but they follow the same principle as the simple algorithm explained above, namely converting the image to a hash character value and then making the comparison.
See! Not that complex after all.