This content originally appeared on DEV Community and was authored by rkouye
Last month, I started working on a really fun side project: a natural language image search engine.
The idea came after reading about OpenAI's CLIP.
CLIP is a neural network that connects text and images: it lets us compute how similar a piece of text and a picture are.
The naïve engine
If we can compute a similarity score between a piece of text and a picture, we can build a simple search engine. Imagine we had a million pictures of cute animals.
1- The user would type a query, let's say, "cats with hats".
2- Using CLIP, we would compute every image similarity score with the sentence "cats with hats".
3- Finally, we would return the most relevant images (biggest score).
Et voilà!
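The three steps above can be sketched in a few lines of Python. The `embed_text` function and the precomputed `image_embeddings` stand in for CLIP's text and image encoders (hypothetical placeholders, not CLIP's real API); here they are faked with random unit vectors so the ranking logic itself is runnable.

```python
import numpy as np

rng = np.random.default_rng(42)

def embed_text(query: str) -> np.ndarray:
    # Placeholder for CLIP's text encoder: returns a unit-length vector.
    v = rng.standard_normal(512)
    return v / np.linalg.norm(v)

# Pretend we embedded our image collection offline (1,000 stand-in vectors,
# normalized so cosine similarity reduces to a dot product).
image_embeddings = rng.standard_normal((1000, 512))
image_embeddings /= np.linalg.norm(image_embeddings, axis=1, keepdims=True)

def search(query: str, top_k: int = 5) -> np.ndarray:
    q = embed_text(query)                     # step 1: encode the query
    scores = image_embeddings @ q             # step 2: score every image
    return np.argsort(scores)[::-1][:top_k]   # step 3: best scores first

print(search("cats with hats"))
```

Note that `search` touches every row of `image_embeddings` on every query, which is exactly the linear scan discussed below.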
However, because we compute a similarity score for every image on every query, search time grows linearly with the size of the collection. It is not scalable.
Also, search engines usually support faceted search (filtering on, for example, the image's primary color) in conjunction with the text query.
To improve our implementation, let's understand how CLIP works.
(Coming in Part 2)

rkouye | Sciencx (2021-08-15T21:43:27+00:00) How to build your own natural language image search engine (Part 1). Retrieved from https://www.scien.cx/2021/08/15/how-to-build-your-own-natural-language-image-search-engine-part-1/