Creating N1QL Labelled Image Database using Couchbase, FastAPI, and SerpApi

This is part of a series of blog posts on Artificial Intelligence Implementation. If you are interested in the background of the story or how it continues:

https://medium.com/media/56289484ed46818bc25074e9b1f2fe32/href

In the previous weeks, we explored how to automatically create your own image dataset using SerpApi’s Google Images Scraper API, and used those images to automatically train a network through a simple command object passed to FastAPI. This week we will improve the database creation method by using Couchbase as a storage server, and show how to fetch a random element from a given subset.

Couchbase Configuration

We will need Couchbase Community Server and the Python SDK for Couchbase for this project. You can find the relevant information in the Couchbase Docs.

For this tutorial we will be using Couchbase version 7.1.0 for Debian 11:

You can access it from this link

Once you have installed it, define a username and password at the server address (http://kagermanov:8091 in my case). You will be greeted with a dashboard like this:

This means you have successfully deployed the server.
If you want to stop the background process on Linux, you can run sudo systemctl stop couchbase-server at any time.

Head to Buckets in the left-hand menu and add a new bucket called images using the ADD BUCKET button:

Make sure to choose a RAM amount that won’t overwhelm your local system.

Now you need to add a scope and a collection within this bucket via the Scopes & Collections button:

Add a scope named image, and within it, a collection named labelled_image:

Next, head to the playground, where you can run a manual query, and run the following:

https://medium.com/media/0daff91bf7343b362511d9e9677faf2d/href
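The embedded statement is not reproduced here. A common step at this point, assuming the goal is to make the new collection queryable with N1QL, is to create a primary index on it. The sketch below only builds such a statement as a Python string; the helper name keyspace is hypothetical, and the bucket, scope and collection names are the ones created above.

```python
def keyspace(bucket: str, scope: str, collection: str) -> str:
    """Return the backtick-quoted N1QL path bucket.scope.collection."""
    return ".".join(f"`{part}`" for part in (bucket, scope, collection))


# Assumption: a primary index is what the manual query sets up.
statement = f"CREATE PRIMARY INDEX ON {keyspace('images', 'image', 'labelled_image')}"
print(statement)
```

The printed statement can be pasted into the playground as-is.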

Lastly, install the Couchbase Python SDK via pip (the package is named couchbase), and everything is set on the server side.

Automatic Image Collector

Let’s create a separate file within our project called add_couchbase.py. This will be the refactored version of add.py, which automatically gathered images for a given query.
Here are the requirements for it:

https://medium.com/media/2f45fa0ed0d5087a02b33d18aab18e66/href

Here is what each of them is used for:

https://medium.com/media/64c0a3f4694f3f48cefd1dcf90f08408/href

Let’s define a pydantic base model for an image to be uploaded to the Couchbase Storage Server:

https://medium.com/media/9f68664ec59f8d8b991d64fde061e3e4/href

id will be a unique uuid of the image, so that the image can be fetched manually in the future.
classification is the query given to SerpApi’s Engine.
base64 will be the string representation of the image, from which it can be recreated within a training session.
uri will represent the URL we fetch the image from.
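The four fields above can be sketched as follows. This is a minimal stand-in that uses a standard-library dataclass instead of pydantic so that it runs without extra dependencies; the class name Document and the example values are assumptions, and a pydantic model would carry the same four fields.

```python
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class Document:
    # Stand-in for the pydantic model; field names follow the list above.
    classification: str  # the query given to SerpApi
    base64: str          # the image bytes encoded as a base64 string
    uri: str             # the URL the image was fetched from
    # Unique key, generated automatically for every new document.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


doc = Document(
    classification="Orange",
    base64="aGVsbG8=",  # placeholder payload, not a real image
    uri="https://example.com/orange.jpg",
)
print(asdict(doc))
```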

Let’s initialize our Storage Database in a class:

https://medium.com/media/b443e6170abba070f90436e82b2ed844/href

Here’s the function for inserting an image to the Storage server with a unique id:

https://medium.com/media/ce8824f8b140b2f3559f36e51560684b/href

doc in this context is the Document object that holds the image to be uploaded to the Couchbase Server.

Let’s add another function that fetches an image by the unique key we generate. This function will not be used in this blog post.
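The insert step can be sketched without a running server. The InMemoryCollection class below is a hypothetical stand-in that mimics the insert(key, value) shape of a Couchbase collection, and insert_document is an illustrative name rather than the function from the embedded snippet.

```python
import uuid


class InMemoryCollection:
    """Hypothetical stand-in for a Couchbase collection (insert only)."""

    def __init__(self):
        self.items = {}

    def insert(self, key, value):
        # Like a real insert, refuse to overwrite an existing key.
        if key in self.items:
            raise KeyError(f"document {key} already exists")
        self.items[key] = value


def insert_document(collection, doc: dict) -> str:
    """Store doc under its own id and return that key."""
    key = doc["id"]
    collection.insert(key, doc)
    return key


collection = InMemoryCollection()
doc = {
    "id": str(uuid.uuid4()),
    "classification": "Orange",
    "base64": "aGVsbG8=",  # placeholder payload
    "uri": "https://example.com/orange.jpg",
}
key = insert_document(collection, doc)
print(key in collection.items)
```

Passing the collection in as an argument keeps the function easy to try out with a fake like this one before pointing it at the real server.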

https://medium.com/media/20b047710916f13522496f63862260ec/href

Next, we need to build a function that helps us upload only unique images. The differentiator will be the image link, which must be unique within the scope of a classification:

https://medium.com/media/eabafb8edaa02b2a2b186933af47e2c9/href

This function takes link, which is the link to the image, and cs, which is the classification of the image. If no image with the same link already exists under that classification in our storage, it returns None. There are two reasons we don’t check uniqueness across the entire database. First, it wouldn’t be efficient in the long run. Second, the same image can belong to different classifications. Take the logo of Apple, the company: it is also an apple, the fruit. If we are classifying between Apple Logo and Blackberry Logo, and the image is stored in the Apple classification only, there is a chance that the model could fail to interpret it. This approach might leave duplicate images under different classifications that are easy to miss, but in the long run it should prove useful.
Here is an example manual query for a link that we know already exists in the Couchbase Server:

https://medium.com/media/575126ff92890dfd4a55ddeb8b0bdeee/href
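The check described above can be sketched as a parameterized N1QL lookup. This is a minimal sketch under assumptions: the function names are hypothetical, the keyspace is built from the bucket, scope and collection created earlier, and the field names classification and uri come from the document model. Running the statement against a cluster is left out.

```python
KEYSPACE = "`images`.`image`.`labelled_image`"


def uniqueness_query(link: str, cs: str):
    """Build a lookup that is scoped to a single classification."""
    statement = (
        f"SELECT META().id FROM {KEYSPACE} "
        "WHERE classification = $cs AND uri = $link LIMIT 1"
    )
    # Named parameters keep the link and label out of the query text.
    return statement, {"cs": cs, "link": link}


def first_or_none(rows):
    """Return the first result row, or None when there is no match."""
    return next(iter(rows), None)


statement, params = uniqueness_query("https://example.com/apple.jpg", "Apple")
print(statement)
print(first_or_none([]))  # an empty result means the link is new
```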

Now that we have uniqueness out of the way, let us focus on picking a random element for a given query. This part will not be used in this blog post either; it is there for future purposes. The function returns the number of images stored under a given classification. It is useful for determining a random number within the range of that size.

https://medium.com/media/921217f83776e7091138878020d46db8/href

Here’s another example query for the size of Orange Images in the storage server:

https://medium.com/media/b4fd9dbe072af6dbda1cee0fa99610ac/href
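A count of this kind can be sketched as follows. The function names and the image_count alias are assumptions, and the result row is simulated instead of coming from a live cluster.

```python
KEYSPACE = "`images`.`image`.`labelled_image`"


def count_query(cs: str):
    """Build a query counting the images stored under one classification."""
    statement = (
        f"SELECT COUNT(*) AS image_count FROM {KEYSPACE} "
        "WHERE classification = $cs"
    )
    return statement, {"cs": cs}


def extract_count(rows) -> int:
    """Read the count from the single result row; 0 if nothing came back."""
    for row in rows:
        return row["image_count"]
    return 0


statement, params = count_query("Orange")
simulated_rows = [{"image_count": 104}]  # simulated result, not real output
print(extract_count(simulated_rows))
```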

Now, let’s define a function that picks a random image for us. The random integer is generated outside the query we send, and its upper bound comes from the count function we just constructed:

https://medium.com/media/02d97cb4f851bd174af47f84e799700b/href

Here is an example query with the random number 37, which is between 0 and 103 (from 104 images of Orange):

https://medium.com/media/acc18139c456c2af9184908f99e06752/href
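The idea of drawing the number in Python and using it as an offset can be sketched like this. The function names are hypothetical, and using OFFSET to select the row is an assumption about how the random number is applied. Without an ORDER BY clause the row order is not guaranteed, which is acceptable when any random element will do.

```python
import random

KEYSPACE = "`images`.`image`.`labelled_image`"


def random_offset(size: int, rng=random) -> int:
    """Pick an integer between 0 and size - 1, inclusive."""
    if size <= 0:
        raise ValueError("no images stored for this classification")
    return rng.randrange(size)


def random_image_query(cs: str, offset: int):
    """Build a query that skips offset rows and returns the next one."""
    statement = (
        f"SELECT img.* FROM {KEYSPACE} AS img "
        "WHERE img.classification = $cs LIMIT 1 OFFSET $offset"
    )
    return statement, {"cs": cs, "offset": offset}


offset = random_offset(104)  # 104 is the Orange count from the example
statement, params = random_image_query("Orange", offset)
print(params)
```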

We now have everything we need for this week’s and the coming week’s blog posts. Let’s revisit what we defined in the earlier posts, starting with a pydantic model for the Query object we pass to the endpoint:

https://medium.com/media/391cc91e6a767418463d3083ca3f2aca/href

Again, the API key mentioned here is your unique API key for SerpApi. It can be accessed via Api Key page.

Here’s the redefinition of the Download class. We can omit some parts to keep uniqueness, and add new ones such as the database object.

https://medium.com/media/ba3a75c2a658207c3ac5e2ff845d1e44/href

There is no change in the function of SerpApi’s Google Images API implementation. However, let me restate one amazing fact again. If the query you are searching is cached, you can get it free of charge.

https://medium.com/media/c60f736e02cb1e09fbfdf77f863104fc/href

Let’s define another function for downloading an image and returning it as a Document object:

https://medium.com/media/f5eef7f918045f3d9698515f425a902d/href
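The conversion from raw image bytes to a storable document can be sketched without any network access. The function name to_document and the sample bytes are assumptions; in the real flow the bytes would come from the HTTP response for the image link.

```python
import base64
import uuid


def to_document(image_bytes: bytes, link: str, cs: str) -> dict:
    """Wrap downloaded bytes in the fields the storage server expects."""
    return {
        "id": str(uuid.uuid4()),
        "classification": cs,
        # base64 turns binary data into text that fits in a JSON document.
        "base64": base64.b64encode(image_bytes).decode("ascii"),
        "uri": link,
    }


raw = b"\x89PNG placeholder bytes"  # stands in for a downloaded image
doc = to_document(raw, "https://example.com/orange.png", "Orange")

# Decoding restores the original bytes, e.g. inside a training session.
restored = base64.b64decode(doc["base64"])
print(restored == raw)
```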

Next, we define the function that inserts the Document objects we get from the previous function. Here too, we check the uniqueness of the link to reduce duplicates:

https://medium.com/media/35cc9476f91485c9e2242e14a896529c/href

Here, we can iterate through all the links gathered from SerpApi’s Google Images Scraper API, and upload them to our Couchbase Storage Server:

https://medium.com/media/8ac16ccdcefca64902e660ad4d2c9cfc/href
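The overall loop can be sketched with the three steps passed in as plain callables. Everything here is an assumption for illustration: upload_unique is a hypothetical name, and the in-memory list stands in for the Couchbase collection.

```python
def upload_unique(links, cs, exists, download, insert) -> int:
    """Insert every link not yet stored under cs; return how many were added."""
    added = 0
    for link in links:
        if exists(link, cs):
            continue  # same link and classification already stored
        insert(download(link, cs))
        added += 1
    return added


store = []  # in-memory stand-in for the storage server


def exists(link, cs):
    return any(d["uri"] == link and d["classification"] == cs for d in store)


def download(link, cs):
    return {"uri": link, "classification": cs}  # no real download here


links = [
    "https://example.com/1.jpg",
    "https://example.com/2.jpg",
    "https://example.com/1.jpg",  # duplicate of the first link
]
added = upload_unique(links, "Mango", exists, download, store.append)
print(added)  # 2
```

Because the check is scoped to the classification, running the same links again under a different label would store them a second time, which matches the trade-off described earlier.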

Now that we have everything in place, let us expose the add_couchbase.py functionality through an endpoint in our main.py:

https://medium.com/media/5c853dbc862473fdeb4f622a583dea7e/href

Collecting Images and Storing Them with Classifications

Let’s put everything we made into practice. Run the server with the following command:

https://medium.com/media/51b27def1508ed69499786458230fb11/href

Then head to localhost:8000/docs to try out the /add_to_db/ endpoint with the following request body:

https://medium.com/media/7e992b7226d6f518fba3bf97118857d3/href

If you observe the terminal, you will see that the process of updating the database is happening in real time:

https://medium.com/media/3096523737269b363331c0e0a5ef6f99/href

If we query the database before the process has finished, we can see that entries with the classification label Mango are already being added. Here’s the command for it:

https://medium.com/media/3dc310a7e158740adac9afcba1f2a73b/href

It has already added 67 unique images to the database, which we can use to train our network in the coming weeks.

Conclusion

N1QL databases such as Couchbase can offer fast response times compared to other storage options. For that reason, I considered refactoring this part an essential step before taking any further action. This implementation should give us the speed and scalability we need for the coming weeks’ challenges in comparing different approaches to image classification. It also matters for the async handling of some operations, such as inserting images into the database instead of naming files through the OS.

I am grateful to the reader for their attention, and to the brilliant people of SerpApi for all their support. In the coming weeks, we will explore async handling and coroutines to lower the response time of these actions. If you are interested in tutorials like this one, feel free to sign up to the SerpApi Blog, or follow us on the medium where you found us. Your support is the primary factor in making such tutorials.

Originally published at https://serpapi.com on June 15, 2022.


Creating N1QL Labelled Image Database using Couchbase, FastAPI, and SerpApi was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.

Emirhan Akdeniz | Sciencx (2024-03-29T01:45:07+00:00) » Creating N1QL Labelled Image Database using Couchbase, FastAPI, and SerpApi. Retrieved from https://www.scien.cx/2022/06/16/creating-n1ql-labelled-image-database-using-couchbase-fastapi-and-serpapi/.