Embeddings for Everyday Developers

Imagine being able to recommend documents that feel relevant even if they don’t share a single keyword. Or detecting duplicate support tickets written in completely different wording. Or grouping users not by rigid segments, but by how they actually behave. These are the kinds of capabilities embeddings unlock.

Embeddings turn your data, whether text, users, products, or images, into something comparable at a conceptual level. They let you work with meaning, not just structure. That means better search, smarter recommendations, more flexible classification, and more human-like matching.

And you don’t need to be building an AI-powered product to take advantage of this. Embeddings are increasingly accessible to everyday engineers, thanks to APIs, open-source models, vector databases, and even extensions in systems like PostgreSQL. You can start using them in practical ways without training models or deploying machine learning infrastructure.

What are Embeddings?

Imagine you have data that’s complex and hard to compare directly — like sentences, images, or user behavior logs. Embeddings are a way to convert all that complexity into a simple, fixed-length list of numbers called a vector. Each component in this vector captures some meaningful aspect of the original data, such as topics discussed in a document, visual patterns in an image, or behavioral signals from a user’s actions.

This process is a form of dimension reduction. Instead of working with thousands of sparse or raw features, like every word in a vocabulary or every pixel in an image, you reduce that data into a dense vector that preserves the most important (and, hopefully, meaningful) information for comparison.

Because embeddings live in this smaller, structured space, you can compare them using straightforward math. The key idea is that similar things get mapped to points that are close to each other.

To measure how close two vectors are, we often use cosine similarity, which is based on the angle between them:

cosine_similarity(A, B) = (A⋅B) / (||A|| ||B||)

Here, A and B represent the vectors of two pieces of data, A⋅B is their dot product, and ||A||, ||B|| are their lengths. The result ranges from -1 (opposite) to 1 (very similar).
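
In code, this is just the dot product of the two vectors divided by the product of their lengths. Here is a minimal sketch with NumPy; the vectors below are made up purely for illustration:

import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two made-up 4-dimensional embeddings, just to show the mechanics
print(cosine_similarity([0.1, 0.3, -0.2, 0.7], [0.05, 0.28, -0.1, 0.6]))   # close to 1
print(cosine_similarity([0.1, 0.3, -0.2, 0.7], [-0.1, -0.3, 0.2, -0.7]))   # exactly -1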

Embeddings do more than just measure similarity. They can capture relationships too. A classic example from word embeddings is:

vector("king") - vector("man") + vector("woman") ≈ vector("queen")

This shows that the components of the vector keep enough information about the original concepts to represent meaningful patterns and connections.

Where Embeddings Are Used in the Real World

One of the most prominent uses is in search and recommendation systems. Traditional search relies heavily on exact keyword matches, which can feel rigid and blind to nuance. Embeddings allow you to search by meaning. For instance, a query like “restaurants with outdoor seating” can return places that mention “terraces,” “patios,” or “alfresco dining,” even if the exact words never appear in the data. This kind of semantic search makes results feel more intuitive and human.

The same principle powers recommendations. When suggesting products, articles, or videos, embeddings let the system go beyond surface features. A product can be recommended not just because it shares a category, but because its description, image, and user reviews form a vector that’s close to something the user liked before. Similarly, users themselves can be embedded based on their activity, enabling recommendations tailored to behavioral patterns rather than generic segments. Even code search benefits from embeddings — developers can find relevant snippets or functions by describing what they want, rather than guessing at exact class or variable names.

Natural language understanding is another domain where embeddings are indispensable. For tasks like sentiment analysis, spam detection, or topic classification, embeddings give machine learning models a dense, rich input that reflects the actual meaning of a document. Chatbots and virtual assistants use sentence embeddings to parse user intent more flexibly, understanding that “Will it rain tomorrow?” and “What’s tomorrow’s weather?” point to the same underlying question. Embeddings also play a foundational role in modern translation systems and summarizers by enabling the model to grasp the semantic structure of language across contexts.

In computer vision, embeddings offer similar capabilities for understanding images. Instead of comparing pixel-by-pixel, systems embed images based on visual features. This allows for image search and recommendation that feels perceptual — finding photos that look similar, not just ones with matching metadata or colors. Facial recognition systems, too, compare embeddings of facial features rather than raw images, making identification both faster and more robust. Even tasks like suggesting visually similar clothing or artwork rely on these embeddings of style and shape.

Embeddings also shine in tasks involving anomaly detection. In fraud detection, for example, transactions are embedded into a common space. Most legitimate behaviors form dense clusters, while unusual or fraudulent ones stand out as distant outliers. Network monitoring systems can embed traffic patterns or logs and alert on anything that deviates from known norms. This technique generalizes well to system monitoring and failure detection — embedding logs, metrics, or traces allows subtle errors to be caught earlier by recognizing unexpected changes in behavior.

When it comes to organizing or exploring large datasets, embeddings provide a powerful way to discover structure. Customer segmentation becomes more insightful when you embed users by their actual behavior and preferences rather than arbitrary attributes. Similarly, document clustering (whether for legal documents, support tickets, or research papers) works far better when driven by semantic embeddings. And when you want to visualize complex, high-dimensional data, embedding it first and then projecting it into 2D or 3D can reveal surprising patterns and relationships.

Personalization is another area that benefits deeply from embeddings. A user can be represented not just by their demographics but by a rich embedding based on what they do, what they click, what they linger on. This allows systems to dynamically adapt content (like product listings or news articles) to each individual. Interfaces can shift in real time based on the user’s position in this semantic space.

At the heart of all these applications is the power of embeddings to reduce high-dimensional sparse data into something structured and semantically meaningful.

How Are Embeddings Created?

Imagine you have some raw data and you want to turn it into a form your system can understand and compare. Creating embeddings means feeding that data into a neural network that transforms it into a compact vector capturing its essential meaning. Whether you’re using a pretrained model or training your own, this process lets you convert complex inputs into numbers your application can easily work with.

Pretrained vs. Custom Models

Most developers start with pretrained models, and for good reason. They’re fast to integrate, require no training data of your own, and are often more than enough for general-purpose tasks. Services like OpenAI’s embedding APIs, models from Hugging Face’s sentence-transformers, or tools like CLIP for image embeddings provide plug-and-play solutions. You pass in a piece of text, and they return a vector. That’s it.

But sometimes, general-purpose isn’t good enough. Maybe your application deals with highly specialized language, such as legal contracts, medical records, or customer support tickets full of domain-specific jargon. In these cases, pretrained models might miss subtle distinctions that matter in your context. That’s where custom-trained models come in. By fine-tuning a model on your own labeled data, you can teach it to embed your inputs in a way that reflects your domain’s unique semantics.

Fine-tuning takes more work: you’ll need representative training data, some ML infrastructure, and a bit of patience. But the result is a model that “speaks your language” and generates embeddings that are tuned to your problem space.

How to Generate an Embedding in Practice

Let’s look at how to actually turn a piece of text into a vector. Whether you prefer a hosted API or want everything to run locally, you can get from raw text to embedding in just a few lines of code.

If you’re already using OpenAI services, generating an embedding is straightforward. You need an API key and the openai Python package:

from openai import OpenAI

# Initialize the client with your API key
client = OpenAI(api_key="your-api-key")

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Help, I can't log into my account"
)

embedding = response.data[0].embedding
print(embedding[:5]) # prints first few values, e.g., [0.128, -0.442, ..., 0.209]

The result is a 1536-dimensional vector that captures the meaning of the input sentence. You can store this in a database, feed it into a search index, or compare it with other vectors using cosine similarity.

For local processing or private infrastructure, sentence-transformers is a great option. Install it with pip install sentence-transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

text = "Help, I can't log into my account"
embedding = model.encode(text)

print(embedding[:5]) # again, just to show a preview

This will give you a 384-dimensional vector that works well for many tasks, especially if you’re dealing with short texts like titles, comments, or support tickets.
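
To see what these vectors buy you, you can compare two differently worded sentences directly. Here is a small sketch using the cosine similarity helper that ships with sentence-transformers (the example sentences are made up):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Two support tickets about the same problem, phrased differently
a = model.encode("Help, I can't log into my account")
b = model.encode("Login fails every time I enter my password")

print(util.cos_sim(a, b))  # high score despite almost no shared keywords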

So What’s the Difference?

OpenAI models tend to be larger and trained on broader data, so they may give more nuanced embeddings, especially for abstract or less common inputs. On the other hand, sentence-transformers are free to run, lightweight, and often good enough for practical use.

In both cases, you don’t need to train anything yourself. Embedding generation has become almost plug-and-play.

When You Might Fine-Tune

Pretrained embedding models are powerful out of the box, but there are cases where they fall short, especially when working in a highly specific or specialized context.

If you’re dealing with domain-specific language (say, clinical terms in medicine, contract jargon in legal documents, or dense financial reports) off-the-shelf models might not “understand” your content in a useful way. Fine-tuning helps adapt the embeddings to reflect the unique patterns and semantics of your field.

The same applies when your product or platform has company-specific terminology, UI text, or user interactions that don’t appear in general datasets. For example, if your users tend to write messages like “can’t launch zFlow panel,” the model won’t know what “zFlow” means unless it’s exposed to it during training.

However, fine-tuning comes with added complexity: you need labeled or structured data, machine learning know-how, and access to compute infrastructure. For many teams, this is where collaboration with ML engineers begins. But for others, using a high-quality pretrained model is often good enough and much simpler to integrate.

Where to Run the Models

After choosing an embedding model, the next practical question is: where should it live and run? For many developers, the simplest option is to run small models locally on a laptop for prototyping or within backend servers for production. Lightweight models from the sentence-transformers library, such as all-MiniLM-L6-v2, can easily run on CPU or modest GPU setups and generate embeddings with minimal overhead.

However, as your needs grow or as you shift toward more powerful or customizable models, you may want to consider running larger models with dedicated infrastructure. For example, a developer working with the LLaMA 3 8B Instruct model for text embeddings might host it using Ollama, a tool that makes it easy to run and query large language models on local machines or servers. This gives you control over latency, removes external dependencies, and is particularly useful when privacy or cost is a concern.

Here’s a practical example. Let’s say you’ve set up LLaMA 3 8B with Ollama. Generating an embedding with it is just as straightforward as using sentence-transformers:

import ollama

# Ask the locally running LLaMA 3 model for an embedding
response = ollama.embeddings(
    model='llama3',
    prompt="Help, I can't log into my account"
)

vector = response['embedding']
print(vector[:5])  # show the first few dimensions

For higher throughput or enterprise-scale workloads, another option is vLLM, a high-performance inference engine designed for serving large language models efficiently. vLLM allows you to stream inference at scale, making it a great choice for backend services handling large volumes of embedding requests.
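
Since vLLM exposes an OpenAI-compatible HTTP API, a backend can often reuse the regular OpenAI client and simply point it at its own endpoint. Here is a rough sketch, assuming a vLLM server is already running locally on port 8000 and serving an embedding-capable model (both the URL and the model name below are placeholders):

from openai import OpenAI

# Point the OpenAI client at the local vLLM server instead of api.openai.com
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",  # placeholder: whatever model vLLM is serving
    input="Help, I can't log into my account"
)

print(response.data[0].embedding[:5])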

If your application backend isn’t written in Python, you might be wondering whether it’s practical to work with embeddings without relying on a Python stack. The good news is: yes, it’s entirely doable, and you have a few solid paths forward, depending on your needs.

One common approach is to treat the embedding logic as a separate service, usually implemented in Python (where the ecosystem is mature), and expose it over a lightweight HTTP API. This way, your backend doesn’t need to run Python code directly. Instead, it simply makes HTTP calls to the embedding service, much like it would to any external API. You can containerize this Python service using Docker, deploy it alongside your Node.js app, and scale it independently.
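
As a sketch of what such a service could look like, here is a minimal HTTP wrapper around sentence-transformers built with FastAPI (the framework choice, endpoint name, and port are illustrative assumptions, not part of any particular stack):

from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("all-MiniLM-L6-v2")  # loaded once at startup

class EmbedRequest(BaseModel):
    text: str

@app.post("/embed")
def embed(req: EmbedRequest):
    # Return the embedding as a plain JSON list so any backend can consume it
    return {"embedding": model.encode(req.text).tolist()}

Run it with uvicorn (for example, uvicorn embed_service:app --port 8080), and your non-Python backend only needs to POST {"text": "..."} to /embed and read back a JSON array.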

However, if you prefer a fully self-contained solution, there are some emerging libraries that support local inference. For instance, one of the most promising for JavaScript is @xenova/transformers, which runs some transformer models using ONNX. While it currently supports only a limited range of models and is not suitable for high-load production usage, it’s surprisingly capable for lightweight tasks and proof-of-concept work. There’s also the option of calling out to tools like Ollama via its HTTP interface from backend applications. Ollama abstracts model loading and inference so you can get embeddings with a simple fetch call, all without writing a line of Python.

Using Embeddings with pgvector

Once embeddings are generated, the next challenge is where to store them and how to use them effectively within an application. While vector databases like Pinecone or Weaviate are popular choices, many engineers prefer not to introduce entirely new infrastructure, especially when they are already relying on a robust relational database. Enter pgvector, a PostgreSQL extension that offers a powerful and pragmatic solution.

With pgvector, developers can store and search embeddings directly within PostgreSQL. This enables similarity search to be performed alongside traditional structured SQL queries, using standard table joins, filters, and indexes. It's a seamless way to bridge semantic search with conventional database design.

At the core of pgvector lies a new data type called vector(n), where n represents the dimensionality of the embedding. For instance, when using a model like text-embedding-3-small from OpenAI, the resulting embeddings are 1,536-dimensional vectors. The corresponding column in your schema would therefore be declared as vector(1536).

To enable pgvector in your database, you run:

CREATE EXTENSION IF NOT EXISTS vector;

You can then define a table like this:

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding vector(1536)
);

On the backend, after generating an embedding using OpenAI’s API, you can store it directly with Python and psycopg. The pgvector Python package (pip install pgvector) provides the adapter that lets psycopg send vectors to the new column type:


import numpy as np
import psycopg
from openai import OpenAI
from pgvector.psycopg import register_vector

# Initialize the OpenAI client and the database connection
client = OpenAI(api_key="your-api-key")
conn = psycopg.connect("postgresql://user:pass@localhost/db")
register_vector(conn)  # teach psycopg how to send and receive the vector type

# Generate the embedding
text = "Help, I can't log into my account"
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=text
)
vector = response.data[0].embedding

# Insert into the table
with conn.cursor() as cur:
    cur.execute(
        "INSERT INTO documents (title, content, embedding) VALUES (%s, %s, %s)",
        ("Login Issue", text, np.array(vector))
    )
conn.commit()

conn.close()

To find the most relevant documents for a new query, you would embed the query string and then rank rows by vector distance. Cosine distance is the most common metric for embeddings and is supported directly through the <=> operator (pgvector also provides <-> for Euclidean distance and <#> for negative inner product):

SELECT id, title, content
FROM documents
ORDER BY embedding <=> '[0.128, -0.442, ..., 0.209]'::vector
LIMIT 5;
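
Putting the two steps together in Python, the flow might look like this (a sketch that reuses the client and conn from the insert example above, where register_vector(conn) has already been called):

import numpy as np

# Embed the query with the same model used for the stored documents
query = "I'm locked out of my account"
q_vec = client.embeddings.create(
    model="text-embedding-3-small",
    input=query
).data[0].embedding

# Rank documents by cosine distance to the query embedding
with conn.cursor() as cur:
    cur.execute(
        "SELECT id, title FROM documents ORDER BY embedding <=> %s LIMIT 5",
        (np.array(q_vec),)
    )
    for doc_id, title in cur.fetchall():
        print(doc_id, title)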

For performance, you can build an approximate nearest neighbor index using ivfflat:

CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Update PostgreSQL query planner statistics
ANALYZE documents;

The first statement creates an ivfflat index on the embedding column of the documents table to speed up similarity searches. The ivfflat index type is designed for approximate nearest neighbor queries. Instead of scanning every vector, it groups them into clusters (in this case, 100 clusters, as set by lists = 100) and searches only the most relevant ones. This makes searches much faster, especially when dealing with large datasets. The vector_cosine_ops part specifies that the index is built for cosine distance (the <=> operator used above), which is often preferred when comparing embeddings, since it focuses on the direction of the vectors rather than their magnitudes.

Strictly speaking, cosine distance does not require normalization, since it only depends on the angle between the vectors. It is still common to normalize embeddings to unit length before insertion: many models (including OpenAI’s) already return unit-length vectors, and normalized vectors let you switch to the faster inner-product operator later if you want to. If your model doesn’t normalize its output, you can do it manually in Python:

import numpy as np

def normalize(vec):
    # Scale the vector to unit length (leave zero vectors untouched)
    vec = np.asarray(vec, dtype=float)
    norm = np.linalg.norm(vec)
    return vec if norm == 0 else vec / norm

That said, there are practical considerations. All vectors in a column must have the same dimensionality, which must match the model you’re using. Changing embedding models later usually requires a full reindex of the data. Also, while pgvector supports ANN indexing, it is not distributed, so for extremely large-scale vector search workloads, dedicated systems like Qdrant or Pinecone may still be better suited.

Sorting and Filtering with Embeddings in PostgreSQL

Once you’ve set up a pgvector-powered embedding search, a natural next step is to combine it with other features of your application: filtering by category, sorting by time or popularity, or applying user-specific conditions. But here’s where you’ll run into a few important limitations and trade-offs.

PostgreSQL’s ivfflat index is designed specifically for one purpose: finding the nearest neighbors to a given vector, using one of pgvector’s distance operators (here, <=> for cosine distance). This means you can very efficiently do something like:

SELECT * FROM products
ORDER BY embedding <=> '[0.12, 0.98, ..., -0.33]'::vector
LIMIT 10;

But that’s where its indexing capabilities stop. You cannot use ivfflat indexes for sorting by other fields like created_at or rating, nor can you create compound indexes combining the vector and another column. So if your application needs to show results by both relevance and freshness, you’ll need to handle this manually.

The common workaround is a two-step query. First, you retrieve a larger pool of the most relevant matches using the vector similarity search, say, the top 100 items. Then, you apply traditional filtering, sorting, or pagination logic on that subset, such as ordering by published_at or restricting results to published content:

SELECT *
FROM (
    SELECT *
    FROM articles
    WHERE status = 'published'
    ORDER BY embedding <=> '[...]'::vector
    LIMIT 100
) AS sub
ORDER BY published_at DESC
LIMIT 10;

This approach works because the initial vector search narrows the field to only reasonably relevant candidates, making it feasible to do more complex operations on a smaller, manageable set. It’s not perfect though. If your filters are too strict, you might end up with too few results, and if your candidate pool is too small, you risk missing good matches. But in practice, this trade-off is often acceptable, and tuning the candidate size gives you control over the balance between relevance and flexibility.

For use cases requiring deep integration of filtering, ranking, and vector search, you might consider specialized vector databases such as Qdrant or Weaviate, or libraries like FAISS. These systems offer stronger native support for combining metadata filters with approximate nearest neighbor search. However, if your requirements fit within PostgreSQL’s capabilities, pgvector remains a solid and surprisingly powerful option — just keep its limitations in mind.

Conclusion

I hope this article gives you the confidence to start using embeddings in your projects. Dive in, experiment, and enjoy discovering what they can do for you!

