This content originally appeared on DEV Community and was authored by Mohit Agnihotri
This is a submission for the Redis AI Challenge: Real-Time AI Innovators.
What I Built
Latency Slayer is a tiny Rust reverse-proxy that sits in front of any LLM API.
It uses embeddings + vector search in Redis 8 to detect “repeat-ish” prompts and return a cached answer instantly. New prompts are answered once by the LLM and stored with per-field TTLs, so only the response expires while metadata persists.
Why it matters: dramatically lower latency and cost, with transparent drop-in integration for any chat or RAG app.
Core tricks
- Redis Query Engine + HNSW vectors (COSINE) to find semantically similar earlier prompts (index sketch below).
- Hash field expiration (`HSETEX`/`HGETEX`) so we can expire just the “response” field without deleting the whole hash.
- Redis Streams for real-time hit-rate & latency metrics, rendered in a tiny dashboard.
Demo
Screenshots:
How I Used Redis 8
- Vector search (HNSW, COSINE) on a HASH document that stores an embedding field (FP32, 1536-d from OpenAI text-embedding-3-small).
- Per-field TTL on hashes: `HSETEX` to set the response field and its TTL in a single step; `HGETEX` to read and optionally refresh TTLs. This gives granular cache lifetimes without deleting other fields (like usage or model metadata); see the sketch after this list.
- Redis Streams: `XADD analytics:cache` per request; the dashboard subscribes and renders hit rate, token savings, and latency deltas in real time.
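A hedged sketch of the lookup and store paths, again with redis-rs. The post uses the single-round-trip `HSETEX`/`HGETEX` commands; the sketch spells the same thing out with `HSET`/`HGET` plus `HEXPIRE` so each step is visible. The `idx:cache` index, the 0.15 distance threshold, and the one-hour TTL are illustrative assumptions.

```rust
use redis::{Connection, RedisResult, Value};

const RESP_TTL_SECS: i64 = 3600; // illustrative lifetime for the cached response field
const HIT_THRESHOLD: f32 = 0.15; // illustrative max cosine distance for a "repeat-ish" prompt

/// KNN-1 lookup: returns (fingerprint, distance) of the nearest previously seen prompt,
/// or None if nothing is indexed yet or the best match is too far away.
fn nearest_prompt(con: &mut Connection, embedding: &[f32]) -> RedisResult<Option<(String, f32)>> {
    let blob: Vec<u8> = embedding.iter().flat_map(|f| f.to_le_bytes()).collect();
    // Reply shape on a hit: [1, "vec:<fingerprint>", ["score", "<distance>"]]
    let reply: Vec<Value> = redis::cmd("FT.SEARCH")
        .arg("idx:cache")
        .arg("*=>[KNN 1 @embedding $vec AS score]")
        .arg("PARAMS").arg(2).arg("vec").arg(blob)
        .arg("RETURN").arg(1).arg("score")
        .arg("DIALECT").arg(2)
        .query(con)?;
    if reply.len() < 3 {
        return Ok(None);
    }
    let key: String = redis::from_redis_value(&reply[1])?;
    let fields: Vec<String> = redis::from_redis_value(&reply[2])?;
    let dist: f32 = fields.get(1).and_then(|s| s.parse().ok()).unwrap_or(f32::MAX);
    let fingerprint = key.trim_start_matches("vec:").to_string();
    Ok((dist <= HIT_THRESHOLD).then(|| (fingerprint, dist)))
}

/// Hit path: read the cached response and push the response field's TTL out again.
/// (The post's HGETEX does the read + TTL refresh in one command.)
fn read_hit(con: &mut Connection, fingerprint: &str) -> RedisResult<Option<String>> {
    let key = format!("cache:{fingerprint}");
    let resp: Option<String> = redis::cmd("HGET").arg(&key).arg("resp").query(con)?;
    if resp.is_some() {
        redis::cmd("HEXPIRE").arg(&key).arg(RESP_TTL_SECS)
            .arg("FIELDS").arg(1).arg("resp")
            .query::<Value>(con)?;
    }
    Ok(resp)
}

/// Miss path: store the fresh answer and give only `resp` an expiry, so metadata
/// like `usage` outlives the response itself. (The post's HSETEX does this in one step.)
fn store_miss(con: &mut Connection, fingerprint: &str, prompt: &str, resp: &str, usage: &str) -> RedisResult<()> {
    let key = format!("cache:{fingerprint}");
    redis::cmd("HSET").arg(&key)
        .arg("prompt").arg(prompt)
        .arg("resp").arg(resp)
        .arg("usage").arg(usage)
        .query::<i64>(con)?;
    redis::cmd("HEXPIRE").arg(&key).arg(RESP_TTL_SECS)
        .arg("FIELDS").arg(1).arg("resp")
        .query::<Value>(con)?;
    Ok(())
}
```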
Data model (simplified)
- `cache:{fingerprint}` → Hash fields: `prompt`, `resp`, `meta`, `usage`, `created_at` (with `resp` having its own TTL)
- `vec:{fingerprint}` → Vector field + tags (`model`, `route`, `user`)
- Stream: `analytics:cache` with `{event, hit, latency_ms, tokens_saved}` (sketch below)
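A small sketch of the stream side under the same assumptions: one `XADD` per proxied request using the field shape above, and a blocking `XREAD` the dashboard could use to tail new entries (the post only says the dashboard subscribes, so the consumer shown here is one possible way to do it).

```rust
use redis::{Connection, RedisResult, Value};

/// Append one metrics event per proxied request, matching the
/// {event, hit, latency_ms, tokens_saved} shape of the analytics:cache stream.
fn record_event(con: &mut Connection, hit: bool, latency_ms: u64, tokens_saved: u64) -> RedisResult<String> {
    redis::cmd("XADD")
        .arg("analytics:cache").arg("*")
        .arg("event").arg("lookup")
        .arg("hit").arg(hit as i64)
        .arg("latency_ms").arg(latency_ms)
        .arg("tokens_saved").arg(tokens_saved)
        .query(con) // returns the auto-generated entry ID
}

/// Dashboard side: block for up to 5s waiting for entries newer than `last_id`
/// (pass "$" to start from only-new entries).
fn poll_events(con: &mut Connection, last_id: &str) -> RedisResult<Value> {
    redis::cmd("XREAD")
        .arg("BLOCK").arg(5000)
        .arg("STREAMS").arg("analytics:cache").arg(last_id)
        .query(con)
}
```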
Why Redis 8?
- New field-level expiration commands on hashes make cache lifecycle clean and safe.
- New INT8 vector support keeps memory low and search fast.
- Battle-tested Streams/PubSub give us real-time observability with a tiny footprint.
What’s next
- Prefetch: predict likely next prompts and warm them proactively.
- Hybrid filters: combine vector similarity + tags (model/route) for stricter cache hits (query sketch after this list).
- Cold-start tuning: adapt hit threshold by route and user cohort.
- Quantization: currently storing FP32 vectors for simplicity; INT8 quantization is planned to lower memory use and speed up search.
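For the hybrid-filters item above, the Query Engine already lets a KNN clause be combined with a tag prefilter in a single query; a sketch of what that could look like, with the same assumed `idx:cache` index and illustrative tag values:

```rust
use redis::{Connection, RedisResult, Value};

/// KNN restricted to entries sharing the same model and route tags, so a cached
/// answer produced for one model/route is never served for another.
/// Tag values containing punctuation (e.g. "gpt-4o") need escaping in the query syntax.
fn nearest_with_filters(
    con: &mut Connection,
    embedding: &[f32],
    model: &str,
    route: &str,
) -> RedisResult<Value> {
    let blob: Vec<u8> = embedding.iter().flat_map(|f| f.to_le_bytes()).collect();
    let query = format!("(@model:{{{model}}} @route:{{{route}}})=>[KNN 1 @embedding $vec AS score]");
    redis::cmd("FT.SEARCH")
        .arg("idx:cache")
        .arg(query)
        .arg("PARAMS").arg(2).arg("vec").arg(blob)
        .arg("RETURN").arg(1).arg("score")
        .arg("DIALECT").arg(2)
        .query(con) // same reply shape as the unfiltered lookup; parse as before
}
```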