LEANN: The World’s Most Lightweight Semantic Search Backend for RAG Everything 🎉


This content originally appeared on DEV Community and was authored by Yichuan Wang

Introducing our team's latest creation - a revolutionary approach to local RAG applications

TL;DR: We built LEANN, the world's most "lightweight" semantic search backend that achieves 97% storage savings compared to traditional solutions while maintaining high accuracy and performance. Perfect for privacy-focused RAG applications on your local machine.

🚀 Quick Start

Want to try it right now? Run this single command on your MacBook:

uv pip install leann

📚 Repository & Paper

What is RAG Everything?

RAG (Retrieval-Augmented Generation) has become the first true "killer application" of the LLM era. It seamlessly integrates private data that wasn't part of the training set into large model inference pipelines.

Privacy-sensitive scenarios are the most important deployment target - especially for your personal data and in highly sensitive domains like healthcare and finance.

RAG Everything starts from the most essential needs of personal laptops. We natively support several out-of-the-box scenarios (currently on macOS and Linux; Windows users need WSL):

๐Ÿ” Supported Applications

1. File System RAG

Replace Spotlight Search entirely. Spotlight not only consumes disk space but also does nothing beyond keyword matching. We transform it into a semantic search powerhouse.

2. Apple Mail RAG

Easily find answers to personal questions (like "How many courses should Berkeley EECS freshmen take in their first semester?").

3. Google Browser History RAG

Track down those search records you have forgotten and can recall only as a fuzzy impression.

4. WeChat Chat History RAG

This is what I use most! I've used LEANN to summarize conversations with friends and extract research ideas + slides. We implemented a small hack to bypass WeChat's encrypted database and extract chat records - don't worry, everything stays local with zero leakage.

5. Claude Code Semantic Search Enhancement 🔥

One of Claude Code's biggest pain points is that it's always grepping and finding nothing. LEANN is one of the first open-source projects to bring true semantic search to Claude Code through an MCP server - enabling it with just one line of code.

These are just the scenarios we think have the most "potential" - we'll keep integrating more features based on user feedback until LEANN becomes a personalized local agent that retains your LLM memory and has command of all your private data.

Why LEANN? The Technical Deep Dive

The Problem with Current Vector Databases

Current mainstream vector databases excel at latency - most queries complete within 10ms-100ms even with millions of data points. In RAG's search + generation pipeline, search time is "far below" generation time, especially with reasoning models and long chain-of-thought outputs.

Latency isn't the bottleneck in RAG - storage is.

The most important RAG deployment scenario is privacy, especially on personal computers where resources are naturally scarce. Consider this reality check:

For high recall in text RAG, you need fine chunk sizes → embedding storage becomes 3-10x the original text size → Real example: 70GB raw data → 220GB+ index storage
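To make the 3-10x figure concrete, here is a rough back-of-the-envelope sketch. The embedding dimension (768, float32) and the average chunk sizes are assumptions chosen for illustration, not the configuration behind the 70GB example, and the estimate ignores chunk overlap and any graph or metadata overhead.

```python
def embedding_overhead(raw_bytes, chunk_bytes, dim, bytes_per_value=4):
    """Return embedding storage as a multiple of the raw text size."""
    num_chunks = -(-raw_bytes // chunk_bytes)        # ceiling division
    embedding_bytes = num_chunks * dim * bytes_per_value
    return embedding_bytes / raw_bytes

GB = 10**9
RAW = 70 * GB   # raw corpus size, taken from the example above
DIM = 768       # assumed embedding dimension, stored as float32

for chunk_bytes in (1024, 512, 320):   # assumed average chunk sizes in bytes
    ratio = embedding_overhead(RAW, chunk_bytes, DIM)
    print(f"{chunk_bytes:>4}-byte chunks: embeddings = {ratio:.1f}x raw text "
          f"({ratio * RAW / GB:.0f} GB)")
```

Under these assumptions each chunk costs 3,072 bytes of vector storage, so the ratio runs from about 3x at 1 KB chunks to about 10x at 320-byte chunks: the finer the chunking, the worse the blow-up.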

Our Solution: Trade Storage for Compute

LEANN makes a bold design choice: replace storage with recomputation.

Core Innovation

Key Observation: In graph-based indices, a query actually accesses very few nodes → Why store all embeddings?

Our pipeline:

  1. Build a normal vector store
  2. Delete all embeddings, keeping only the Proximity Graph to record relationships between data chunks
  3. At query time, recompute embeddings on demand instead of loading stored ones from memory
  4. Leverage lightweight embedding models for efficient graph-based recomputation
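The pipeline above can be sketched in a few lines. This is a toy illustration and not LEANN's implementation: `toy_embed` is a hypothetical stand-in for a real embedding model, the brute-force k-nearest-neighbour graph stands in for a proper proximity graph, and all function names and parameters are made up for this example.

```python
import heapq
import zlib
import numpy as np

DIM = 64  # toy embedding size (assumption)

def toy_embed(texts):
    """Stand-in for a real embedding model: hashed character trigrams, L2-normalized."""
    out = np.zeros((len(texts), DIM), dtype=np.float32)
    for row, text in enumerate(texts):
        for j in range(len(text) - 2):
            out[row, zlib.crc32(text[j:j + 3].encode()) % DIM] += 1.0
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)

def build_graph(chunks, k=3):
    """Steps 1-2: embed once, keep each chunk's k nearest neighbours, drop the vectors."""
    emb = toy_embed(chunks)
    sims = emb @ emb.T
    np.fill_diagonal(sims, -np.inf)           # a node is not its own neighbour
    return {i: [int(j) for j in np.argsort(-sims[i])[:k]] for i in range(len(chunks))}

def search(query, chunks, graph, entry=0, top_k=2, beam=4):
    """Steps 3-4: best-first traversal that re-embeds only the chunks it touches."""
    q = toy_embed([query])[0]
    scores = {entry: float(toy_embed([chunks[entry]])[0] @ q)}
    frontier = [(-scores[entry], entry)]      # max-heap via negated similarity
    while frontier:
        neg_sim, node = heapq.heappop(frontier)
        kept = sorted(scores.values(), reverse=True)[:beam]
        if len(kept) == beam and -neg_sim < kept[-1]:
            break                             # nothing left that can enter the beam
        new = [n for n in graph[node] if n not in scores]
        if not new:
            continue
        sims = toy_embed([chunks[n] for n in new]) @ q   # one batched recomputation
        for n, s in zip(new, sims):
            scores[n] = float(s)
            heapq.heappush(frontier, (-scores[n], n))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], len(scores)        # ids plus number of recomputed embeddings

chunks = [
    "how to renew a passport online",
    "passport renewal fees and forms",
    "best pasta recipes for dinner",
    "quick dinner recipes with pasta",
    "python list comprehension tutorial",
    "python tutorial on generators",
    "train schedule from berkeley to sf",
    "bus and train timetable for berkeley",
]
graph = build_graph(chunks)
ids, recomputed = search("renew my passport", chunks, graph)
print(ids, f"({recomputed} of {len(chunks)} chunks re-embedded)")
```

After `build_graph` returns, the only thing kept is the dictionary of integer neighbour lists; every similarity at query time comes from re-running the embedding function on the handful of chunks the traversal reaches.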

Graph Structure Pruning

We observed that node visits are heavily skewed in post-RNG graphs: a small set of nodes accounts for most accesses. Our strategy:

  • Keep high-degree nodes to ensure connectivity
  • Limit out-edges for low-degree nodes while allowing unlimited in-edges
  • Use heuristics to preserve only essential high-degree nodes
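A minimal sketch of this kind of degree-aware pruning follows. The hub criterion (top fraction of nodes by in-degree) and the two caps are assumptions made for illustration; the heuristics LEANN actually uses are described in the paper and may differ.

```python
from collections import Counter

def prune_graph(graph, hub_fraction=0.2, hub_cap=5, low_cap=2):
    """Cap out-edges per node: a generous cap for hubs, a tight one for everyone else.

    `graph` maps node id -> out-neighbours sorted nearest-first. In-edges are
    never capped: a node's own list is truncated, but others may still point at it.
    """
    in_degree = Counter(dst for nbrs in graph.values() for dst in nbrs)
    n_hubs = max(1, int(len(graph) * hub_fraction))
    hubs = set(sorted(graph, key=lambda n: in_degree[n], reverse=True)[:n_hubs])
    pruned = {
        node: nbrs[: hub_cap if node in hubs else low_cap]
        for node, nbrs in graph.items()
    }
    return pruned, hubs

# Toy graph: node 0 is pointed to by every other node, so it is the hub.
graph = {
    0: [1, 2, 3, 4, 5],
    1: [0, 2, 3],
    2: [0, 1, 3],
    3: [0, 2, 4],
    4: [0, 3, 5],
    5: [0, 4, 1],
}
pruned, hubs = prune_graph(graph)
edges_before = sum(len(v) for v in graph.values())
edges_after = sum(len(v) for v in pruned.values())
print(hubs, edges_before, "->", edges_after)
```

The hub keeps all of its out-edges so the graph stays navigable, while every other node keeps only its nearest neighbours, which is where the storage saving comes from.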

Results That Matter

✅ 97%+ reduction in index size

✅ <2 seconds retrieval time on 3090-level hardware

✅ 90%+ Top-3 recall on real RAG benchmarks

✅ Zero vector storage - all in 200GB+ embedding spaces

Note: Under this high compression rate, PQ, OPQ, and even state-of-the-art RaBitQ cannot guarantee high accuracy - proven in our paper.

Performance Optimizations

  • Adaptive pipeline combining coarse-grained and accurate search
  • Efficient GPU batching for better utilization
  • ZMQ communication using distances instead of embeddings
  • CPU/GPU overlapping
  • Selective caching of high-degree nodes
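Two of these optimizations, batching and selective caching of high-degree nodes, can be sketched together. The class below is hypothetical and is not LEANN's API: it keeps embeddings only for a chosen set of hub nodes and recomputes everything else in a single batched call per lookup. The `embed_fn` used in the example is a trivial stand-in for a real model.

```python
import numpy as np

class HubCachedEmbedder:
    """Cache embeddings for hub nodes only; recompute the rest in one batch."""

    def __init__(self, chunks, embed_fn, hub_ids):
        self.chunks = chunks
        self.embed_fn = embed_fn      # maps list[str] -> array of shape (n, dim)
        self.model_calls = 0          # counts batched invocations of embed_fn
        hub_ids = list(hub_ids)
        self.cache = dict(zip(hub_ids, self._embed(hub_ids))) if hub_ids else {}

    def _embed(self, ids):
        self.model_calls += 1
        return self.embed_fn([self.chunks[i] for i in ids])

    def get(self, ids):
        """Return embeddings for `ids`, touching the model only for cache misses."""
        missing = [i for i in ids if i not in self.cache]
        fresh = dict(zip(missing, self._embed(missing))) if missing else {}
        return np.stack([self.cache[i] if i in self.cache else fresh[i] for i in ids])

def toy_embed(texts):
    # Stand-in model (assumption): two hand-made features per text.
    return np.array([[len(t), t.count("a")] for t in texts], dtype=np.float32)

chunks = ["alpha", "beta", "gamma", "delta"]
embedder = HubCachedEmbedder(chunks, toy_embed, hub_ids=[0])
vectors = embedder.get([0, 1, 2])   # node 0 comes from the cache; 1 and 2 share one batch
embedder.get([0])                   # served entirely from the cache
print(vectors.shape, embedder.model_calls)
```

Because hubs are visited by most queries, caching just those few vectors removes a large share of recomputation for a small, bounded amount of storage, and grouping the misses into one call keeps the accelerator busy instead of embedding one chunk at a time.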

The Vision: RAG Everything

We're continuously maintaining this open-source project at Berkeley SkyLab with full-stack optimization across algorithms, applications, system design, vector databases, and kernel acceleration.

Our Goals

🎯 Seamlessly connect all your private data

🧠 Build long-term local AI memory and agents

💻 Zero cloud dependency, low-cost operation

Technical Details & Future Work

If you want to dive deeper into implementation details, check our arXiv paper and repository. I can write a follow-up post covering all implementation specifics if there's interest.

We hope LEANN inspires more vector search researchers to think about vector databases from a different angle, especially in popular RAG settings. We were fortunate to discuss our work at SIGMOD/ICML vector search workshops this year and received great recognition from the community.

Get Involved

  • ⭐ Star our repository
  • 🤝 Contribute to the project
  • 🔗 Join our Berkeley SkyLab team

Ready to transform your local machine into a RAG powerhouse?

uv pip install leann

What private data would you want to RAG first? Drop a comment below! 👇

Tags

#rag #vectordatabase #semanticsearch #privacy #opensource #machinelearning #ai



