This content originally appeared on DEV Community and was authored by Swapnil Surdi
I've been working with Claude and MCP servers extensively—building web automation, analyzing codebases, automating testing workflows. But I kept hitting the same frustrating wall:
```
Error: Response exceeds maximum allowed tokens (25,000)
```
The Problem
Modern applications generate massive responses:
- Web page DOMs: 1.3MB+ (154K tokens)
- GitHub PR diffs: 36K tokens (44% over limit)
- Figma exports: 351K tokens (1,300% over)
This error appeared every time I asked Claude to analyze a real web page. Not because the AI couldn't handle the content, but because MCP enforces a hard ceiling of 25,000 tokens per response.
The Real-World Impact
Looking at GitHub issues across popular MCP servers, I found hundreds of developers facing identical problems:
- Chrome MCP: "screenshot always gives 'exceeds maximum tokens' error"
- GitHub MCP: "get_pull_request_diff fails for any substantial PR"
- Playwright MCP: "DOM content returns 'Conversation Too Long' error"
The pattern was clear: MCP works beautifully for toy examples but breaks on real-world complexity.
The Solution: mcp-cache
I built mcp-cache, a universal response manager that wraps any MCP server and solves the token-limit problem automatically.
How it works:
```
Claude Desktop
      ↓
mcp-cache (transparent proxy)
  ├─ Intercepts large responses
  ├─ Caches full data locally
  ├─ Returns summary + query tools
  └─ AI searches cached data on demand
      ↓
Target MCP Server (unchanged)
```
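The core decision the proxy makes is simple: estimate each response's token count, pass small ones through untouched, and swap large ones for a cache handle. Here's a minimal JavaScript sketch of that idea, assuming a rough ~4-characters-per-token heuristic; the names (`TOKEN_LIMIT`, `estimateTokens`, the in-memory `Map`) are illustrative, not the actual mcp-cache internals:

```js
// Minimal sketch of the intercept-and-cache idea (not the actual
// mcp-cache source). Names and the 4-chars-per-token heuristic
// are illustrative assumptions.
const crypto = require('crypto');

const TOKEN_LIMIT = 25000;   // MCP's per-response ceiling
const cache = new Map();     // cacheId -> full response text

// Rough heuristic: ~4 characters per token
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function interceptResponse(text) {
  if (estimateTokens(text) <= TOKEN_LIMIT) {
    return text; // small responses pass through untouched
  }
  const id = 'resp_' + crypto.randomBytes(4).toString('hex');
  cache.set(id, text);
  return [
    `Response too large for MCP (${text.length} bytes).`,
    `Cached as ${id}. Use query_response('${id}', <query>)`,
    `to search the full data.`,
  ].join('\n');
}
```

In mcp-cache itself the cache is file-based rather than an in-memory map, per the roadmap below, but the pass-through-or-cache decision is the heart of the proxy.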
Before mcp-cache:

```
→ "Get the DOM and find payment forms"
❌ Error: Response exceeds maximum length
```

After mcp-cache:

```
→ "Get the DOM and find payment forms"
✅ Cached as resp_xyz (1.2MB)
→ "Show forms with 'payment' in action"
✅ Found 3 forms
```
Zero Configuration
The best part? It's completely transparent:
```bash
# Instead of:
npx @playwright/mcp@latest

# Just add mcp-cache:
npx @hapus/mcp-cache npx @playwright/mcp@latest
```
That's it. No server modifications. No client changes.
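For Claude Desktop specifically, that usually means editing the server entry in `claude_desktop_config.json`. The snippet below is a sketch of how the wrapped command might look; the exact `args` layout is my reading of the command above, so check the project README:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@hapus/mcp-cache", "npx", "@playwright/mcp@latest"]
    }
  }
}
```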
Works with ANY MCP server:
- ✅ Playwright, Chrome, GitHub, Filesystem
- ✅ Python, Node.js, Go, Rust servers
- ✅ Your custom MCP servers
Real Results
Since integrating mcp-cache:
E-Commerce Testing:
- ✅ Full accessibility trees cached (was: 250K token errors)
- ✅ AI queries specific elements from 1.2MB+ responses
- ✅ Complex multi-page flows automated successfully
Performance:
- ⚡ <10ms overhead for normal responses
- ⚡ <200ms for cached queries
- ⚡ 90%+ cache hit rate
What's Next
Current: Local file-based caching
Coming: Redis-backed distributed caching for teams
Vision: Vector embeddings + semantic search
Imagine:
- 🏢 Organization-wide shared cache
- 🔍 Semantic search: "Find pages similar to our checkout flow"
- 📊 Compliance audit trails
- 🧠 Knowledge graphs from cached responses
Key Technical Highlights
Client-Aware Intelligence:
- Auto-detects client (Claude Desktop, Cursor, Cline)
- Adjusts token limits accordingly
- No manual configuration needed (see the sketch below)
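A minimal sketch of what that detection could look like, assuming the client announces itself through the standard MCP initialize handshake's `clientInfo` field; the client names and limits in this table are placeholders, not mcp-cache's actual values:

```js
// Illustrative client detection; the names and limits below are
// placeholder assumptions, not mcp-cache's real table.
const CLIENT_LIMITS = {
  claude: 25000, // Claude Desktop
  cursor: 30000, // Cursor
  cline: 25000,  // Cline
};

// MCP clients identify themselves during the initialize handshake
// via clientInfo; fall back to a conservative default otherwise.
function tokenLimitFor(clientInfo) {
  const name = (clientInfo?.name || '').toLowerCase();
  for (const key of Object.keys(CLIENT_LIMITS)) {
    if (name.includes(key)) return CLIENT_LIMITS[key];
  }
  return 25000; // safe default
}
```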
Powerful Query Interface:
```js
// Text search
query_response('resp_id', 'submit button')

// JSONPath for structured data
query_response('resp_id', '$.div[?(@.class=="navbar")]')

// Regex patterns
query_response('resp_id', '/href=".*\\.pdf"/')
```
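One plausible way to support all three styles is to route on query syntax: a leading `$` suggests JSONPath, a `/.../` wrapper suggests a regex, and anything else falls back to plain text search. The routing rules below are my assumption, not documented mcp-cache behavior:

```js
// Hypothetical query router for cached responses; the routing
// rules are assumptions, not documented mcp-cache behavior.
const { JSONPath } = require('jsonpath-plus'); // third-party JSONPath library

function runQuery(cachedText, query) {
  // Leading '$' -> treat as JSONPath over the parsed JSON
  if (query.startsWith('$')) {
    return JSONPath({ path: query, json: JSON.parse(cachedText) });
  }
  // '/pattern/' wrapper -> treat as a regular expression
  if (query.startsWith('/') && query.endsWith('/')) {
    const re = new RegExp(query.slice(1, -1), 'g');
    return cachedText.match(re) || [];
  }
  // Anything else -> plain text search, returning matching lines
  return cachedText.split('\n').filter((line) => line.includes(query));
}
```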
Try It Today
```bash
npm install -g @hapus/mcp-cache

# Or use directly:
npx @hapus/mcp-cache <your-server-command>
```
Links:
- ⭐ GitHub: https://github.com/swapnilsurdi/mcp-cache
- 📦 npm: @hapus/mcp-cache
Looking For
✅ Testers - Try it with your MCP workflows
✅ Feedback - What features would help you most?
✅ Contributors - Interested in building Redis/vector DB layers?
✅ Use cases - What are you trying to automate?
This started as a side project to scratch my own itch. Now I'm hoping it helps others facing the same problem.