This content originally appeared on DEV Community and was authored by Swapnil Surdi
I've been working with Claude and MCP servers extensively—building web automation, analyzing codebases, automating testing workflows. But I kept hitting the same frustrating wall:
```
Error: Response exceeds maximum allowed tokens (25,000)
```
The Problem
Modern applications generate massive responses:
- Web page DOMs: 1.3MB+ (154K tokens)
- GitHub PR diffs: 36K tokens (44% over limit)
- Figma exports: 351K tokens (1,300% over)
This error appeared every time I asked Claude to analyze a real web page. Not because the AI couldn't handle the content, but because MCP enforces a hard ceiling of 25,000 tokens per response.
The Real-World Impact
Looking at GitHub issues across popular MCP servers, I found hundreds of developers facing identical problems:
- Chrome MCP: "screenshot always gives 'exceeds maximum tokens' error"
- GitHub MCP: "get_pull_request_diff fails for any substantial PR"
- Playwright MCP: "DOM content returns 'Conversation Too Long' error"
The pattern was clear: MCP works beautifully for toy examples but breaks on real-world complexity.
The Solution: mcp-cache
I built mcp-cache, a universal response manager that wraps any MCP server and solves the token-limit problem automatically.
How it works:
```
Claude Desktop
      ↓
mcp-cache (transparent proxy)
  ├─ Intercepts large responses
  ├─ Caches full data locally
  ├─ Returns summary + query tools
  └─ AI searches cached data on demand
      ↓
Target MCP Server (unchanged)
```
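The core decision the proxy makes is simple: estimate each response's token count, pass small ones through untouched, and swap large ones for a cache handle. Here's a minimal JavaScript sketch of that idea, assuming a rough ~4-characters-per-token heuristic; the names (`TOKEN_LIMIT`, `estimateTokens`, the in-memory `Map`) are illustrative, not the actual mcp-cache internals:

```js
// Minimal sketch of the intercept-and-cache idea (not the actual
// mcp-cache source). Names and the 4-chars-per-token heuristic
// are illustrative assumptions.
const crypto = require('crypto');

const TOKEN_LIMIT = 25000;   // MCP's per-response ceiling
const cache = new Map();     // cacheId -> full response text

// Rough heuristic: ~4 characters per token
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function interceptResponse(text) {
  if (estimateTokens(text) <= TOKEN_LIMIT) {
    return text; // small responses pass through untouched
  }
  const id = 'resp_' + crypto.randomBytes(4).toString('hex');
  cache.set(id, text);
  return [
    `Response too large for MCP (${text.length} bytes).`,
    `Cached as ${id}. Use query_response('${id}', <query>)`,
    `to search the full data.`,
  ].join('\n');
}
```

In mcp-cache itself the cache is file-based rather than an in-memory map, per the roadmap below, but the pass-through-or-cache decision is the heart of the proxy.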
Before mcp-cache:

```
→ "Get the DOM and find payment forms"
❌ Error: Response exceeds maximum length
```

After mcp-cache:

```
→ "Get the DOM and find payment forms"
✅ Cached as resp_xyz (1.2MB)
→ "Show forms with 'payment' in action"
✅ Found 3 forms
```
Zero Configuration
The best part? It's completely transparent:
```bash
# Instead of:
npx @playwright/mcp@latest

# Just add mcp-cache:
npx @hapus/mcp-cache npx @playwright/mcp@latest
```
That's it. No server modifications. No client changes.
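For Claude Desktop specifically, that usually means editing the server entry in `claude_desktop_config.json`. The snippet below is a sketch of how the wrapped command might look; the exact `args` layout is my reading of the command above, so check the project README:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@hapus/mcp-cache", "npx", "@playwright/mcp@latest"]
    }
  }
}
```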
Works with ANY MCP server:
- ✅ Playwright, Chrome, GitHub, Filesystem
- ✅ Python, Node.js, Go, Rust servers
- ✅ Your custom MCP servers
Real Results
Since integrating mcp-cache:
E-Commerce Testing:
- ✅ Full accessibility trees cached (was: 250K token errors)
- ✅ AI queries specific elements from 1.2MB+ responses
- ✅ Complex multi-page flows automated successfully
Performance:
- ⚡ <10ms overhead for normal responses
- ⚡ <200ms for cached queries
- ⚡ 90%+ cache hit rate
What's Next
Current: Local file-based caching
Coming: Redis-backed distributed caching for teams
Vision: Vector embeddings + semantic search
Imagine:
- 🏢 Organization-wide shared cache
- 🔍 Semantic search: "Find pages similar to our checkout flow"
- 📊 Compliance audit trails
- 🧠 Knowledge graphs from cached responses
Key Technical Highlights
Client-Aware Intelligence:
- Auto-detects client (Claude Desktop, Cursor, Cline)
- Adjusts token limits accordingly
- No manual configuration needed (see the sketch below)
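A minimal sketch of what that detection could look like, assuming the client announces itself through the standard MCP initialize handshake's `clientInfo` field; the client names and limits in this table are placeholders, not mcp-cache's actual values:

```js
// Illustrative client detection; the names and limits below are
// placeholder assumptions, not mcp-cache's real table.
const CLIENT_LIMITS = {
  claude: 25000, // Claude Desktop
  cursor: 30000, // Cursor
  cline: 25000,  // Cline
};

// MCP clients identify themselves during the initialize handshake
// via clientInfo; fall back to a conservative default otherwise.
function tokenLimitFor(clientInfo) {
  const name = (clientInfo?.name || '').toLowerCase();
  for (const key of Object.keys(CLIENT_LIMITS)) {
    if (name.includes(key)) return CLIENT_LIMITS[key];
  }
  return 25000; // safe default
}
```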
Powerful Query Interface:
```js
// Text search
query_response('resp_id', 'submit button')

// JSONPath for structured data
query_response('resp_id', '$.div[?(@.class=="navbar")]')

// Regex patterns
query_response('resp_id', '/href=".*\\.pdf"/')
```
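One plausible way to support all three styles is to route on query syntax: a leading `$` suggests JSONPath, a `/.../` wrapper suggests a regex, and anything else falls back to plain text search. The routing rules below are my assumption, not documented mcp-cache behavior:

```js
// Hypothetical query router for cached responses; the routing
// rules are assumptions, not documented mcp-cache behavior.
const { JSONPath } = require('jsonpath-plus'); // third-party JSONPath library

function runQuery(cachedText, query) {
  // Leading '$' -> treat as JSONPath over the parsed JSON
  if (query.startsWith('$')) {
    return JSONPath({ path: query, json: JSON.parse(cachedText) });
  }
  // '/pattern/' wrapper -> treat as a regular expression
  if (query.startsWith('/') && query.endsWith('/')) {
    const re = new RegExp(query.slice(1, -1), 'g');
    return cachedText.match(re) || [];
  }
  // Anything else -> plain text search, returning matching lines
  return cachedText.split('\n').filter((line) => line.includes(query));
}
```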
Try It Today
```bash
npm install -g @hapus/mcp-cache

# Or use directly:
npx @hapus/mcp-cache <your-server-command>
```
Links:
- ⭐ GitHub: https://github.com/swapnilsurdi/mcp-cache
- 📦 npm: @hapus/mcp-cache
Looking For
✅ Testers - Try it with your MCP workflows
✅ Feedback - What features would help you most?
✅ Contributors - Interested in building Redis/vector DB layers?
✅ Use cases - What are you trying to automate?
This started as a side project to scratch my own itch. Now I'm hoping it helps others facing the same problem.