This content originally appeared on DEV Community and was authored by TC Ricks
Connecting to disparate source systems is a solved problem. Tools like FiveTran, AirByte, and merge.dev sink thousands of hours of engineering work into supporting connectors to every software system on the planet, and most teams that need to support thousands of connectors can use one of these tools under the hood to get there.
Once data is connected, you still have the problem of data representation. Each of your source systems (like HubSpot and Intercom, or your billing system and product database) will represent data in its own way. That's what you get when you connect to APIs: you must speak the language of the API provider. It's what you get when you connect to MCP servers, too, since they expose the same underlying structures and present data the same way.
If you connect an LLM to 5 different MCP servers and then ask for information regarding your customer, say Acme Incorporated, your answer will be degraded by the fact that each source system might represent customers differently. Maybe in HubSpot, companies are represented by their primary domain, getacme.com. Maybe in your billing system, they're represented by the "Doing-business-as" name: Acme. And maybe in your product, three subsidiaries have engaged and have submitted support requests in Intercom under the names of those subsidiaries.
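To make the mismatch concrete, here is a hypothetical sketch of how the same customer might surface across three systems. The record shapes, field names, and subsidiary names are illustrative, not actual HubSpot or Intercom schemas:

```python
# Hypothetical records for the same customer, "Acme Incorporated",
# as three different source systems might represent it.
# Field names and ids are illustrative, not real API schemas.

hubspot_company = {"id": "784512", "domain": "getacme.com"}   # keyed by primary domain
billing_account = {"account_id": "AC-0091", "name": "Acme"}   # keyed by the DBA name
intercom_orgs = [                                             # one org per subsidiary
    {"org_id": "int_1", "name": "Acme Robotics"},
    {"org_id": "int_2", "name": "Acme Logistics"},
    {"org_id": "int_3", "name": "Acme Labs"},
]

# Nothing in these records says they belong to the same customer.
# That linkage has to be established somewhere, by something.
```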
For those who are dead-set on AI enablement, there are two options for solving this problem:
- Resolve Entities on Inference: Trust the LLM to expend enough tokens in "thought" to scan through all of the data available in each source system via its MCP connections, exhaustively search around the Acme Incorporated concept to figure out where Acme-related data lives in your systems, and then report on it.
- Resolve Entities Ahead of Time: Use traditional data processing techniques to reveal gaps in entity resolution prior to inference, then use inputs from business stakeholders and data experts to resolve entities in an integrated "source of truth" layer. LLMs refer to this layer on inference instead of to raw source systems (a sketch of this follows below).
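As a minimal sketch of the ahead-of-time option, here is what a first pass with traditional data processing might look like: normalize names, match what matches cleanly, and surface the gaps for humans to resolve. All of the names, record shapes, and thresholds here are assumed for illustration:

```python
import difflib

# Minimal sketch: surface probable matches and gaps across two systems
# *before* inference. Company names and the 0.8 cutoff are illustrative.

def normalize(name: str) -> str:
    """Crude canonicalization: lowercase and strip common legal suffixes."""
    name = name.lower().strip()
    for suffix in (" incorporated", " inc.", " inc", " llc", " ltd"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name

billing_names = ["Acme", "Globex Incorporated"]
crm_names = ["Acme Incorporated", "Initech LLC"]

matched, unmatched = [], []
for b in billing_names:
    candidates = difflib.get_close_matches(
        normalize(b), [normalize(c) for c in crm_names], n=1, cutoff=0.8
    )
    if candidates:
        matched.append((b, candidates[0]))   # (billing name, normalized CRM match)
    else:
        unmatched.append(b)                  # a gap for a human or LLM to resolve

print("matched:", matched)       # [('Acme', 'acme')]
print("unmatched:", unmatched)   # ['Globex Incorporated'] -> needs review
```

Everything landing in the unmatched bucket is exactly the gap the second option puts in front of business stakeholders and data experts before any agent ever runs.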
If you just upload small amounts of data to ChatGPT or Claude, it seems like the first option will work great. LLMs are very good at this kind of semantic search and conceptual construction work. Your CSVs fit entirely into the context window of the LLM, and every token emitted contributes constructively to grouping like elements together.
The moment you start working with larger datasets, the first option falls apart. For large-scale search-and-compute, LLM-driven crawling is painfully slow compared to traditional data processing techniques. Semantic search scales well for what it does, but it is a different thing altogether from reasoning within a single context window.
You could view this as an issue of context window limitations, but I think that's short-sighted. The context window is what it is. In the medium term, at least, it seems like it'll always be a factor we need to design around, in the same way that RAM is still a factor to design around no matter how much memory we manage to fit in our laptops. I think the context window is just the wrong tool for the job. Use token-based compute for what it's good at, and use classical compute for what it's good at: large-scale data processing.
At AstroBee, we don’t foresee a world where MCP alone is the primary way to get strong AI capabilities across your enterprise. We see MCP working great for specific tool calling, and faltering when data needs to be reconciled across tools in order to make good decisions.
AstroBee integrates your data for you, so that entity resolution across your different systems happens before inference, and with the right tools: a mixture of SQL-driven integration, semantic search, and a touch of LLM-driven reasoning. Then, when AI agents work over your enterprise data, they work with pre-integrated data, delivering significantly better results and decisions than before.
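To illustrate what a pre-integrated layer could look like at inference time, here is a minimal sketch built on an assumed crosswalk table. All table, column, and id names are hypothetical, not AstroBee's actual schema:

```python
import sqlite3

# Minimal sketch of a pre-integrated "source of truth" layer:
# a crosswalk table mapping one canonical entity to its per-system
# handles, resolved ahead of time. Schema and values are hypothetical.

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE entity_crosswalk (
        canonical_id   TEXT,
        source_system  TEXT,
        source_id      TEXT,
        source_label   TEXT
    );
    INSERT INTO entity_crosswalk VALUES
        ('acme', 'hubspot',  '784512',  'getacme.com'),
        ('acme', 'billing',  'AC-0091', 'Acme'),
        ('acme', 'intercom', 'int_1',   'Acme Robotics'),
        ('acme', 'intercom', 'int_2',   'Acme Logistics');
""")

# At inference time, an agent resolves one canonical entity to every
# per-system handle in a single cheap lookup -- no token-burning
# crawl across MCP servers required.
rows = con.execute(
    "SELECT source_system, source_id, source_label "
    "FROM entity_crosswalk WHERE canonical_id = ?",
    ("acme",),
).fetchall()
for row in rows:
    print(row)
```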
To see an alpha research preview of AstroBee, visit us at https://astrobee.ai. If you’d like to discuss our approach and give feedback or special requests, contact me directly at support@astrobee.ai.