Second, a question. Computer Use and JSON mode are great for creating a quasi-API for legacy software which offers no integration possibilities. Can MCP better help with legacy software interactions, and if so, in what ways?
The benefit would be that to the application connecting to your MCP server, it just looks like any other integration, and you can encapsulate a lot of the complexity of Computer Use under the hood.
If you explore this, we'd love to see what you come up with!
And the GitHub repo: https://github.com/modelcontextprotocol
Internally we have seen people experiment with a wide variety of integrations, from reading data files to managing their GitHub repositories through Claude using MCP. Alex's post https://x.com/alexalbert__/status/1861079762506252723 has some good examples. Alternatively, please take a look at https://github.com/modelcontextprotocol/servers for a set of servers we found useful.
https://github.com/anaisbetts/mcp-youtube
Claude doesn't support YouTube summaries. I thought that was annoying! So I added it myself, instead of having to hope Anthropic would do it
In addition, I recommend looking at the specification documentation at https://spec.modelcontextprotocol.io. This should give you a good overview of how to implement a client. If you are looking to see an implemented open source client, Zed implements an MCP client: https://github.com/zed-industries/zed/tree/main/crates/conte...
If you have specific questions, please feel free to start a discussion in the relevant repository under https://github.com/modelcontextprotocol, and we are happy to help you with integrating MCP.
Is it versioned? I.e., does this release constitute an immutable protocol for the time being?
Thanks for your hard work! "LSP for LLMs" is a fucking awesome idea
It's not exactly immutable, but any backwards incompatible changes would require a version bump.
We don't have a roadmap in one particular place, but we'll be populating GitHub Issues, etc. with all the stuff we want to get to! We want to develop this in the open, with the community.
From what I’ve seen, OpenAI attempted to solve the problem by partnering with an existing company that API-fies everything. This feels like a more viable approach compared to effectively starting from scratch.
One bit of constructive feedback: the TypeScript API isn't using the TypeScript type system to its fullest. For example, for tool providers, you could infer the type of a tool request handler's params from the json schema of the corresponding tool's input schema.
I guess that would be assuming that the model is doing constrained sampling correctly, such that it would never generate JSON that does not match the schema, which you might not want to bake into the reference server impl. It'd mean changes to the API too, since you'd need to connect the tool declaration and the request handler for that tool in order to connect their types.
Could I convince you to submit a PR? We'd love to include community contributions!
Is there a recommended resource for building MCP client? From what I've seen it just mentions Claude desktop & co are clients. SDK readme seems to cover it a bit but some examples could be great.
The best starting point are the respective client parts in the SDK: https://github.com/modelcontextprotocol/typescript-sdk/tree/... and https://github.com/modelcontextprotocol/python-sdk/tree/main..., as well as the official specification documentation at https://spec.modelcontextprotocol.io.
If you run into issues, feel free to open a discussion in the respective SDK repository and we are happy to help.
(I've been fairly successful in taking the spec documentation in markdown, an SDK and giving both to Claude and asking questions, but of course that requires a Claude account, which I don't want to assume)
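To make that concrete, here's a minimal sketch of a client using the Python SDK as I read it (the server script name and tool arguments are made up, and exact method names may differ between SDK versions, so treat this as an outline rather than a reference):

    import asyncio
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def main():
        # Launch the server as a subprocess and talk to it over stdio.
        server_params = StdioServerParameters(command="python", args=["my_server.py"])
        async with stdio_client(server_params) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()          # protocol handshake
                tools = await session.list_tools()  # discover what the server offers
                print(tools)
                result = await session.call_tool("fetch_weather", {"city": "Berlin"})
                print(result)

    asyncio.run(main())

The interesting part for a client author is mostly what you do with the list_tools / list_resources results; surfacing them to your model or UI is entirely up to you.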
I'm looking at integrating MCP with a desktop app. The spec (https://spec.modelcontextprotocol.io/specification/basic/tra...) mentions "Clients SHOULD support stdio whenever possible.". The server examples seem to be mostly stdio as well. In the context of a sandboxed desktop app, it's often not practical to launch a server as a subprocess because:
- sandbox restrictions on executing binaries
- needing to bundle the binary leads to a larger installation size
Would it be reasonable to relax this restriction and provide both SSE/stdio for the default server examples?
I can totally see your concern about sandboxed apps, particularly for Flatpak or similar distribution methods. I see you already opened a discussion https://github.com/modelcontextprotocol/specification/discus..., so let's follow up there. I really appreciate the input.
(and then having a smol node/bun/go/whatever app that can sit in front of any server that handles stdio - or a listening socket for a server that can handle multiple clients - and translates the protocol over to SSE or websockets or [pick thing you want here] lets you support all such servers with a single binary to install)
Not that there aren't advantages to having such things baked into the server proper, but making 'writing a new connector that works at all' as simple as possible while still having access to multiple approaches to talk to it seems like something worthy of consideration.
[possibly I should've put this into the discussion, but I have to head out in a minute or two; anybody who's reading this and engaging over there should feel free to copy+paste anything I've said they think is relevant]
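For what it's worth, here's a toy sketch of that translator idea in Python, using a plain TCP socket instead of SSE/WebSockets just to keep it dependency-free (the port, the command, and the one-subprocess-per-connection choice are all arbitrary; this is not the SSE transport from the spec, just newline-delimited JSON-RPC forwarding):

    import asyncio
    import sys

    SERVER_CMD = sys.argv[1:]  # e.g. ["python", "my_server.py"]

    async def handle_client(reader, writer):
        # Spawn one stdio MCP server per incoming connection (simplest possible model).
        proc = await asyncio.create_subprocess_exec(
            *SERVER_CMD,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
        )

        async def socket_to_stdin():
            while line := await reader.readline():
                proc.stdin.write(line)
                await proc.stdin.drain()

        async def stdout_to_socket():
            while line := await proc.stdout.readline():
                writer.write(line)
                await writer.drain()

        try:
            await asyncio.gather(socket_to_stdin(), stdout_to_socket())
        finally:
            proc.terminate()
            writer.close()

    async def main():
        server = await asyncio.start_server(handle_client, "127.0.0.1", 8765)
        async with server:
            await server.serve_forever()

    asyncio.run(main())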
In the case of the Claude Desktop App, I assume the decision about which MCP server's tool to use, based on the end-user's query, is made by the Claude LLM using something like a ReAct loop. Are the prompts and LLM-generated tokens involved in the "Protocol Handshake" phase available for review?
1. The sampling documentation is confusing. "Sampling" means something very specific in statistics, and I'm struggling to see any connection between the term's typical usage and the usage here. Perhaps "prompt delegation" would be a more obvious term to use.
Another thing that's confusing about the sampling concept is that it's initiated by a server instead of a client, a reversal of how client/server interactions normally work. Without concrete examples, it's not obvious why or how a server might trigger such an exchange.
2. Some information on how resources are used would be helpful. How do resources get pulled into the context for queries? How are clients supposed to determine which resources are relevant? If the intention is that clients are to use resource descriptions to determine which to integrate into prompts, then that purpose should be more explicit.
Perhaps a bigger problem is that I don't see how clients are to take a resource's content into account when analyzing its relevance. Is this framework intentionally moving away from the practice of comparing content and query embeddings? Or is this expected to be done by indices maintained on the client?
After reading the Python server tutorial, it looks like there is some tool calling going on, in the old terminology. That makes more sense. But none of the examples seem to indicate what the protocol is, whether it's a RAG sort of thing, whether I need to prompt, etc.
It would be nice to provide a bit more concrete info about capabilities and what the purpose is before getting into call diagrams. What do the arrows represent? That's more important to know than the order in which a host talks to a server talks to a remote resource.
I think this is something that I really want and want to build a server for, but it's unclear to me how much more time I will have to invest before getting the basic information about it!
The gist of it is: you have an llm application such as Claude desktop. You want to have it interact (read or write) with some system you have. MCP solves this.
For example you can give the application the database schema as a “resource”, effectively saying; here is a bunch of text, do whatever you want with it during my chat with the llm. Or you can give the application a tool such as query my database. Now the model itself can decide when it wants to query (usually because you said: hey tell me what’s in the accounts table or something similar).
It’s “bring the things you care about” to any llm application with an mcp client
It's not introducing new capabilities, just solving the NxM problem, hopefully leading to more tools being written.
(At least that's how I understand this. Am I far off?)
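That matches my reading of it. To make the resource/tool split concrete, here's a rough sketch of such a server using the decorator-style API from the Python SDK as I understand it (get_schema and run_query are made-up helpers; check the SDK docs for exact types and signatures):

    import asyncio
    import mcp.types as types
    from mcp.server import Server
    from mcp.server.stdio import stdio_server

    app = Server("example-db")

    @app.list_resources()
    async def list_resources() -> list[types.Resource]:
        # A resource: "here's a blob of context, do what you want with it".
        return [types.Resource(uri="db://schema", name="Database schema", mimeType="text/plain")]

    @app.read_resource()
    async def read_resource(uri) -> str:
        return get_schema()  # made-up helper returning CREATE TABLE statements

    @app.list_tools()
    async def list_tools() -> list[types.Tool]:
        # A tool: something the model can decide to call mid-conversation.
        return [types.Tool(
            name="query",
            description="Run a read-only SQL query against the database",
            inputSchema={
                "type": "object",
                "properties": {"sql": {"type": "string"}},
                "required": ["sql"],
            },
        )]

    @app.call_tool()
    async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
        rows = run_query(arguments["sql"])  # made-up helper
        return [types.TextContent(type="text", text=str(rows))]

    async def main():
        async with stdio_server() as (read, write):
            await app.run(read, write, app.create_initialization_options())

    asyncio.run(main())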
On tools specifically, we went back and forth about whether the other primitives of MCP ultimately just reduce to tool use, but ultimately concluded that separate concepts of "prompts" and "resources" are extremely useful to express different _intentions_ for server functionality. They all have a part to play!
It would probably be helpful for many of your readers if you had a focused document that addressed specifically that motivating question, together with illustrated examples. What does MCP provide, and what does it intend to solve, that a tool calling interface or RPC protocol can't?
The N×M problem may simply be moved rather than solved:
- Instead of N×M direct integrations
- We now have N MCP client implementations
- M MCP server implementations
This feels similar to SOAP but might be more of a lower-level protocol similar to HTTP itself. Hard to tell with the implementation examples being pretty subjective programs in Python.
- A host never talks to a server directly, only via a Client (which is presumably a human). The host has or is the LLM (app).
- A server only supplies context data (read-only), in the form of a tool call, a direct resource URL, or a pre-populated prompt. It can call back to a client directly, for example to request something from the host's LLM.
- A client sits in the middle, representing the human in the loop. It manages the requests bidirectionally
It seems mostly modeled around the security boundaries, rather than just AI capability domains. The client is always in the loop; the host and server do not directly communicate.
https://github.com/modelcontextprotocol/servers/blob/main/sr...
How can an add on that works with arbitrary "servers" tell the difference between these two tools? Without being able to tell the difference you can't really build a generic way to ask for confirmation in the application that is using the server...
{
name: "create_directory",
description:
"Create a new directory or ensure a directory exists. Can create multiple " +
"nested directories in one operation. If the directory already exists, " +
"this operation will succeed silently. Perfect for setting up directory " +
"structures for projects or ensuring required paths exist. Only works within allowed directories.",
inputSchema: zodToJsonSchema(CreateDirectoryArgsSchema) as ToolInput,
},
{
name: "list_directory",
description:
"Get a detailed listing of all files and directories in a specified path. " +
"Results clearly distinguish between files and directories with [FILE] and [DIR] " +
"prefixes. This tool is essential for understanding directory structure and " +
"finding specific files within a directory. Only works within allowed directories.",
inputSchema: zodToJsonSchema(ListDirectoryArgsSchema) as ToolInput,
},
    import anthropic

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        [mcp_server]=...  ## etc.?
    )
(forgive me if you know this and are asking a different question, but:)
I don't know how familiar you are with LLMs, but "context" used in that context generally has the pretty clear meaning of "the blob of text you give in between (the text of) the system prompt and (the text of) the user prompt"[1], which acts as context for the user's request (hence the name). Very often this is the conversation history in chatbot-style LLMs, but it can include stuff like the content of text files you're working with, or search/function results.
[1] If you want to be pedantic, technically each instance of "text" should say "tokens" there, and the maximum "context" length includes the length of both prompts.
For instance, if something goes wrong on the MCP host and it queries all the data from the database and transfers it to the host, all of that data is leaked.
It's hard to totally prevent this kind of problem when interacting with local data, but are there measures in MCP to prevent or mitigate these situations?
(Closest I can find is zed/cody but those aren't really general purpose)
This is really cool stuff. I just started to write a server and I have a few questions. Not sure if HN is the right place, so where would you suggest asking them?
Anyway, if there is no place yet, my questions are:
- In the example https://modelcontextprotocol.io/docs/first-server/python , what is the difference between read_resources and call_tool? In both cases they call the fetch_weather function. Would be nice to have that explained better. I implemented only the call_tool function in my own server and Claude seems to be able to call it.
- Where is inputSchema of Tool specified in the docs? It would be nice if inputSchema were explained a bit better. For instance, how can I make a list-of-strings field that has a default value?
- How can I view the output of the logger? It would be nice to see an example somewhere of how to check the logs. I log some stuff with logger.info and logger.error but I have no clue where I can actually look at it. My workaround for now is to log to a local file and tail it.
General feedback
- PLEASE add either automatic reload of the server (hard) or a reload button in the app (probably easier). It's really disruptive to the flow when you have to restart the app on any change.
- Claude Haiku never calls the tools. It just tells me it can't do it. Sonnet can do it but is really slow.
- The docs are really really version 0.1 obviously :-) Please put some focus on it...
Overall, awesome work!
Thanks
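On the inputSchema question: the tool's inputSchema is just JSON Schema carried as a plain object, so a list-of-strings field with a default might look like the sketch below (the tool name is made up; note that "default" is only a JSON Schema annotation, the model isn't forced to honor it, so you may want to apply the default in your handler as well):

    import mcp.types as types

    tool = types.Tool(
        name="search_files",  # hypothetical tool
        description="Search files, optionally filtered by extension",
        inputSchema={
            "type": "object",
            "properties": {
                "extensions": {
                    "type": "array",
                    "items": {"type": "string"},
                    "default": [".txt", ".md"],  # advisory only; enforce in the handler too
                    "description": "File extensions to include",
                },
            },
        },
    )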
I'm looking at a PostgreSQL integration here: https://github.com/modelcontextprotocol/servers/tree/main/sr...
I have a case in mind where I would like to connect to multiple databases. Does the integration endpoint specification in claude_desktop_config.json allow us to pass some description so as to differentiate different databases? How?
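One plausible approach, since the keys under mcpServers are just labels you choose: register the Postgres server twice with different names and connection strings, so each database shows up as its own server. A sketch (names and connection strings are made up; check the server's README for the exact invocation):

    {
      "mcpServers": {
        "postgres-sales": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/sales"]
        },
        "postgres-analytics": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/analytics"]
        }
      }
    }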
MCP: I've gotten 2,415 times smarter since then.
For example:
When and how should notifications be sent and how should they be handled?
---
It's a lot more like LSP.
If you're writing an LSP for a language, you're implementing the necessaries according to the protocol (when to show errors, inlay hints, code fixes, etc.) - it's not deciding on its own.
My app has a simple drop-down box where users can pick whatever LLM they want to use (OpenAI, Perplexity, Gemini, Anthropic, Grok, etc.)
However if they've done something worthy of putting into LangChain, then I do hope LangChain steals the idea and incorporates it so that all LLM apps can use it.
Lots of companies open source some of their internal code, then say it's "officially a protocol now" that anyone can use, and then no one else ever uses it.
If they have new "tools" that's great however, but only as long as they can be used in LangChain independent of any "new protocol".
We’re building an in-terminal coding agent, and our next step was to connect to external services like Sentry and GitHub, where we would also be making a bespoke integration or using a closed-source provider. We appreciate that they already have MCP integrations for those services. Thanks Anthropic!
https://sourcegraph.com/blog/cody-supports-anthropic-model-c...
I guess I can do this for my local file system now?
I also wonder: if I build an LLM-powered app and currently simply do RAG and then inject the retrieved data into my prompts, should this replace it? Can I even integrate this in a useful way?
The use case of "on your machine with your specific data" seems very narrow to me right now, considering how many different context sources and use cases there are.
From the link:
> To help developers start exploring, we’re sharing pre-built MCP servers for popular enterprise systems like Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer.
I guess the reason for this local focus is, that it's otherwise hard to provide access to local files. Which is a decently large use-case.
Still it feels a bit complicated to me.
However, it's not quite a complete story yet. Remote connections introduce a lot more questions and complexity—related to deployment, auth, security, etc. We'll be working through these in the coming weeks, and would love any and all input!
It would be a lot more interesting to write a server for this if this allowed any model to interact with my data. Everyone would benefit from having more integration and you (anthropic) still would have the advantage of basically controlling the protocol.
The Model Context Protocol initial release aims to solve the N-to-M relation of LLM applications (mcp clients) and context providers (mcp servers). The application is free to choose any model they want. We carefully designed the protocol such that it is model independent.
Here's one for performing GitHub actions: https://cookbook.openai.com/examples/chatgpt/gpt_actions_lib...
> It would be a lot more interesting if I could connect this to my github in the web app and claude automatically has access to my code repositories.
In which case the "API" would be governed by a contract between Anthropic and Github, to which you're a third party (read: sharecropper).
Interoperability on the web has already been mostly killed by the practice of companies integrating with other companies via back-channel deals. You are either a commercial partner, or you're out of the playground and no toys for you. Them starting locally means they're at least reversing this trend a bit by setting a different default: LLMs are fine to integrate with arbitrary code the user runs on their machine. No need to sign an extra contract with anyone!
I can see the value of something like DSPy where there is some higher level abstractions in wiring together a system of llms.
But this seems like an abstraction that doesn't really offer much besides "function calling but you use our python code".
I see the value of language server protocol but I don't see the mapping to this piece of code.
That's actually negative value if you are integrating into an existing software system or just you know... exposing functions that you've defined vs remapping functions you've defined into this intermediate abstraction.
When you think about it, function calling needs its own local state (embedded db) to scale efficiently on larger contexts.
I'd like to see all this become open source / standardized.
If integrations are required to unlock value, then the platform with the most prebuilt integrations wins.
The bulk of mass adopters don't have the in-house expertise or interest in building their own. They want turnkey.
No company can build integrations, at scale, more quickly itself than an entire community.
If Anthropic creates an integration standard and gets adoption, then it either at best has a competitive advantage (first mover and ownership of the standard) or at worst prevents OpenAI et al. from doing the same to it.
(Also, the integration piece is the necessary but least interesting component of the entire system. Way better to commodify it via standard and remove it as a blocker to adoption)
If this protocol gets adoption we'll probably add compatibility.
Which would bring MCP to local models like LLama 3 as well as other cloud providers competitors like OpenAI, etc
We've been keeping quiet, but I'd be happy to chat more if you want to email me (also in bio)
So does the entire db get fed into the context? Or is there another layer in between? What if the database is huge, and you want to ask the AI for the most expensive or best-selling items? With RAG that was only vaguely possible and didn't work very well.
Sorry I am a bit new but trying to learn more.
This is about tool usage - the thing where an LLM can be told "if you want to run a SQL query, say <sql>select * from repos</sql> - the code harness will then spot that tag, run the query for you and return the results to you in a chat message so you can use them to help answer a question or continue generating text".
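With the Anthropic Messages API specifically, the harness side of that loop looks roughly like this (a sketch; run_sql is a made-up helper and the tool schema is illustrative):

    import anthropic

    client = anthropic.Anthropic()

    tools = [{
        "name": "run_sql",
        "description": "Run a read-only SQL query against the repos database",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }]

    messages = [{"role": "user", "content": "How many repos are there?"}]
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )

    # If the model decided to call the tool, run it and hand the result back.
    while response.stop_reason == "tool_use":
        tool_use = next(b for b in response.content if b.type == "tool_use")
        result = run_sql(tool_use.input["query"])  # made-up helper
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use.id,
            "content": str(result),
        }]})
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )

    print(response.content[0].text)

MCP's contribution is that the tool definitions and their execution can come from a separate server process instead of being hardcoded in the harness.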
Would love to chat with you if you are open about possible collab.
I am frank [at] glama.ai
“”” You may not access or use, or help another person to access or use, our Services in the following ways: … To develop any products or services that compete with our Services, including to develop or train any artificial intelligence or machine learning algorithms or models. “””
Now let's see a similar abstraction on the client side - a unified way of connecting your assistant to Slack, Discord, Telegram, etc.
One thing that some people may not realize is that right now there's a MASSIVE amount of effort duplication around developing something that could maybe end up looking like MCP. Everyone building an LLM agent (or pseudo-agent, or whatever) right now is writing a bunch of boilerplate for mapping between message formats, tool specification formats, prompt templating, etc.
Now, having said that, I do feel a little bit like there's a few mistakes being made by Anthropic here. The big one to me is that it seems like they've set the scope too big. For example, why are they shipping standalone clients and servers rather than client/server libraries for all the existing and wildly popular ways to fetch and serve HTTP? When I've seen similar mistakes made (e.g. by LangChain), I assume they're targeting brand new developers who don't realize that they just want to make some HTTP calls.
Another thing that I think adds to the confusion is that, while the boilerplate-ish stuff I mentioned above is annoying, what's REALLY annoying and actually hard is generating a series of contexts using variations of similar prompts in response to errors/anomalies/features detected in generated text. IMO this is how I define "prompt engineering" and it's the actual hard problem we have to solve. By naming the protocol the Model Context Protocol, I assumed they were solving prompt engineering problems (maybe by standardizing common prompting techniques like ReAct, CoT, etc).
Regarding the standalone servers, I suspect they’re aiming for usability over elegance in the short term. It’s a classic trade-off: get the protocol in people’s hands to build momentum, then refine the developer experience later.
I also don't see any of that implementation as "boilerplate". Yes there's a lot of similar code being written right now but that's healthy co-evolution. If you have a look at the codebases for Langchain and other LLM toolkits you will realize that it's a smarter bet to just roll your own for now.
You've definitely identified the main hurdle facing LLM integration right now and it most definitely isn't a lack of standards. The issue is that the quality of raw LLM responses falls apart in pretty embarrassing ways. It's understood by now that better prompts cannot solve these problems. You need other error-checking systems as part of your pipeline.
The AI companies are interested in solving these problems but they're unable to. Probably because their business model works best if their system is just marginally better than their competitor.
In an ideal world gemini (or any other 1M token context model) would have an internal 'save snapshot' option so one could resume a blank conversation after 'priming' the internal state (activations) with the whole code base.
Anthropic is playing the "open standard" card because they want to win over some developers. (and that's good from that pov)
Putting this out there puts OpenAI on the clock to release their own alternative or adopt this, because otherwise they run the risk of engineering leaders telling their C-suite that Anthropic is making headway towards better frontier model integration and OpenAI is the costlier integration to maintain.
Also, I wonder if you could build some kind of open source mapping layer from their protocol to OpenAI's. That way OpenAI could support the protocol even if they don't want to.
where do you get that number?
Here, Anthropic is first. If everyone starts using MCP today, any alternative OpenAI comes out with in a few months time probably won’t be able to dislodge it.
His high level summary is that this boils down to a "list tools" RPC call, and a "call tool" RPC call.
It is, indeed, very smart and very simple.
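Concretely, over the JSON-RPC transport those two calls are just tools/list and tools/call requests, something like this (field values are illustrative):

    {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

    {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
     "params": {"name": "list_directory", "arguments": {"path": "/tmp"}}}

Everything else (resources, prompts, sampling) follows the same request/response shape with different method names.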
It appears that clients retrieve prompts from a server to hydrate them with context only, to then execute/complete somewhere else (like Claude Desktop, using Anthropic models). The server doesn’t know how effective the prompt will be in the model that the client has access to. It doesn’t even know if the client is a chat app, or Zed code completion.
In the sampling interface - where the flow is inverted, and the server presents a completion request to the client - it can suggest that the client uses some model type /parameters. This makes sense given only the server knows how to do this effectively.
Given the server doesn’t understand the capabilities of the client, why the asymmetry in these related interfaces?
There’s only one server example that uses prompts (fetch), and the one prompt it provides returns the same output as the tool call, except wrapped in a PromptMessage. EDIT: looks like there are some capabilities classes in MCP, maybe these will evolve.
https://modelcontextprotocol.io/docs/concepts/prompts
https://spec.modelcontextprotocol.io/specification/server/pr...
… but TLDR, if you think of them a bit like slash commands, I think that's a pretty good intuition for what they are and how you might use them.
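For example, the server side of a prompt ends up looking roughly like this with the Python SDK as I read it (names are made up; check the SDK for the exact types):

    import mcp.types as types
    from mcp.server import Server

    app = Server("example-prompts")

    @app.list_prompts()
    async def list_prompts() -> list[types.Prompt]:
        return [types.Prompt(
            name="summarize-url",  # surfaced to the user a bit like a slash command
            description="Fetch a URL and summarize it",
            arguments=[types.PromptArgument(name="url", description="Page to summarize", required=True)],
        )]

    @app.get_prompt()
    async def get_prompt(name: str, arguments: dict | None) -> types.GetPromptResult:
        url = (arguments or {})["url"]
        return types.GetPromptResult(
            description="Summarize a page",
            messages=[types.PromptMessage(
                role="user",
                content=types.TextContent(type="text", text=f"Please summarize the content of {url}"),
            )],
        )

So the server only hydrates the prompt text with arguments; which model eventually runs it, and how well, is entirely the client's business, which matches the asymmetry you're describing.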
This level of generality has been attempted before (e.g. RDF and the semantic web, REST, SOAP) and I'm not sure what's fundamentally different about how this problem is framed that makes it more tractable.
It’s basically a standardized way to wrap your OpenAPI client in a standard tool format and then plug it into your locally running AI tool of choice.
This is huge, as long as there's a single standard and other LLM providers don't try to release their own protocol. Which, historically speaking, is definitely going to happen.
Yes, very much this; I'm mildly worried because the competition in this space is huge and there is no shortage of money and crazy people who could go against this.
But, it’s a bandit group.
I'd say that thing you're feeling comes from witnessing an LLM vendor, for the first time in history, actually being serious about function calling and actually wanting people to use it.
It smells like the thinking is that you (the developer) can grab from a collection of very broad data connectors, and the agent will be able to figure out what to do with them without much custom logic in between. Maybe I’m missing something
This has always been the idea behind tools/function calling in LLMs.
What MCP tries to solve is the NxM problem - every LLM vendor has their own slightly different protocols for specifying and calling tools, and every tool supplier has to handle at least one of them, likely with custom code. MCP aims to eliminate custom logic at the protocol level.
With the LLM being able to tap up to date context (like LSP), you won't need that back-and-forth dance. This will massively improve code generations.
I think I have a complete picture.
Here is a quickstart for anyone who is just getting into it.
https://glama.ai/blog/2024-11-25-model-context-protocol-quic...
I managed to get it to load: https://archive.ph/7DALF
In the "Protocol Handshake" section of what's happening under the hood - it would be great to have more info on what's actually happening.
For example, more details on what's actually happening to translate the natural language to a DB query. How much config do I need to do for this to work? What if the queries it makes are inefficient/wrong and my database gets hammered - can I customise them? How do I ensure sensitive data isn't returned in a query?
Let’s say I’ve got a “widgets” table and I want the system to tell me how many “deprecated widgets” there are, but there is no convenient “deprecated” flag on the table—it’s defined as a Rails scope on the model or something (business logic).
The DB schema might make it possible to run a simple query to count widgets or whatever, but I just don’t have a good mental model of how these systems might work with “business logic” type things.
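One pattern that seems to address this: don't hand the model raw SQL at all, and instead expose the business rule as its own narrowly scoped tool, so the definition of "deprecated" lives in your code rather than in the prompt. A rough sketch (the tool name, the rule, and the run_query helper are all made up):

    import mcp.types as types

    # The description carries the business meaning; the handler owns the query.
    deprecated_widgets_tool = types.Tool(
        name="count_deprecated_widgets",
        description="Count widgets considered deprecated by our business rules "
                    "(unsupported models, or not updated in the last two years).",
        inputSchema={"type": "object", "properties": {}},
    )

    async def handle_count_deprecated_widgets() -> list[types.TextContent]:
        # Equivalent of the Rails scope, kept server-side so the model can't get it wrong.
        sql = """
            SELECT COUNT(*) FROM widgets
            WHERE model IN ('A1', 'B2')
               OR updated_at < now() - interval '2 years'
        """
        count = run_query(sql)  # made-up DB helper
        return [types.TextContent(type="text", text=f"{count} deprecated widgets")]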
You need to know:
1. The claude_desktop_config.json needs a top-level mcpServers key (example below), as described here: https://github.com/modelcontextprotocol/servers/pull/46/comm...
2. If you did this correctly then, after you run Claude Desktop, you should see a small 'hammer' icon (with a number next to it) next to the labs icon, in the bottom right of the 'How can Claude help you today?' box.
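For reference, a minimal working config looks something like this (the filesystem server and path are just an example):

    {
      "mcpServers": {
        "filesystem": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/Projects"]
        }
      }
    }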
https://modelcontextprotocol.io/docs/concepts/sampling
It's crazy. Sadly not yet implemented in Claude Desktop client.
Has OpenCtx ever gained much traction?
But today we already have lots of enterprise customers building their own OpenCtx providers and/or using the `openctx.providers` global settings in Sourcegraph to configure them in the current state. OpenCtx has been quite valuable already here to our customers.
https://github.com/rusiaaman/wcgw/blob/main/src/wcgw/client/...
Already getting value out of it.
Devil's advocating for conversation's sake: at the end of the day, the user and client app want very little persistent data coming from the server, if for no other reason than that the client is expecting to store chats as text, with external links or Potemkin placeholders for assets like files.
tl;dr—you can build & ship a CLI without needing an API. Just drop Terminalwire into your server, have your users install the thin client, and you’ve got a CLI.
I’m currently focused on getting the distribution and development experience dialed in, which is why I’m working mostly with Rails deployments at the moment, but I’m open to working with large customers who need to ship a CLI yesterday in any language or runtime.
If you need something like this check it out at https://terminalwire.com or ping me [email protected].
Is there any good arch diagram for one of the examples of how this protocol may be used?
I couldn’t find one easily…
I appreciate the design, which leaves the implementation of servers to the community and doesn't lock you into any particular implementation, as the protocol seems to be aiming primarily to solve the RPC layer.
One major value-add of MCP, I think, is a capability extension to a vast number of AI apps.
OpenAPI (aka Swagger) based function calling is already a standard for sync calls, and it solves the NxM problem. I'm wondering if the proposed value is that MCP is async.
OpenAPI support for async: https://swagger.io/docs/specification/v3_0/callbacks/
The Core architecture [1] documentation is given in terms of TypeScript or Python abstractions, adding a lot of unnecessary syntactic noise for someone who doesn't use these languages. Very thin on actual conceptual explanation and full of irrelevant implementation details.
The 'Your first server'[2] tutorial is given in terms of big chunks of Python code, with no explanation whatsoever, e.g.:
Add these tool-related handlers:
...100 lines of undocumented code...
The code doesn't even compile.
I don't think this is ready for prime time yet so I'll move along for now.
[1] https://modelcontextprotocol.io/docs/concepts/architecture
[2] https://modelcontextprotocol.io/docs/first-server/python
Required for
- corporate data sources, e.g. Salesforce
- APIs with key limits and non-trivial costs
- personal data sources e.g. email
It appears that all auth is packed into the MCP config, e.g. slack token: https://github.com/modelcontextprotocol/servers/tree/main/sr...
Curious:
1. Authentication and authorization is left as a TODO: what is the thinking, as that is necessary for most use?
2. Ultimately, what does MCP already add, or what will it add, that makes it more relevant than OpenAPI / a pattern on top?
The docs aren't super clear yet w.r.t. how one might actually implement the connection. Do we need to implement another set of tools to provide to the API and then have that tool call the MCP server? Maybe I'm missing something here?