Show HN: MCP-Shield – Detect security issues in MCP servers

I noticed the growing security concerns around MCP (https://news.ycombinator.com/item?id=43600192) and built an open source tool that can detect several patterns of tool poisoning attacks, exfiltration channels and cross-origin manipulations.

MCP-Shield scans your installed servers (Cursor, Claude Desktop, etc.) and shows what each tool is trying to do at the instruction level, beyond just the API surface. It catches hidden instructions that try to read sensitive files, shadow other tools' behavior, or exfiltrate data.

Example of what it detects:

- Hidden instructions attempting to access ~/.ssh/id_rsa

- Cross-origin manipulations between server that can redirect WhatsApp messages

- Tool shadowing that overrides behavior of other MCP tools

- Potential exfiltration channels through optional parameters

I've included clear examples of detection outputs in the README and multiple example vulnerabilities in the repo so you can see the kinds of things it catches.

This is an early version, but I'd appreciate feedback from the community, especially around detection patterns and false positives.

github.com

・

134 points

・

nick_wolf

・

6 months ago

41 comments

Manfred ・ 6 months ago

People have been struggling with securing against SQL injection attacks for decades, and SQL has explicit rules for quoting values. I don't have a lot of faith in finding a solution that safely includes user input into a prompt, but I would love to be proven wrong.

simonw ・ 6 months ago

I've been following prompt injection for 2.5 years and until last week I hadn't seen any convincing mitigations for it - the proposed solutions were almost all optimistic versions of "if we train a good enough model it won't get tricked any more", which doesn't work.
What changed is the new CaMeL paper from DeepMind, which notably does not rely on AI models to detect attacks: https://arxiv.org/abs/2503.18813
I wrote my own notes on that paper here: https://simonwillison.net/2025/Apr/11/camel/
- nrvn ・ 6 months ago
  
  I can't "shake off" the feeling that this whole MCP/LLM thing is moving in the wrong if not the opposite direction. Up until recently we have been dealing with (or striving to build) deterministic systems in the sense that the output of such systems is expected to be the same given the same input. LLMs with all respect to them behave on a completely opposite premise. There is zero guarantee a given LLM will respond with the same output to the same exact "prompt". Which is OK because that's how natural human languages work and LLMs are perfectly trained to mimic human language.
  But now we have to contain all the relevant emerging threats via teaching the LLM to translate user queries from natural language to some intermediate structured yet non-deterministic representation(subset of Python in the case of CaMeL), and validate the generated code using the conventional methods (deterministic systems, i.e. CaMeL interpreter) against pre-defined policies. Which is fine on paper but every new component (Q-LLM, interpreter, policies, policy engine) will have its own bouquet of threat vectors to be assessed and addressed.
  The idea of some "magic" system translating natural language query into series of commands is nice. But this is one of those moments I am afraid I would prefer a "faster horse" especially for the likes of sending emails and organizing my music collection...
Mountain_Skies ・ 6 months ago

One of the most astonishing things about working in Application Security was seeing how many SQL injection vulns there were in new code. Often doing things the right way was easier than doing it the wrong way, and yet some would fight against their data framework to create the injection vulnerability. Doubt they were trying to intentionally cause security vulnerabilities but rather were either using old tutorials and copy/paste code or were long term coders who had been doing it this way for decades.
jason-phillips ・ 6 months ago

> People have been struggling with securing against SQL injection attacks for decades.
Parameterized queries.
A decades old struggle is now lifted from you. Go in peace, my son.
- ololobus ・ 6 months ago
  
  > Parameterized queries.
  Also happy to be wrong, but in Postges clients, parametrized queries are usually implemented via prepared statements, which do not work with DDL on the protocol level. This means that if you want to create a role or table which name is a user input, you have a bad time. At least I wasn’t able to find a way to escape DDL parameters with rust-postgres, for example.
  And because this seems to be a protocol limitation, I guess the clients that do implement it, do it in some custom way on the client side.
  
  jason-phillips ・ 6 months ago
  
  Just because you can, doesn't mean you should. But if you must, abstract for good time.
- pjmlp ・ 6 months ago
  
  Just like we know how to make C safe (in theory), and many other cases in the industry.
  The problem is that solutions don't exist, rather the lack of safety culture that keeps ignoring best practices unless they are imposed by regulations.
  
  chrisweekly ・ 6 months ago
  ・ 2 more
  
  "problem is that solutions don't exist"
  you meant "problem ISN'T that solutions...", right?
  
  pjmlp ・ 6 months ago
  
  Correct, typo. Thanks.

hollowturtle ・ 6 months ago

So the analysis is done with another call to claude with instructions like "You are a cybersecurity expert..." basically another level of extreme indirection with unpredictable results, and maybe vulnerable to injection itself

nick_wolf ・ 6 months ago

It's definitely a weird loop, relying on another LLM call to analyze potential issues in stuff meant for an LLM. And you're right, it's not perfectly predictable – you might get slightly different feedback run-to-run until careful prompt engineering, that's just the nature of current models. That's why the pattern-matching checks run firs, they're the deterministic baseline. The Claude analysis adds a layer that's inherently fuzzier, trying to catch subtler semantic tricks or things the patterns miss.
And yeah, the analysis prompt itself – could someone craft a tool description that injects that prompt when it gets sent to Claude? Probably. It's turtles all the way down, sometimes. That meta-level injection is a whole other can of worms with these systems. It's part of why that analysis piece is optional and needs the explicit API key. Definitely adds another layer to worry about, for sure.

spiritplumber ・ 6 months ago

Missed naming opportunity...

            DILLINGER
                    No, no, I'm sure, but -- you understand.
                    It should only be a couple of days.
                    What's the thing you're working on?

                                ALAN
                    It's called Tron. It's a security
                    program itself, actually. Monitors
                    all the contacts between our system
                    and other systems... If it finds
                    anything going on that's not scheduled,
                    it shuts it down. I sent you a memo
                    on it.


                               DILLINGER
                    Mmm. Part of the Master Control Program?


                               ALAN
                    No, it'll run independently.
                    It can watchdog the MCP as well.

mceachen ・ 6 months ago

Sadly, the mouse would surely smite this awesomeness.

stpedgwdgfhgdd ・ 6 months ago

Oh man, a complete new industry is to about to unfold. I already feel sorry for the people that jump on the latest remote mcp server and discover that their entire personal life (“what is your biggest anxiety?”) is on the streets

mlenhard ・ 6 months ago

This is pretty cool. You should also attempt to scan resources if possible. Similar to the tool injection attack Invariant Labs discovered, I achieved the same result via resource injection [1].

The three things I want solved to improve local MCP server security are file system access, version pinning, and restricted outbound network access.

I've been running my MCP servers in a Docker container and mounting only the necessary files for the server itself, but this isn't foolproof. I know some others have been experimenting with WASI and Firecracker VMs. I've also been experimenting with setting up a squid proxy in my docker container to restrict outbound access for the MCP servers. All of this being said, it would be nice if there was a standard that was set up to make these things easier.

[1] https://www.bernardiq.com/blog/resource-poisoning/

tuananh ・ 6 months ago

To solve current AI security problem, we need to throw more AI into it.

martijnvds ・ 6 months ago

The "S" in AI stands for "security".
- postalrat ・ 6 months ago
  
  The P in remote stands for productive.

paulgb ・ 6 months ago

Neat, but what’s to stop a server from reporting one innocuous set of tools to MCP-Shield and then a different set of tools to the client?

nick_wolf ・ 6 months ago

Great point, thanks for raising it. You're spot on – the client currently sends name: 'mcp-shield', enabling exactly the bait-and-switch scenario you described.
I'll push an update in ~30 mins adding an optional --identify-as <client-name> flag. This will let folks test for that kind of evasion by mimicking specific clients, while keeping the default behavior consistent. Probably will think more about other possible vectors. Really appreciate the feedback!
- nick_wolf ・ 6 months ago
  
  That was faster than expected - here's the merged commit implementing the --identify-as flag: https://github.com/riseandignite/mcp-shield/commit/e7e2a6c04.... Thanks again!

freeone3000 ・ 6 months ago

What if we started the other way, by explicitly declaring what files an LLM process was capable of accessing? a snap container or a chroot might be a good first attempt

calyhre ・ 6 months ago

It seems that writing a tool in anything else than English will bypass most of this scanner

abhisek ・ 6 months ago

May be try out vet as well: https://github.com/safedep/vet

vet is backed by a code analysis engine that performs malicious package (npm, pypi etc.) scanning. We recently extended it to support GitHub repository scanning as well.

It found the malicious behaviour in mcp-servers-example/bad-mcp-server.js https://platform.safedep.io/community/malysis/01JRYPXM0SYTM8...

khafra ・ 6 months ago

Nice! This is a much-needed space for security tooling, and I appreciate that you've put some thought into the new attack vectors. I also like the combination of signature-based analysis, and having an LLM do its own deep dive.

I expect a lot of people to refine the tool as they use it; one big challenge in maintaining the project is going to be incorporating pull requests that improve the prompt in different directions.

nick_wolf ・ 6 months ago

Thanks for the kind words – really appreciate you taking the time to look it over and get what we're trying to do here.
Yeah, combining the regex/pattern checks with having Claude take a look felt like the right balance... catch the low-hanging fruit quickly but also get a deeper dive for the trickier stuff. Glad that resonates.
Maintaining the core prompt quality as people contribute improvements... that's going to be interesting. Keeping it effective and preventing it from becoming a kitchen sink of conflicting instructions will be key. Definitely something we'll need to figure out as we go.

deadbabe ・ 6 months ago

Instead of bending over backwards to secure an MCP server why not just run it as an OS user with very limited minimal permissions?

stpedgwdgfhgdd ・ 6 months ago

Suggestion: Integrate with https://kgateway.dev/

marcfisc ・ 6 months ago

Cool work! Thanks for citing our (InvariantLabs) blog posts! I really like the identify-as feature!

We recently launched a similar tool ourselfs, called mcp-scan: https://github.com/invariantlabs-ai/mcp-scan

nick_wolf ・ 6 months ago

Thanks! Glad identify-as makes sense. Your prior research was definitely valuable context, appreciate you putting that out there.
Checked out mcp-scan yesterday, nice work! Good to see more tools emerging for MCP security. Feels like these kinds of tools are essential right now for highlighting the risks. Long term, hopefully the insights gained push the protocol itself, or the big wrappers like Claude/Cursor, towards building in more robust, integrated verification deeper down as the ecosystem matures.

NicolaiS ・ 6 months ago

Sorry, but this will never work very well.

The tool contains a bunch of "denylist regexes", i.e.

    `user (should not|must not|cannot) see`

But these can easily be bypassed. Any real security tool should use allowlists, but that is ofc much harder with natural languages.

MCP-Shield can also analyse using Claude, but that code contains an easy to exploit prompt injection: https://github.com/riseandignite/mcp-shield/blob/19de96efe5e...

pcwelder ・ 6 months ago

Cool.

If I'm not wrong you don't detect prompt injection done in the tool results? Any plans for that?

nick_wolf ・ 6 months ago

Hmm, yeah, that's a fair point. You're right, we're looking at the tool definitions – the descriptions, schemas, etc. – not the stuff that comes back after a tool runs.
It's tricky, because actually running the tools... that's where things get hairy. We'd have to invoke potentially untrusted code during a scan, figure out how to generate valid inputs for who-knows-what schemas, and deal with whatever side effects happen.
So, honestly, no solid plans for that right now. The focus is squarely on the static analysis side – what the server claims it can do. Trying to catch vulnerabilities in those definitions feels like the right scope for this particular tool.
I think that analyzing the actual results is more about a runtime concern. Like, something the client needs to be responsible for when it gets the data back, or maybe a different kind of monitoring tool altogether. Still feels like an open question where that kind of check really fits best. It's definitely a gap, though. Something to chew on.

bosky101 ・ 6 months ago

I'd like to remind you that tools is a json array to any modern llm inference api. That rather than returning text, tells you which function to call.

I'm all for abstraction of a level of indirection. But this is pushing things too far.

We now have an entire ecosystem, layers of unneeded engineering, cohorts of talent and capital going to create man in the middle servers that forces us to get this array from around the world + maintain a server with several gb of deps to get a json array that you should't trust.

2) It makes sense if every server has a tools.txt equivalent of their own swagger. Eg i would trust google photos to maintain and document their tools rather than the 10,000 MCP servers possibly alive for no reason and already out of date by the time you are done reading this comment. In addition to being over engineered, to trust a random server as a proxy never made any sense.

3) nobody wants to run servers. Can't find this meme, but found it here on HN several times.

Sorry but I would rather not wait a year for this industry to crash and burn and take down genai apps galore or worse, start leaking this data and your bills.

Kudos to document any security gaps though.

undefined ・ 6 months ago

[deleted]

emsign ・ 6 months ago

New! Snakeoil now AI enhanced

puliczek ・ 6 months ago

Looking promising, need to scan my servers :) Just added your tool to https://github.com/Puliczek/awesome-mcp-security

nurettin ・ 6 months ago

You install a service that gives access to a random language generator, then you try to secure it with a project that is literally a few hours old. This is like tripping over your own slippers.

notachatbot123 ・ 6 months ago

Can you also secure my ~/.ssh/id_rsa2 and ~/.ssh/id_rsa_github and ~/.ssh/id_rsa_foo?

Vitra ・ 6 months ago

[dead]