Dependabot has some value IME, but all naïve tools that only check software and version numbers against a vulnerability database tend to be noisy if they don’t then do something else to determine whether your code is actually exposed to a matching vulnerability.
One security checking tool that has genuinely impressed me recently is CodeQL. If you’re using GitHub, you can run this as part of GitHub Advanced Security.
Unlike those naïve tools, CodeQL seems to perform a real tracing analysis through the code, so its report doesn’t just say you have user-provided data being used dangerously, it shows you a complete, step-by-step path through the code that connects the input to the dangerous usage. This provides useful, actionable information to assess and fix real vulnerabilities, and it is inherently resistant to false positives.
Presumably there is still a possibility of false negatives with this approach, particularly with more dynamic languages like Python where you could surely write code that is obfuscated enough to avoid detection by the tracing analysis. However, most of us don’t intentionally do that, and it’s still useful to find the rest of the issues even if the results aren’t perfect and 100% complete.
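For anyone who hasn’t seen one of these reports, the kind of source-to-sink path it traces looks roughly like this (a made-up Flask sketch for illustration, not actual CodeQL output):

    import sqlite3
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/orders")
    def orders():
        # Step 1 (source): untrusted, user-controlled input
        customer = request.args.get("customer", "")
        # Step 2 (flow): tainted value interpolated into a SQL string
        query = f"SELECT * FROM orders WHERE customer = '{customer}'"
        # Step 3 (sink): tainted query executed -> SQL injection
        rows = sqlite3.connect("shop.db").execute(query).fetchall()
        return {"orders": [list(r) for r in rows]}

A report that walks you through steps 1–3 like this is easy to verify by hand, which is what makes it actionable.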
Agreed, CodeQL has been amazing. But it's important not to replace type checkers and linters with it. It complements them; it doesn't replace them.
Certain languages don't have enough "rules" (forgot the term) either. This is the only open/free SAST I know of; if there are others, I'd be interested as well.
My hope+dream is for Linux distros to require checks like this to pass for anything they admit to their repo.
CodeQL was a good help on some projects, but more recently, our team has been increasingly frustrated by the thing to the point of turning it off.
The latest straw was a comment adding a useless intermediate variable, with the justification being “if you do this, you’ll avoid CodeQL flagging you for the problem”.
Sounds like slight overfitting to the data!
So, CodeQL found a vulnerability in your code, you avoided the warning by adding an intermediate variable (but ignored the vulnerability), and you are frustrated with CodeQL, not the person who added this variable?
If I read it correctly, the comment suggesting the intermediate variable was from CodeQL itself.
> Dependabot has some value IME, but all naïve tools that only check software and version numbers against a vulnerability database tend to be noisy if they don’t then do something else to determine whether your code is actually exposed to a matching vulnerability.
For non-SaaS products it doesn’t matter. Your customer’s security teams have their own scanners. If you ship them vulnerable binaries, they’ll complain even if the vulnerable code is never used or isn’t exploitable in your product.
This is true and customers do a lot of unfortunate things in the name of security theatre. Sometimes you have to play the cards you’ve been dealt and roll with it. However, educating them about why they’re wasting significant amounts of money paying you to deal with non-problems does sometimes work as a mutually beneficial alternative.
We had a Python "vulnerability" that only existed on 32-bit platforms, which we don't use in our environment, but do you think we could get the cyber team to understand that?
Nope.
CodeQL has been disappointing with Kotlin: it lagged behind the official releases by about two months, blocking our update to Kotlin 2.3.0.
> it is inherently resistant to false positives
By Rice's Theorem, I somehow doubt that.
No engine can be 100% perfect, of course, but the original comment is broadly accurate. CodeQL builds a full semantic database from source code, including types and dataflow, then runs queries against it. QL is fundamentally a logic programming language concerned only with the satisfiability of the given constraints.
If dataflow is not provably connected from source to sink, an alert is impossible. If a sanitization step interrupts the flow of potentially tainted data, the alert is similarly discarded.
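A rough Python sketch of that idea (illustrative only; which functions count as sanitizers depends on the library models in use):

    from flask import Flask, request
    from markupsafe import escape

    app = Flask(__name__)

    @app.route("/hello")
    def hello():
        name = request.args.get("name", "")  # tainted source
        return f"<h1>Hello {name}</h1>"      # tainted data reaches an HTML response sink -> alert

    @app.route("/hello-safe")
    def hello_safe():
        name = escape(request.args.get("name", ""))  # a modeled sanitizer interrupts the flow
        return f"<h1>Hello {name}</h1>"               # no provable source-to-sink flow -> no alert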
The end-to-end precision of the detection depends on the queries executed, the models of the libraries used in the code (to e.g., recognize the correct sanitizers), and other parameters. All of this is customizable by users.
All that can be overwhelming though, so we aim to provide sane defaults. On GitHub, you can choose between a "Default" and "Extended" suite. Those are tuned for different levels of potential FN/FP based on the precision of the query and severity of the alert.
Severities are calculated based on the weaknesses the query covers and the real CVEs those weaknesses have caused in previously disclosed vulnerabilities.
QL-language-focused resources for CodeQL: https://codeql.github.com/
Sorry, I don’t understand the point you’re making. If CodeQL reports that you have an XSS vulnerability in your code, and its report includes the complete and specific code path that creates that vulnerability, how is Rice’s theorem applicable here? We’re not talking about decidability of some semantic property in the general case; we’re talking about a specific claim about specific code that is demonstrably true.
Rice’s theorem applies to any non-trivial semantic property.
Looking at the docs, I’m not really sure CodeQL is semantic in the same sense as Rice’s theorem. It looks syntactic more than semantic.
E.g. breaking Rice’s theorem would require it to detect that an application isn’t vulnerable if it contains the vulnerability but only in paths that are unreachable. Like:

    if request.params.limit > 1000:
        raise ValueError("limit too large")
    # ... 1000 lines of code ...
    if request.params.limit > 1000:
        call_vulnerable_code()

I’m not at a PC right now, but I’d be curious whether CodeQL thinks that’s vulnerable or not. It’s probably demonstrably true that there is syntactically a path to the vulnerability; I’m a little dubious that it’s demonstrably true the code path is actually reachable without executing the code.
> We’re not talking about decidability of some semantic property in the general case; we’re talking about a specific claim about specific code
Is CodeQL special-cased for your code? I very much doubt that. Then it must work in the general case. At that point decidability is impossible, and at best either false positives or false negatives can be guaranteed to be absent, but not both (possibly neither of them!).
I don't doubt CodeQL claims can be demonstrably true; that's still coherent with Rice's theorem. However it does mean you'll have false negatives, that is, cases where CodeQL reports no provable claim while your code is vulnerable to some issue.
OK, but all I said before was that CodeQL’s approach where it supplies a specific example to support a specific problem report is inherently resistant to false positives.
Clearly it is still possible to generate a false positive if, for example, CodeQL’s algorithm thinks it has found a path through the code where unsanitised user data can be used dangerously, but in fact there was a sanitisation step along the way that it didn’t recognise. This is the kind of situation where the theoretical result about not being able to determine whether a semantic property holds in all cases is felt in practical terms.
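For example, something like this hypothetical homegrown allow-list check is a perfectly good sanitiser in practice, but a generic library model has no particular reason to recognise it as one:

    import re
    from flask import Flask, request, abort

    app = Flask(__name__)

    USERNAME_RE = re.compile(r"[a-z0-9_]{1,32}")

    def checked_username(value: str) -> str:
        # Neutralises the input in practice, but is invisible to a tool
        # that has no model of this project-specific helper.
        if not USERNAME_RE.fullmatch(value):
            abort(400)
        return value

    @app.route("/profile")
    def profile():
        user = checked_username(request.args.get("user", ""))
        return f"<p>Profile for {user}</p>"  # could still be reported as a tainted flow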
It still seems much less likely that an algorithm that needs to produce a specific demonstration of the problem it claims to have found will result in a false positive than the kind of naïve algorithms we were discussing before that are based on a generic look-up table of software+version=vulnerability without any attempt to determine whether there is actually a path to exploit that vulnerability in the real code.
Rice's Thm just says that you can't have a sound and complete static analysis. You can happily have one or the other.
CodeQL seems to raise too many false-positives in my experience. And it seems there is no easy way to run it locally, so it's a vendor lock-in situation.
Heyo, I'm the Product Director for detection & remediation engines, including CodeQL.
I would love to hear what kind of local experience you're looking for and where CodeQL isn't working well today.
As a general overview:
The CodeQL CLI is developed as an open-source project and can run CodeQL basically anywhere. The engine is free to use for all open-source projects, and free for all security researchers.
The CLI is available as release downloads, in homebrew, and as part of many deployment frameworks: https://github.com/advanced-security/awesome-codeql?tab=read...
Results are stored in standard formats and can be viewed and processed by any SARIF-compatible tool. We provide tools to run CodeQL against thousands of open-source repos for security research.
The repo linked above points to dozens of other useful projects (both from GitHub and the community around CodeQL).
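Since the output is plain SARIF JSON, even a tiny script can triage it. A minimal sketch (assuming a results.sarif file produced by an analysis run):

    import json

    with open("results.sarif") as f:
        sarif = json.load(f)

    # Walk the standard SARIF structure: runs -> results -> locations.
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule = result.get("ruleId", "<no rule>")
            message = result.get("message", {}).get("text", "")
            for loc in result.get("locations", []):
                phys = loc.get("physicalLocation", {})
                path = phys.get("artifactLocation", {}).get("uri", "?")
                line = phys.get("region", {}).get("startLine", "?")
                print(f"{rule}: {path}:{line} -- {message}")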
The vagaries of the dual licensing discourage a lot of teams working on commercial projects from kicking the tires on CodeQL and generally hinder adoption for private projects as well: are there any plans to change the licensing in the future?
Nice, I for one didn't know about this. Thanks a bunch for chiming in!
> CodeQL seems to raise too many false-positives in my experience.
I’d be interested in what kinds of false positives you’ve seen it produce. The functionality in CodeQL that I have found useful tends to accompany each reported vulnerability with a specific code path that demonstrates how the vulnerability arises. While we might still decide there is no risk in practice for other reasons, I don’t recall ever seeing it make a claim like this that was incorrect from a technical perspective. Maybe some of the other types of checks it performs are more susceptible to false positives and I just happen not to have run into those so much in the projects I’ve worked on.
The previous company I was working at (6 months ago) had a bunch of microservices, most in Python using FastAPI and pydantic. At one point the security team turned on CodeQL for a bunch of them, and we just got a bunch of false positives for not validating a UUID URL path param to a request handler. In fact the parameter was typed in the handler function signature, and FastAPI does validate that type. But in this strange case, CodeQL knew that these were external inputs but didn't know that FastAPI would validate that path param type, so it suggested adding redundant type-check and bail-out code in hundreds of places.
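The pattern in question was roughly this (names made up): FastAPI already rejects any request whose path param isn't a valid UUID with a 422, because the type annotation drives validation.

    import uuid
    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/widgets/{widget_id}")
    async def get_widget(widget_id: uuid.UUID):
        # By the time we get here, widget_id is a validated uuid.UUID,
        # so the suggested manual format check would be redundant.
        return {"widget_id": str(widget_id)}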
The patterns we had established were as simple, basic, and "safe" as practical, and we advised and code-reviewed the mechanics of services/apps for the other teams, like using database connections/pools correctly, using async correctly, validating input correctly, etc (while the other teams were more focused on features and business logic). Low-level performance was not really a concern, mostly just high-level db-queries or sub-requests that were too expensive or numerous. The point is, there really wasn't much of anything for CodeQL to find, all the basic blunders were mostly prevented. So, it was pretty much all false-positives.
Of course, the experience would be far different if we were more careless or working with more tricky components/patterns. Compare to the base-rate fallacy from medicine ... if there's a 99% accurate test across a population with nothing for it to find, the "1%" false positive case will dominate.
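Back-of-the-envelope version of that (made-up numbers):

    checks_run = 10_000          # distinct places the scanner looks
    true_issue_rate = 0.0005     # clean codebase: ~5 real issues in 10,000
    detection_rate = 0.99        # the "99% accurate" part
    false_positive_rate = 0.01   # the "1%" part

    true_alerts = checks_run * true_issue_rate * detection_rate              # ~5
    false_alerts = checks_run * (1 - true_issue_rate) * false_positive_rate  # ~100

    print(true_alerts, false_alerts)  # false alarms outnumber real findings ~20:1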
I also want to mention a tendency for some security teams to decide that their role is to set these things up, turn them on, cover their eyes, and point the hose at the devs. Using these tools makes sense, but these security teams think it's not practical for them to look at the output and judge the quality with their own brains, first. And it's all about the numbers: 80 criticals, 2000 highs! (except they're all the same CVE and they're all not valid for the same reason)
Interesting, thanks. In the UUID example you mentioned, it seems the CodeQL model is missing some information about how FastAPI’s runtime validation works and so not drawing correct inferences about the types. It doesn’t seem to have a general problem with tracking request parameters coming into Python web frameworks — in fact, the first thing that really impressed me about CodeQL was how accurate its reports were with some quite old Django code — but there is a lot more emphasis on type annotations and validating input against those types at runtime in FastAPI.
I completely agree about the problem of someone deciding to turn these kinds of scanning tools on and then expecting they’ll Just Work. I do think the better tools can provide a lot of value, but they still involve trade-offs and no tool will get everything 100% right, so there will always be a need to review their output and make intelligent decisions about how to use it. Scanning tools that don’t provide a way to persistently mark a certain result as incorrect or to collect multiple instances of the same issue together tend to be particularly painful to work with.
Bumping versions of dependencies doesn't guarantee any improved safety, as new versions can introduce security issues (otherwise we wouldn't need to patch old versions that used to be new).
If you replace a dependency that has a known vulnerability with a different dependency that does not, surely that is objectively an improvement in at least that specific respect? Of course we can’t guarantee that it didn’t introduce some other problem as well, but not fixing known problems because of hypothetical unknown problems that might or might not exist doesn’t seem like a great strategy.
I think he's referring to this part of the article:
> Dependencies should be updated according to your development cycle, not the cycle of each of your dependencies. For example you might want to update dependencies all at once when you begin a release development cycle, as opposed to when each dependency completes theirs.
and is arguing in favor of targeted updates.
It might surprise the younger crowd to see the number of Windows Updates you wouldn't have installed on a production machine, back when you made choices at that level. From this perspective Tesla's OTA firmware update scheme seems wildly irresponsible for the car owner.
Maybe. But at least everyone being on the same (new) version makes things simpler, compared to everyone being on different random versions of whatever happened to be current when they were written.