For context, I'm building a $500 pentesting solution for small dev teams. Read about the whys here.
Benchmarking
This week I ran the agent against Juice Shop, a deliberately vulnerable Angular app the security industry uses as a test bed. It's so well-known it's borderline a meme. Dozens of known issues across every category in the OWASP Top 10.
The agent found zero.
My first assumption was that something had broken in the scan pipeline. I checked the logs. The agent had run cleanly. It had identified endpoints, made requests, attempted exploits. It just hadn't found anything worth reporting. Which on Juice Shop is impossible.
The bug turned out to be smaller and more embarrassing than I expected.
What was actually wrong
The recon layer of the tool reads the landing page HTML to figure out what kind of application it's looking at. Is this a WordPress site? An Angular SPA? A Django app? A Spring Java service? Knowing the framework matters because it shapes which exploit categories get prioritized, which paths get probed, which known-bad configurations get checked.
The fingerprinter looked at the landing HTML for telltale signs: src and href attributes pointing at framework-specific files, meta tags, version strings, common path patterns. For most applications this works fine. WordPress sites announce themselves loudly. Django renders server-side and leaves fingerprints in the markup. Spring leaves its standard error pages and actuator paths.
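A minimal sketch of that first pass, to make the mechanism concrete (the signature table and function names are illustrative, not the tool's actual code):

```python
import re

# Illustrative landing-page signatures: framework -> regexes run against the raw
# HTML. Examples of the general idea, not the tool's real signature set.
LANDING_SIGNATURES = {
    "wordpress": [
        re.compile(r"/wp-content/"),                         # asset paths
        re.compile(r'name="generator" content="WordPress'),  # meta tag
    ],
    "django": [
        re.compile(r'name="csrfmiddlewaretoken"'),  # server-rendered form token
    ],
    "angular": [
        re.compile(r'ng-version="'),  # injected at runtime, never in the static HTML
    ],
}

def fingerprint(landing_html: str) -> str:
    """Return the first framework whose signatures match the landing HTML."""
    for framework, patterns in LANDING_SIGNATURES.items():
        if any(p.search(landing_html) for p in patterns):
            return framework
    return "unknown"  # the silent fallback that caused all the trouble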
Juice Shop's landing HTML is, in its entirety, an <app-root> element and a favicon link. That's it. All the framework signals (the polyfill files, the runtime bundle, the Angular CLI fingerprints) live inside JavaScript files that load after the page renders. The fingerprinter never saw them, because it only read the initial HTML.
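Run a reconstruction of that landing page through the sketch above and you can watch the miss happen (the HTML below is illustrative, not Juice Shop's exact markup):

```python
# A reconstruction of what the fingerprinter saw -- not the exact markup.
JUICE_SHOP_LANDING = """
<!doctype html>
<html>
  <head><link rel="icon" href="assets/public/favicon_js.ico"></head>
  <body><app-root></app-root></body>
</html>
"""

print(fingerprint(JUICE_SHOP_LANDING))  # -> "unknown": every signature misses
```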
So the chain of failures looked like this. The agent correctly identified "this is a web application." It then incorrectly identified the framework as "unknown." Because the framework was unknown, the scan ran only the most generic, framework-agnostic checks, the ones unlikely to find anything on a custom-built application that doesn't match common known-bad patterns. Result: zero findings.
The fix took an afternoon. Pass the JavaScript URLs that the JS-extraction layer already discovered into the fingerprinter. Add signatures that match Angular CLI's output naming patterns, both hashed names like polyfills.abc123.js and unhashed names like polyfills.js (Juice Shop uses the unhashed variant). Re-run the scan. Findings everywhere they should be.
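In the terms of the toy fingerprinter above, the fix is a second signature pass over the script URLs (the bundle-name regex reflects Angular CLI's runtime/polyfills/main naming; the function itself is a sketch, not the shipped code):

```python
import re

# Angular CLI names its bundles runtime/polyfills/main, with an optional content
# hash (polyfills.abc123.js) or without one (polyfills.js -- the Juice Shop case).
ANGULAR_BUNDLE = re.compile(r"/(runtime|polyfills|main)(\.[0-9a-f]+)?\.js$")

def fingerprint_js_urls(js_urls: list[str]) -> str:
    """Second-pass fingerprint over script URLs the JS-extraction layer found."""
    if any(ANGULAR_BUNDLE.search(url) for url in js_urls):
        return "angular"
    return "unknown"

print(fingerprint_js_urls(["http://localhost:3000/polyfills.js"]))  # -> "angular"
```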
Why this is the most embarrassing class of bug
If a security tool fails to find a known issue, that's bad. If a security tool fails silently to find a whole class of known issues, on a whole class of targets, that's worse.
Nothing in the system flagged that the scan was incomplete. No error, no warning, no "framework not detected, falling back to generic mode" log line. The scan ran end-to-end, returned a clean zero-findings report, and looked exactly like a successful run against a hardened application.
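The galling part is how cheap that missing log line would have been. A guard like this (again in the toy sketch's terms) turns the silent fallback into a visible one:

```python
import logging

def classify(target_url: str, landing_html: str) -> str:
    framework = fingerprint(landing_html)  # toy fingerprinter from above
    if framework == "unknown":
        # The log line that was missing: make the generic fallback visible.
        logging.warning(
            "framework not detected for %s; falling back to generic checks",
            target_url,
        )
    return framework
```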
This is the kind of bug that small dev teams should be most worried about in any security tool they buy. False positives are noisy and annoying, but they're visible. You can argue with them. False negatives are silent. They tell you everything's fine when it isn't.
And the specific false negative I just shipped is the worst possible one for my actual market. The whole reason small dev teams need a tool like this is that they're building modern web applications. Angular, React, Vue. Single-page apps where most of the application logic lives in JavaScript that loads after the initial page render. Exactly the class of app my tool was silently bad at.
The wider test
Once I'd fixed the SPA bug, I wanted to make sure the foundations actually held up across a variety of stacks. So I ran the agent against five other intentionally vulnerable applications:
- An old PHP app representing classic LAMP-stack vulnerabilities
- A Spring Java app with the kind of misconfigurations enterprise codebases accumulate
- A Node API designed around broken access control patterns
- A WordPress installation with vulnerable plugins
- A Python GraphQL app with introspection and authorization issues
Across all six targets (Juice Shop plus those five), the agent produced 64 validated findings. Total scan time: 17 minutes.
Those numbers are useful context. They confirm the foundations are working. But the lesson of the week wasn't in the 64 findings. It was in the zero.
What I'm taking from this
The deterministic layer of the tool can be quietly wrong for an entire class of targets and not trigger a single alarm. Everything else can look fine (the agent runs cleanly, the report generates, the dashboard shows green) and the tool can still be useless for the customer in front of you.
The only way I caught this was by running against a target I hadn't tested before. My existing test set was biased toward applications with rich landing-page HTML. Adding one Angular SPA to the mix exposed a blind spot that had been there from day one.
The new rule, internally: every new customer's stack is a target I haven't tested before. The first scan on any customer is half pentest, half product validation, until I've seen enough variety in the wild to actually trust that the foundations hold.
There's a broader principle hiding in here too, which I think applies to any AI-driven product right now. The parts of your system that look deterministic, the fingerprinting and classification and routing, are often the parts most likely to fail silently. The LLM-driven parts of my tool fail loudly when they fail, because they hallucinate or contradict themselves or run out of budget. The deterministic regex-based fingerprinter failed quietly for months and nobody noticed. Treating the boring parts of the system with as much suspicion as the fancy parts is going to be a recurring theme of this build log.
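One concrete form that suspicion can take: pin the deterministic layer down with regression fixtures. A pytest-style sketch (fetch_landing and detect_framework are hypothetical helpers, and the fixture names are made up) that would have caught the SPA blind spot on day one:

```python
import pytest

# Hypothetical fixture set: one known target per stack, including an SPA whose
# landing HTML is nearly empty. A wrong answer fails loudly instead of scanning blind.
FIXTURES = {
    "wordpress-demo": "wordpress",
    "django-demo": "django",
    "juice-shop": "angular",  # the case that sat silently at "unknown"
}

@pytest.mark.parametrize("target,expected", FIXTURES.items())
def test_fingerprint_is_never_silently_unknown(target, expected):
    html, js_urls = fetch_landing(target)  # hypothetical test helper
    assert detect_framework(html, js_urls) == expected
```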
What I'd love to hear
If you're running a single-page app (Angular, React, Vue) and you've ever wondered whether the security tools you've used actually understand the front-end shape of your app, I'd love to hear what you've been using and how it's gone. Most automated security tools were designed in an era of server-rendered HTML and they handle modern frontends with varying degrees of awareness. I'm curious what other people's experience has been.
If that's you, book a 15-minute call or reply on LinkedIn.
Week 3 next Tuesday.