Monthly Archives: May 2026

Inheriting a Legacy Security Nightmare — And Actually Fixing It

A field note on what it looks like to walk into a five-year-old codebase with no original authors, a fresh security audit, and a deadline.


The Setup

A few months ago, a leader I respect reached out. They had recently inherited ownership of an enterprise web platform — a system that had been quietly doing its job for nearly five years. The original architects, the engineers who built it, and most of the product stakeholders had long since moved on. What remained was a working product, a handful of documentation fragments, and no living memory of why certain decisions were made.

Then the security reports started landing.

Not one or two findings. Dozens. SQL injection. Authentication bypasses. Token issuance endpoints that had no business existing. Some of them had been open for over a year.

They asked if I could help. I said yes. This is what I learned.


The Codebase

The system is spread across several repos: a handful of Angular front-end apps, a Node/Express server layer for each, and a shared microservices backend. Each layer was built at a different time by a different team. Some modules are polished. Some have the unmistakable fingerprints of “I needed to ship this by Friday.”

The repos had diverged. Region-specific variants had been forked from the main app and then evolved independently. One module had a routing file with a comment block from 2021 that referenced a wiki page that no longer existed.

This is normal. Five years is a long time in software.


Finding 1: The Authentication Surface

The first class of issues was in the front-end layer — two apps that together form the entry point for the platform.

Both apps had a POST handler that would accept a user identifier from the request body and mint a signed JWT in response. No session validation. No verification that the caller was actually logged in. Just: “you sent a user ID, here is a token.”

This is the kind of code that makes complete sense in the context it was written. Someone needed to bridge an external authentication system into a Node app. They had a form POST coming in, they trusted it was legitimate, and they wrote code that acted on that trust. At the time, behind a corporate VPN, it probably felt fine.

Five years later, with a formal third-party security audit in play, it was a critical finding.

The fix wasn’t obvious. The naive answer — “just validate the session server-side” — required the session cookie to be present on the request. But cookies are scoped by path, and the app wasn’t running under the right path for the session cookie to be sent. The fix required a coordinated change: relocate the first app to a new path that fell inside the cookie’s scope, redirect legacy traffic at the routing layer, and rebuild the authentication primitive from scratch using a database session lookup.
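To make the shape of that rebuilt primitive concrete, here is a minimal sketch, not the production code. The in-memory `sessions` map stands in for the database session table, and the `sid` cookie name, the `mintToken` helper, and the five-minute expiry are all assumptions made for illustration. The point it shows is that the user ID comes from the session, never from the request body.

```javascript
const crypto = require("node:crypto");

// Hypothetical stand-in for the database session table.
const sessions = new Map([
  ["sess-abc", { userId: "u-42", expiresAt: Date.now() + 60_000 }],
]);

// Parse a Cookie request header into a plain object.
function parseCookies(header = "") {
  const cookies = {};
  for (const part of header.split(";")) {
    const i = part.indexOf("=");
    if (i > 0) cookies[part.slice(0, i).trim()] = part.slice(i + 1).trim();
  }
  return cookies;
}

const b64 = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");

// Sign a compact HS256 JWT using only the standard library.
function signJwt(payload, secret) {
  const signingInput = `${b64({ alg: "HS256", typ: "JWT" })}.${b64(payload)}`;
  const sig = crypto.createHmac("sha256", secret).update(signingInput).digest("base64url");
  return `${signingInput}.${sig}`;
}

// Mint a token only when the session cookie maps to a live session.
function mintToken(cookieHeader, body, secret) {
  const session = sessions.get(parseCookies(cookieHeader).sid);
  if (!session || session.expiresAt <= Date.now()) return { status: 401 };
  // A body naming a different user is treated as an impersonation attempt.
  if (body.userId !== undefined && body.userId !== session.userId) return { status: 403 };
  const now = Math.floor(Date.now() / 1000);
  const token = signJwt({ sub: session.userId, iat: now, exp: now + 300 }, secret);
  return { status: 200, token };
}
```

In a real service a maintained JWT library would normally replace the hand-rolled `signJwt`; it is written out here only to keep the sketch dependency-free.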

Then, for the second app: since it lives at a different path and can never receive the session cookie, it needed a different approach entirely. The answer was to stop minting tokens there completely. Remove the endpoint. Enforce a gate that only accepts tokens already issued by the first app. One authority, one trust anchor.
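A gate of that kind might look roughly like the sketch below. The `entryGate` and `verifyJwt` names, the Bearer header transport, and the use of HS256 with a shared secret are assumptions chosen to keep the example stdlib-only. With an asymmetric algorithm the signing key would stay on the first app and the second app would hold only the public key, which fits the "one authority" idea more strictly.

```javascript
const crypto = require("node:crypto");

// Verify a compact HS256 JWT; returns the claims, or null if anything is off.
function verifyJwt(token, secret) {
  const parts = String(token).split(".");
  if (parts.length !== 3) return null;
  const [head, body, sig] = parts;
  let header, payload;
  try {
    header = JSON.parse(Buffer.from(head, "base64url").toString());
    payload = JSON.parse(Buffer.from(body, "base64url").toString());
  } catch {
    return null; // not valid JSON, so not a token we issued
  }
  // Pin the algorithm so "alg: none" and algorithm swaps are rejected.
  if (!header || header.alg !== "HS256" || !payload) return null;
  const expected = crypto.createHmac("sha256", secret).update(`${head}.${body}`).digest();
  const given = Buffer.from(sig, "base64url");
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  // Require an expiry and reject tokens that are past it.
  if (typeof payload.exp !== "number" || payload.exp <= Math.floor(Date.now() / 1000)) return null;
  return payload;
}

// The second app never mints; it only admits callers holding a valid token.
function entryGate(authorizationHeader, secret) {
  const match = /^Bearer (.+)$/.exec(authorizationHeader || "");
  const claims = match && verifyJwt(match[1], secret);
  return claims ? { status: 200, userId: claims.sub } : { status: 401 };
}
```

Malformed, expired, and wrongly signed tokens all collapse into the same 401, so the response leaks nothing about which check failed.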

Shipping both took about two weeks across three PRs. The hardest part wasn’t the code — it was understanding the trust model well enough to know what the code should be.


Finding 2: SQL Injection at Scale

While the auth work was wrapping up, the second wave of security reports arrived.

Dozens of SQL injection findings across the backend microservices.

The root cause was the same in every single case. A query template with a named placeholder. A service method that substituted user input directly into the query string using string replacement. Then that string, with user data baked in, passed directly to the database driver’s non-parameterized execution path.

// The vulnerable pattern - repeated across the codebase
const query = queries.FETCH_DATA.replace('@param', userInput);
await db.execute(query);

The fix was equally uniform:

// The fixed pattern
await db.executeWithParams(queries.FETCH_DATA, { param: userInput });

The DAO layer already had a parameterized execution method. It was sitting right there, unused for this pattern, presumably because the original author didn’t know it existed or the pattern hadn’t been established yet when these service files were written.
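To see why the difference matters, here is a small self-contained illustration. It does not talk to a database: the query templates, the table and column names, and the `buildParameterized` helper are hypothetical and only model what each approach hands to the driver.

```javascript
// Hypothetical query templates; table and column names are made up.
const VULNERABLE_TEMPLATE = "SELECT id, title FROM reports WHERE owner = '@param'";
const SAFE_TEMPLATE = "SELECT id, title FROM reports WHERE owner = @param";

// The vulnerable approach: user input becomes part of the SQL text.
function buildByReplace(template, userInput) {
  return template.replace("@param", userInput);
}

// The parameterized approach: the SQL text is fixed, values travel separately.
function buildParameterized(template, params) {
  return { text: template, params: { ...params } };
}

const hostile = "nobody' OR '1'='1";

const injected = buildByReplace(VULNERABLE_TEMPLATE, hostile);
console.log(injected);
// SELECT id, title FROM reports WHERE owner = 'nobody' OR '1'='1'

const bound = buildParameterized(SAFE_TEMPLATE, { param: hostile });
console.log(bound.text);   // unchanged template; the driver binds the value
console.log(bound.params); // { param: "nobody' OR '1'='1" }
```

In the first case the attacker's quote closes the string literal and the WHERE clause matches every row. In the second the hostile string is only ever data.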

File after file. Same fix. Different parameter names.

This is what happens when a pattern becomes a convention without being reviewed. The first person writes it one way. The next person copies it. Five years later you have dozens of instances of the same mistake, all surfaced in a single audit sweep.
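Because the mistake is so uniform, the remaining instances can be found mechanically. The scanner below is a rough sketch and not the tool used on this project: it assumes templates are always reached as `queries.SOMETHING` and placeholders always start with `@`, as in the snippet above, so anything written differently would slip past it.

```javascript
// Matches a query template being filled in with String.prototype.replace,
// e.g. queries.FETCH_DATA.replace('@param', userInput)
const VULNERABLE = /queries\.\w+\s*\.replace\(\s*['"]@\w+['"]/;

// Return the 1-based line numbers (and text) of suspicious lines in a source file.
function findVulnerableLines(source) {
  return source
    .split("\n")
    .map((text, i) => ({ line: i + 1, text: text.trim() }))
    .filter(({ text }) => VULNERABLE.test(text));
}

const sample = [
  "const query = queries.FETCH_DATA.replace('@param', userInput);",
  "await db.execute(query);",
  "await db.executeWithParams(queries.FETCH_DATA, { param: userInput });",
].join("\n");

console.log(findVulnerableLines(sample).map((hit) => hit.line)); // [ 1 ]
```

A scan like this is only a worklist generator. Each hit still needs a human to confirm the parameter names before switching it to the parameterized call.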


What I Actually Learned

1. Legacy codebases aren’t broken — they’re contextless.

Every strange decision I found had a reason. It just wasn’t written down, or the person who knew it had left. The code isn’t wrong. It’s just been outlived by the threat model it was written against.

2. Understanding the trust model is the whole job.

The hardest part of the auth fix wasn’t writing the validation logic. It was sitting down and drawing out: who mints tokens? who verifies them? what cookies are available where and why? Once that was clear, the code was almost mechanical. Before that clarity, every PR felt risky.

3. Consistent bad patterns are actually a gift.

Dozens of SQL injection findings sounds catastrophic. But because the root cause was identical in every case, fixing them was systematic. One engineer, a clear pattern, and enough time. Inconsistent vulnerabilities — each with a different root cause — would have been far harder.

4. Documentation debt compounds like financial debt.

The hours spent reverse-engineering why the path routing worked the way it did, why certain cookies were set, why a particular environment flag existed — all of that was time not spent fixing things. Every future engineer who touches this system will either pay that cost again or benefit from the ADRs and notes I’m leaving behind.

5. Coming in as an outsider is a superpower.

I had no attachment to the original decisions. I could ask “why does this exist?” without anyone feeling criticized. The leader who brought me in had the same openness. That made hard conversations — “this needs to be deleted, not patched” — much easier than they might have been.


The Tool That Was Actually in the Room

I want to be honest about something that doesn’t get talked about enough in engineering write-ups: I didn’t do this alone, and the co-author wasn’t human.

I used an AI coding assistant — specifically Cursor with Claude — as a hands-on collaborator throughout this entire remediation. Not as an autocomplete tool. Not to generate boilerplate. As an actual working partner that wrote code, reasoned about trust models, drafted architecture decision records, and produced PR-ready diffs.

Here’s what the actual working loop looked like, as honestly as I can describe it:

  1. I’d share the problem — a security finding, a broken behaviour, a constraint I’d discovered in the codebase.
  2. The AI would propose a solution — sometimes code, sometimes a question back at me, sometimes a structured document laying out options.
  3. I’d push back. “This doesn’t work because the cookie isn’t scoped here.” “This will cause an infinite redirect.” “The infrastructure config is wrong — look at where the volume actually mounts.”
  4. The AI would revise. Not defensively. Just: absorb the feedback, update the model, try again.
  5. Repeat until I was confident enough to push.

Looking at the git log across both repos, 13 commits landed over the course of about a week. The path relocation, the session validation gate, the impersonation detection, the JWT entry gate on the second app, the infrastructure fix, the redirect loop fix — all of it went through this loop. The AI wrote most of the initial code. I caught the gaps. Together we got to something I was willing to put in production.

What surprised me wasn’t that the AI could write the code. That part I expected. What surprised me was how useful it was to have something that could hold the full context of the problem — multiple ADRs, a security finding write-up, a Node server file, a routing config — and reason across all of it at once. The kind of reasoning that would normally require a senior engineer who’d been on the codebase for months.

That said: the AI was wrong, regularly. It made assumptions about the environment. It occasionally proposed fixes that would have introduced new problems. It didn’t know things I knew from reading an infrastructure config at 11pm. The loop only worked because I knew enough to catch those gaps.

Reproducing production locally: the nginx proxy trick. Before any browser-driven testing could work, I had to solve a subtler problem: the production cookie flow cannot be reproduced with each app on its own localhost port. Cookies themselves ignore ports, but they do honour their Path attribute, and with every app mounted at / on a separate port the production path layout is gone, so path scoping no longer behaves as it does in production. The browser also treats each port as a separate origin, which changes CORS and CSRF behaviour. To faithfully reproduce all three on my laptop, I stood up a local nginx reverse proxy on a custom hostname with path-based routing, unifying the home app, the reports app, and the microservice backend under a single origin, as in production. Only then did the cookie handoff between apps behave the way it would in the real environment.
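Once everything sits behind one origin, whether an app receives the session cookie comes down to the cookie's Path attribute. The sketch below implements the path-match rule from RFC 6265 (section 5.1.4). The `/portal`, `/portal/home`, and `/reports` paths are invented for illustration and are not the platform's real layout.

```javascript
// RFC 6265 section 5.1.4: does a request path fall inside a cookie's Path?
function pathMatches(cookiePath, requestPath) {
  if (requestPath === cookiePath) return true;
  if (!requestPath.startsWith(cookiePath)) return false;
  // The prefix must end at a "/" boundary, so "/portal" does not match "/portalx".
  return cookiePath.endsWith("/") || requestPath[cookiePath.length] === "/";
}

// Hypothetical layout: the session cookie is scoped to /portal.
const sessionCookiePath = "/portal";

console.log(pathMatches(sessionCookiePath, "/portal/home")); // true: home app sees the cookie
console.log(pathMatches(sessionCookiePath, "/reports"));     // false: reports app never does
console.log(pathMatches(sessionCookiePath, "/portalx"));     // false: prefix without a boundary
```

This is the constraint behind the two different fixes: an app inside the cookie's path can validate the session directly, while an app outside it has to rely on a token issued elsewhere.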

Testing without writing tests. One thing that genuinely surprised me: I never wrote a single line of Playwright test code. Cursor has an MCP integration that connects the AI agent directly to a live Chrome instance via Chrome DevTools Protocol. Combined with an agent browser tool, I could describe what I wanted to verify (“log in, navigate to the protected page, confirm the JWT gate blocks unauthenticated requests”) and the agent would drive the browser, execute the flow, take screenshots, and report back. End-to-end verification of the security-critical flows, with zero test-authoring overhead on my end.

Attacking my own fixes. On the offensive side, I used Burp Suite Community Edition to validate the fixes before shipping. Replaying captured requests with tampered payloads, testing the SQL injection patterns against the parameterized endpoints, probing the JWT gate with malformed and expired tokens. If Burp could break it, the fix wasn’t done. If it couldn’t, I had reasonable confidence the patch held. Having both the AI writing the defence and a real attack tool probing it created a tighter feedback loop than code review alone would have.

Which is, I think, the actual lesson: AI assistance in this kind of work isn’t about replacing engineering judgment. It’s about removing the friction between having a judgment and turning it into working, tested, and validated code. That friction — the “I know what needs to happen but now I need to write 80 lines of middleware, a test suite, and then manually click through the app to verify it” friction — is where a lot of security work quietly stalls.

It didn’t stall here.


Where Things Stand

The auth fixes are in production. The SQL injection remediation is in progress. The codebase has more documentation today than it did three months ago. The security findings are being closed.

The original team who built this shipped something that ran reliably for five years and served a large internal user base. That’s genuinely hard to do. What I’m doing now isn’t a criticism of them — it’s just the next chapter.

Legacy systems don’t need heroes. They need patience, curiosity, and someone willing to read a five-year-old routing config until it makes sense.


Written by a developer who spent way too long reading cookie scope documentation and came out the other side with opinions.

When Googling My Own Product Sent Visitors to a Prayer App: A Debugging Story With My AI Pair

The setup that broke my Friday afternoon

I was checking my own SEO. I typed “quizwrap” into Google. My site, QuizWrap — a free quiz-maker for students — showed up as the very first result. Great.

I clicked it.

A Ho’oponopono prayer counter loaded.

That’s a completely different app I run on the same server, and visitors looking for QuizWrap were landing on it instead. Worse, I quickly noticed a related issue: visiting https://smartdisha.co.in/ directly threw a TLS certificate error in the browser.

Two bugs, both on the same VPS, both involving the nginx reverse proxy that fronts everything. I sat down with Claude (Anthropic’s coding agent inside Claude Code) and we dug in together. What follows is the story of that debugging session — both the technical findings and what it was like to pair-debug with an AI.


The architecture (and a quick glossary)

A quick mental model so the rest of this makes sense.

A single VPS hosts three sites behind one system nginx — a popular web server that, in this setup, acts as a reverse proxy: a traffic cop sitting in front that takes incoming HTTPS requests and forwards them to the right internal app.

  • quizwrap.com — my quiz app
  • prayer.quizwrap.com — a small prayer counter
  • smartdisha.co.in — a separate site on the same box

Some traffic flows through a CDN before reaching origin, some doesn’t. Each domain has its own free Let’s Encrypt TLS certificate (the thing that makes the little padlock icon appear in your browser), and nginx is configured with one server block per domain.

A few terms I’ll keep using:

  • TLS — the encryption layer behind HTTPS. The “S” in HTTPS.
  • Certificate — a small file that proves a server owns the domain it claims to. Browsers reject the connection if the cert doesn’t match the domain.
  • SNI (Server Name Indication) — the most important term in this whole post. When your browser opens a TLS connection to smartdisha.co.in, it whispers the hostname it wants before the encryption is set up, so the server knows which certificate to present. One server can host many domains on the same IP, and SNI is how it picks the right cert. If SNI says one thing and the server returns the wrong cert, the browser shows a security warning and refuses to load the page.
  • Server block — nginx’s term for “the config chunk that handles requests for one domain.” Each domain has one (or several).
  • server_name directive — the line inside a server block that lists which hostnames that block is responsible for. If no block claims a hostname, nginx silently picks a default block as a fallback.

Bug #1: www.quizwrap.com was serving the prayer app

The detective work

Before touching anything, Claude pulled response headers from both URLs in parallel:

curl -sI https://www.quizwrap.com/
curl -skI https://smartdisha.co.in/   # -k: certificate validation was failing for this host

Both responses came back with identical fingerprints (the title row comes from the response body, which -I alone does not fetch):

                 www.quizwrap.com        smartdisha.co.in
ETag             "69d2087a-332"          "69d2087a-332"
Content-Length   818                     818
Last-Modified    same date               same date
Title in body    Ho'oponopono Counter    Ho'oponopono Counter

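(An ETag is a fingerprint a web server attaches to a response. For static files nginx builds it from the file’s modification time and size, both in hex: 0x332 is 818, which matches the Content-Length. Two files on the same server with the same ETag, size, and Last-Modified are almost certainly the same file.)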
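For static files, nginx derives the ETag from the file's modification time and size, both written in hex, rather than from a hash of the contents. That means the value can be decoded and cross-checked against the other headers. A small sketch, where `decodeNginxEtag` is a name made up for this example:

```javascript
// Decode an nginx static-file ETag of the form "<mtime-hex>-<size-hex>".
function decodeNginxEtag(etag) {
  const m = /^(?:W\/)?"([0-9a-f]+)-([0-9a-f]+)"$/i.exec(etag);
  if (!m) return null; // not in nginx's static-file format
  return {
    lastModified: new Date(parseInt(m[1], 16) * 1000), // seconds since epoch
    sizeBytes: parseInt(m[2], 16),
  };
}

const decoded = decodeNginxEtag('"69d2087a-332"');
console.log(decoded.sizeBytes); // 818, the same as Content-Length
console.log(decoded.lastModified.toISOString());
```

Other servers and CDNs compute ETags differently, so this decoding only applies when the response really comes from an nginx static file.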

Same file, served to two different domains. Now we knew it was an nginx routing question, not a DNS or CDN issue.

Reading the configs over SSH

I had Claude SSH into my server (passwordless key auth — read-only operations, no sudo) and dump the three nginx configs. The first thing it spotted:

# /etc/nginx/sites-available/quizwrap.com
server {
    server_name quizwrap.com;
    ...
}

server_name quizwrap.com, not server_name quizwrap.com www.quizwrap.com. There was no server block anywhere on the box claiming www.quizwrap.com. When a request arrived at my server saying “this is for www.quizwrap.com,” nginx had no rule that named that hostname, so it fell back to the default server for that port. With no explicit default_server, that is the first SSL block nginx loads, and because the per-site config files are included in alphabetical order, it was the one for prayer.quizwrap.com, which is what serves the prayer app.

That’s how a Google click on www.quizwrap.com ended up rendering Ho’oponopono. nginx was doing exactly what it was told; what it was told just didn’t include the www version of my domain.
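The fallback is easy to model. The sketch below is a deliberately simplified picture of nginx's name matching: it handles exact names only (real nginx also supports wildcard and regex names), and the `blocks` array is a hand-written stand-in for the three config files in load order.

```javascript
// Server blocks in the order nginx loads them (files included alphabetically).
const blocks = [
  { file: "prayer.quizwrap.com", serverNames: ["prayer.quizwrap.com"] },
  { file: "quizwrap.com", serverNames: ["quizwrap.com"] },
  { file: "smartdisha.co.in", serverNames: ["smartdisha.co.in"] },
];

// Exact-name match first, then an explicit default_server, then the first block.
function pickServerBlock(blocks, host) {
  return (
    blocks.find((b) => b.serverNames.includes(host)) ??
    blocks.find((b) => b.defaultServer) ??
    blocks[0]
  );
}

console.log(pickServerBlock(blocks, "quizwrap.com").file);     // quizwrap.com
console.log(pickServerBlock(blocks, "www.quizwrap.com").file); // prayer.quizwrap.com

// After adding the www name to the quizwrap block, the lookup lands correctly.
blocks[1].serverNames.push("www.quizwrap.com");
console.log(pickServerBlock(blocks, "www.quizwrap.com").file); // quizwrap.com
```

A common hardening step that follows from this model is a dedicated catch-all block marked default_server, so unknown hostnames get a deliberate response instead of whichever site happens to load first.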

The fix

A one-liner:

sudo sed -i 's/server_name quizwrap.com;/server_name quizwrap.com www.quizwrap.com;/' \
  /etc/nginx/sites-available/quizwrap.com
sudo nginx -t && sudo systemctl reload nginx

A test confirmed it:

HTTP/2 200
last-modified: Sun, 30 Nov 2025 15:42:27 GMT   ← quizwrap build, not the prayer one
<title>QuizWrap - FREE Study Quiz Maker for Students</title>

Then a defensive follow-up: re-issue the Let’s Encrypt cert to cover the www version too, so the cert chain stays internally consistent. (A single cert can list multiple hostnames in a field called the Subject Alternative Name, or SAN — that’s just “the list of domains this cert is valid for.”) One certbot command added www.quizwrap.com to the cert. Done.

Bug #1: 5 minutes from “what is happening” to “fixed.”

Bug #2 was not like that.


Bug #2: smartdisha.co.in and the certificate that wouldn’t come right

The symptom

Browsers refused https://smartdisha.co.in/ with a cert error. openssl s_client showed why:

$ echo | openssl s_client -servername smartdisha.co.in -connect smartdisha.co.in:443 2>/dev/null \
    | openssl x509 -noout -subject -ext subjectAltName

subject=CN = prayer.quizwrap.com
DNS:prayer.quizwrap.com

The browser asked for smartdisha.co.in (via SNI), and the server handed back a certificate that says “I’m prayer.quizwrap.com.” That’s a name mismatch, so the browser refuses the connection — you’ve probably seen the resulting “Your connection is not private” error page. At first I thought the fix was going to be just as quick as the www one.

It wasn’t.

Two hours of dead ends

Here’s the parade of “that should have fixed it”:

  1. Re-issue the cert? sudo certbot --nginx -d smartdisha.co.in — certbot reported there was an existing cert and offered to reinstall. Reinstalled. No change. Browser still got prayer’s cert.
  2. Maybe nginx didn’t reload cleanly. sudo systemctl reload nginx. No change.
  3. Inspect the cert file directly.
    sudo openssl x509 -in /etc/letsencrypt/live/smartdisha.co.in/fullchain.pem \
        -noout -subject -ext subjectAltName
    subject=CN = smartdisha.co.in
    DNS:smartdisha.co.in

    The file on disk was correct. nginx just wasn’t serving it.

  4. Maybe the workers cached an old cert. sudo systemctl restart nginx. No change.
  5. Check nginx -T for the loaded config. The smartdisha SSL block was fully loaded, with the right server_name, the right listen 443 ssl;, and the right cert path. Everything looked correct. Still no change.

At one point I checked ps and noticed three nginx master processes — two with nginx -g daemon off; (the Docker-container telltale) and one system nginx. Claude initially flagged this as the smoking gun: maybe a Docker container was intercepting TLS. We confirmed via ss -tlnp that the system nginx was actually the only thing on port 443; the Docker nginxes were just internal app servers behind it. Wrong turn — but a reasonable one.

My moment of skepticism

I sent Claude a screenshot of my DNS panel with the message:

“Before we go chase our tail. Check the configuration attached.”

This was the right instinct. I was tired of theories that weren’t panning out. Stepping back to verify a load-bearing assumption — is the request path for this domain actually what we think it is? — confirmed we were looking at the right place, but it could just as easily have caught us going the wrong way for another hour.

Lesson: when you’re three theories deep and none have stuck, your AI assistant doesn’t always notice it’s in a loop. Pushing back is your job.

The breakthrough: probing SNI directly

Claude wrote a small loop that asks nginx a question which, in plain English, reads: “If a browser tells you it wants hostname X, which certificate do you hand back?” It asks once for each domain on the box.

ssh my-server 'for sni in <each-hostname-on-the-box>; do
  printf "SNI=%-30s -> " "$sni"
  echo | openssl s_client -servername "$sni" -connect localhost:443 2>/dev/null \
    | openssl x509 -noout -subject 2>/dev/null
done'
SNI=smartdisha.co.in           -> CN = prayer.quizwrap.com    ❌
SNI=www.quizwrap.com           -> CN = quizwrap.com            ✓
SNI=quizwrap.com               -> CN = quizwrap.com            ✓
SNI=prayer.quizwrap.com        -> CN = prayer.quizwrap.com     ✓
SNI=nonexistent.example.com    -> CN = prayer.quizwrap.com     (default fallback)

There it was. smartdisha.co.in was being treated identically to a totally unknown hostname. It wasn’t a cert problem at all — the cert file on disk was perfectly fine. nginx just wasn’t recognizing smartdisha.co.in as a hostname it knew about. Both unknown hostnames and smartdisha.co.in fell through to the same default fallback block (prayer, which is alphabetically first), which is why both got prayer’s cert.
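The same probe can be written in Node for anyone who would rather script it than parse openssl output. This is a sketch, not what was run in the session: `probeSni` and `certCoversHost` are names invented here, and certificate validation is switched off on purpose because the goal is to inspect whatever certificate comes back, not to trust it.

```javascript
const tls = require("node:tls");

// Does the certificate's SAN list (or CN, if there is no SAN) cover this host?
function certCoversHost(cert, host) {
  const wanted = host.toLowerCase();
  const sans = (cert.subjectaltname || "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.startsWith("DNS:"))
    .map((s) => s.slice(4).toLowerCase());
  const names = sans.length ? sans : [String(cert.subject?.CN || "").toLowerCase()];
  return names.some(
    (n) => n === wanted || (n.startsWith("*.") && wanted.split(".").slice(1).join(".") === n.slice(2))
  );
}

// Connect to connectHost, announce `sni`, and report which cert comes back.
function probeSni(connectHost, sni, port = 443) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(
      { host: connectHost, port, servername: sni, rejectUnauthorized: false },
      () => {
        const cert = socket.getPeerCertificate();
        socket.end();
        resolve({ sni, cn: cert.subject?.CN, ok: certCoversHost(cert, sni) });
      }
    );
    socket.once("error", reject);
  });
}

// Usage, run on the server itself:
// probeSni("localhost", "smartdisha.co.in").then(console.log);
```

An `ok: false` result for a hostname that has its own server block is the signature of this bug: the block exists, but the listener that accepted the connection does not know about it.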

The actual root cause

With that clue, Claude re-read all three nginx configs side-by-side and found the only structural difference:

Block        IPv6 listen                          IPv4 listen
prayer       listen [::]:443 ssl ipv6only=on;     listen 443 ssl;
quizwrap     listen [::]:443 ssl;                 listen 443 ssl;
smartdisha   (missing)                            listen 443 ssl;

A bit of background to read that table: every server on the internet has two kinds of addresses available: older IPv4 (the familiar 1.2.3.4 style) and newer IPv6 (the longer ::1 style). nginx’s listen directive tells it which addresses to accept connections on. listen 443 ssl; means “IPv4 only.” listen [::]:443 ssl; means “IPv6.” On nginx 1.3.4 and later, ipv6only defaults to on, so that line accepts IPv6 connections only; the ipv6only=on on prayer’s line just states the default explicitly for the socket that prayer and quizwrap share. (Before 1.3.4 the operating system’s setting applied, which on Linux usually meant “dual-stack”: the IPv6 socket also accepted IPv4.)

Internally, nginx groups server blocks by the address and port they listen on, and only the blocks in the group that accepted a connection are candidates for handling it. prayer and quizwrap were registered on both the IPv4 and the IPv6 listener; smartdisha, lacking any IPv6 listen line of its own, was registered on IPv4 only. Any connection that arrives over IPv6, which most likely includes the localhost probe above (localhost commonly resolves to ::1 first) and any browser reaching the site over IPv6, lands in the IPv6 group, where the prayer block (alphabetically first) is the default catch-all. Even though smartdisha’s server block is loaded and looks correct, SNI lookups for smartdisha.co.in on that listener arrive at a group where smartdisha isn’t listed, and fall back to prayer.
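The grouping can be modelled in a few lines. This is a toy model, assuming the failing connections arrived on the IPv6 listener; the `listens` arrays are transcribed from the table above, with `0.0.0.0:443` standing for `listen 443 ssl;`, and `selectBlock` is a hypothetical helper, not nginx's actual code.

```javascript
// Each block is registered only on the listeners it names.
const blocks = [
  { name: "prayer.quizwrap.com", listens: ["0.0.0.0:443", "[::]:443"] },
  { name: "quizwrap.com", listens: ["0.0.0.0:443", "[::]:443"] },
  { name: "smartdisha.co.in", listens: ["0.0.0.0:443"] },
];

// Candidates are the blocks on the accepting listener; SNI picks among them,
// and the first block in that group is the default when nothing matches.
function selectBlock(blocks, listener, sni) {
  const group = blocks.filter((b) => b.listens.includes(listener));
  if (group.length === 0) return null;
  return group.find((b) => b.name === sni) ?? group[0];
}

console.log(selectBlock(blocks, "0.0.0.0:443", "smartdisha.co.in").name); // smartdisha.co.in
console.log(selectBlock(blocks, "[::]:443", "smartdisha.co.in").name);    // prayer.quizwrap.com

// The fix: register smartdisha on the IPv6 listener as well.
blocks[2].listens.push("[::]:443");
console.log(selectBlock(blocks, "[::]:443", "smartdisha.co.in").name);    // smartdisha.co.in
```

The model also explains why the bug can look intermittent: the same hostname resolves correctly or incorrectly depending on which address family the client happened to use.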

Subtle, weird, and exactly the kind of thing nginx -t (the config syntax checker) won’t catch, because the syntax is fine.

The fix

Make smartdisha’s listen directives match the others:

sudo sh -c '
  cp /etc/nginx/sites-available/smartdisha.co.in /etc/nginx/sites-available/smartdisha.co.in.bak
  sed -i "/^    listen 443 ssl; # managed by Certbot$/i\\    listen [::]:443 ssl;" \
    /etc/nginx/sites-available/smartdisha.co.in
  nginx -t && systemctl reload nginx && echo DONE
'

Re-running the SNI probe afterwards:

SNI=smartdisha.co.in           -> CN = smartdisha.co.in        ✓

curl https://smartdisha.co.in/ succeeded with full TLS validation, no -k flag needed. The browser was happy.


What it was actually like to debug this with an AI

A few things stood out about the collaboration that I want to share.

Claude was great at the things I’m bad at. It pulled response headers from two domains in parallel, parsed cert subjects out of openssl s_client output, and noticed immediately that two responses had the same ETag — something I’d have read past. The structured diff between three nginx configs at the end (the listener-table comparison) was exactly the kind of thing my eyes glaze over after the second config file.

I was great at the things Claude is bad at. When we got stuck on Bug #2, Claude proposed three theories in a row, each plausible, none correct. The Docker-container theory in particular was a confidently-stated wrong answer. I knew that side of my own infrastructure well enough to be unimpressed. My push-back (“before we chase our tail”) was what reset the direction.

Security boundaries actually got enforced. When I offered Claude my sudo password to speed things up, it explicitly refused and explained why (the password would be in the chat transcript, in shell process listings, and a single leak compromises the whole server). It walked me through the alternatives — running the destructive commands myself in my own terminal, or scoping a passwordless sudoers rule for nginx-related commands only. Reading the full advice, I ended up just running each sudo command in my own shell and pasting the result. Slower, but at no point did a privileged credential cross a boundary it shouldn’t.

Transparency mattered. Halfway through Bug #2 I told Claude “I can’t see the commands you’re executing on my server.” It immediately listed every SSH command it had run and committed to printing each new command before executing it. That changed the dynamic — it stopped feeling like Claude was off doing things in the dark and started feeling like a teammate sharing their screen.

Knowing when to escalate to a one-shot fix. After multiple roundtrips of “paste this, paste that,” I asked Claude to drive over SSH so I could stop copy-pasting. It moved the read-only diagnostics to its own SSH connection and packaged the one mutating step into a single sudo block I could paste once and approve once. The friction of the back-and-forth dropped massively.


Lessons that generalise

A few things I’m taking away from this:

  1. Identical ETags across two domains on the same server almost always mean the same file is being served. If two of your sites unexpectedly look the same, that single header probably solves the mystery before you read a line of config.
  2. server_name is a registration, not just a label. If a hostname isn’t named in any block, nginx won’t error — it’ll silently pick a default and serve someone else’s content.
  3. nginx -t passing means valid syntax. It does not mean what you intended. All three configs in this story passed nginx -t with no warnings while half-broken.
  4. A server block with listen 443 ssl; but no listen [::]:443 ssl; is a footgun. nginx dispatches SNI among the blocks registered on the listener that accepted the connection, so an IPv4-only block is invisible to clients arriving over IPv6, who silently get the default block instead. Give every TLS server block both listen lines.
  5. The openssl s_client -servername X -connect Y:443 probe is a debugging superpower. It’s a one-line command that simulates exactly what a browser does — say “I want hostname X” via SNI, and see which certificate the server returns. Whenever an HTTPS-served domain is misbehaving, this probe will often tell you the answer in five lines.
  6. Pair-debugging with an AI works best when you stay in the loop. Treat its theories as drafts, not conclusions. Push back when you smell drift. Make it show its work.

The whole session was somewhere between two and three hours. By the end my SEO problem was gone, my secondary domain’s TLS was clean, and I had a much better mental model of how nginx makes SNI decisions across mixed-listener configurations. Worth the afternoon.


Total commands run on the server during this session: about 30. Total commands run with sudo: 5. Total credentials shared with the AI: zero.