Category Archives: Microservices

Inheriting a Legacy Security Nightmare — And Actually Fixing It

A field note on what it looks like to walk into a five-year-old codebase with no original authors, a fresh security audit, and a deadline.


The Setup

A few months ago, a leader I respect reached out. They had recently inherited ownership of an enterprise web platform — a system that had been quietly doing its job for nearly five years. The original architects, the engineers who built it, and most of the product stakeholders had long since moved on. What remained was a working product, a handful of documentation fragments, and no living memory of why certain decisions were made.

Then the security reports started landing.

Not one or two findings. Dozens. SQL injection. Authentication bypasses. Token issuance endpoints that had no business existing. Some of them had been open for over a year.

They asked if I could help. I said yes. This is what I learned.


The Codebase

The system is spread across multiple repos: several Angular front-end apps, a Node/Express server layer for each, and a shared microservices backend. Each layer was built at a different time by a different team. Some modules are polished. Some have the unmistakable fingerprints of “I needed to ship this by Friday.”

The repos had diverged. Region-specific variants had been forked from the main app and then evolved independently. One module had a routing file with a comment block from 2021 that referenced a wiki page that no longer existed.

This is normal. Five years is a long time in software.


Finding 1: The Authentication Surface

The first class of issues was in the front-end layer — two apps that together form the entry point for the platform.

Both apps had a POST handler that would accept a user identifier from the request body and mint a signed JWT in response. No session validation. No verification that the caller was actually logged in. Just: “you sent a user ID, here is a token.”

This is the kind of code that makes complete sense in the context it was written. Someone needed to bridge an external authentication system into a Node app. They had a form POST coming in, they trusted it was legitimate, and they wrote code that acted on that trust. At the time, behind a corporate VPN, it probably felt fine.

Five years later, with a formal third-party security audit in play, it was a critical finding.

The fix wasn’t obvious. The naive answer — “just validate the session server-side” — required the session cookie to be present on the request. But cookies are scoped by path, and the app wasn’t running under the right path for the session cookie to be sent. The fix required a coordinated change: relocate the first app to a new path that fell inside the cookie’s scope, redirect legacy traffic at the routing layer, and rebuild the authentication primitive from scratch using a database session lookup.
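The path problem follows from the cookie path-matching rule in RFC 6265: a browser only attaches a cookie when the request path falls under the cookie's Path attribute. Here is a minimal sketch of that rule. The paths (`/portal`, `/portal/home`, `/reports`) are made up for illustration and are not the platform's real routes.

```javascript
// Sketch of the RFC 6265 path-match rule (section 5.1.4).
// All paths below are hypothetical examples.
function pathMatches(requestPath, cookiePath) {
  if (requestPath === cookiePath) return true;
  if (!requestPath.startsWith(cookiePath)) return false;
  // A prefix only counts if it ends on a path-segment boundary.
  return cookiePath.endsWith('/') || requestPath[cookiePath.length] === '/';
}

console.log(pathMatches('/portal/home', '/portal')); // true: cookie is sent
console.log(pathMatches('/reports', '/portal'));     // false: cookie is withheld
console.log(pathMatches('/portalx', '/portal'));     // false: not a segment boundary
```

An app mounted outside the cookie's path never sees the session cookie, which is why relocating the app was part of the fix rather than an optional cleanup.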

Then, for the second app: since it lives at a different path and can never receive the session cookie, it needed a different approach entirely. The answer was to stop minting tokens there completely. Remove the endpoint. Enforce a gate that only accepts tokens already issued by the first app. One authority, one trust anchor.
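The split between "one app mints, the other only verifies" can be sketched roughly as follows. This is an illustration under assumptions, not the production code: the in-memory `Map` stands in for the database session lookup, the secret is a placeholder, every function name is hypothetical, and the HS256 token is hand-built with `node:crypto` only to keep the example self-contained. Real code would more likely use a maintained JWT library.

```javascript
const crypto = require('node:crypto');

const SECRET = 'placeholder-secret'; // assumption: a shared HMAC key, loaded from config in real code

const b64url = (text) => Buffer.from(text).toString('base64url');
const sign = (data) => crypto.createHmac('sha256', SECRET).update(data).digest('base64url');

// First app: mint a token only for a session that exists server-side.
function mintToken(sessionId, sessionStore, nowSeconds) {
  const session = sessionStore.get(sessionId);
  if (!session) return null; // no valid session, no token
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const payload = b64url(JSON.stringify({ sub: session.userId, exp: nowSeconds + 300 }));
  return `${header}.${payload}.${sign(`${header}.${payload}`)}`;
}

// Second app: never mints, only verifies tokens issued by the first app.
// The algorithm is fixed to HMAC-SHA256 here; the header's "alg" is never trusted.
function verifyToken(token, nowSeconds) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  const given = Buffer.from(signature);
  const expected = Buffer.from(sign(`${header}.${payload}`));
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  let claims;
  try {
    claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
  if (!claims || typeof claims.exp !== 'number' || claims.exp <= nowSeconds) return null;
  return claims;
}

const sessions = new Map([['session-abc', { userId: 'user-42' }]]);
const token = mintToken('session-abc', sessions, 1000);
console.log(verifyToken(token, 1000).sub);            // user-42
console.log(mintToken('unknown', sessions, 1000));    // null
console.log(verifyToken(token, 5000));                // null (expired)
```

The point of the shape is that the caller never supplies a user ID: identity comes from the server-side session in one place and from a verified signature everywhere else.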

Shipping both took about two weeks across three PRs. The hardest part wasn’t the code — it was understanding the trust model well enough to know what the code should be.


Finding 2: SQL Injection at Scale

While the auth work was wrapping up, the second wave of security reports arrived.

Dozens of SQL injection findings across the backend microservices.

The root cause was the same in every single case. A query template with a named placeholder. A service method that substituted user input directly into the query string using string replacement. Then that string, with user data baked in, passed directly to the database driver’s non-parameterized execution path.

// The vulnerable pattern - repeated across the codebase
const query = queries.FETCH_DATA.replace('@param', userInput);
await db.execute(query);

The fix was equally uniform:

// The fixed pattern
await db.executeWithParams(queries.FETCH_DATA, { param: userInput });

The DAO layer already had a parameterized execution method. It was sitting right there, unused for this pattern, presumably because the original author didn’t know it existed or the pattern hadn’t been established yet when these service files were written.
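To make the difference concrete, here is a small sketch that builds the query both ways without touching a database. The template text, the table name, and the two helper functions are hypothetical. Only the `@param` placeholder style is taken from the snippets above.

```javascript
// Hypothetical template using the '@param' placeholder style.
const FETCH_DATA = 'SELECT * FROM reports WHERE owner_id = @param';

// Vulnerable shape: user input becomes part of the SQL text itself.
function buildByReplace(template, userInput) {
  return { text: template.replace('@param', userInput), params: {} };
}

// Fixed shape: the SQL text never changes; the value travels separately
// and the driver binds it as data.
function buildWithParams(template, userInput) {
  return { text: template, params: { param: userInput } };
}

const hostileInput = '0 OR 1=1';
const vulnerable = buildByReplace(FETCH_DATA, hostileInput);
const fixed = buildWithParams(FETCH_DATA, hostileInput);

console.log(vulnerable.text); // SELECT * FROM reports WHERE owner_id = 0 OR 1=1
console.log(fixed.text);      // SELECT * FROM reports WHERE owner_id = @param
console.log(fixed.params);    // { param: '0 OR 1=1' }
```

In the first case the attacker has rewritten the WHERE clause. In the second the same string is just an odd-looking value that matches nothing.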

File after file. Same fix. Different parameter names.

This is what happens when a pattern becomes a convention without being reviewed. The first person writes it one way. The next person copies it. Five years later you have dozens of instances of the same mistake, all surfaced in a single audit sweep.
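Because every instance shared one shape, even a crude text scan can list candidates for review. The sketch below is a heuristic I am assuming for illustration, not the tooling the audit used. It keys off the `queries.X.replace('@name', ...)` form shown earlier and would miss any variation of it.

```javascript
// Heuristic: flag lines that splice a value into a query template via String.replace.
const SUSPECT_PATTERN = /queries\.\w+\.replace\(\s*['"]@\w+['"]/;

function findSuspectLines(source) {
  return source
    .split('\n')
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter(({ text }) => SUSPECT_PATTERN.test(text));
}

const sampleSource = [
  "const query = queries.FETCH_DATA.replace('@param', userInput);",
  'await db.execute(query);',
  'await db.executeWithParams(queries.FETCH_DATA, { param: userInput });',
].join('\n');

// Only the first line of the sample is flagged.
console.log(findSuspectLines(sampleSource));
```

A scan like this gives a worklist, not proof. Each hit still needs a human to confirm that the replaced value is attacker-influenced.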


What I Actually Learned

1. Legacy codebases aren’t broken — they’re contextless.

Every strange decision I found had a reason. It just wasn’t written down, or the person who knew it had left. The code isn’t wrong. It’s just been outlived by the threat model it was written against.

2. Understanding the trust model is the whole job.

The hardest part of the auth fix wasn’t writing the validation logic. It was sitting down and drawing out: who mints tokens? who verifies them? what cookies are available where and why? Once that was clear, the code was almost mechanical. Before that clarity, every PR felt risky.

3. Consistent bad patterns are actually a gift.

Dozens of SQL injection findings sounds catastrophic. But because the root cause was identical in every case, fixing them was systematic. One engineer, a clear pattern, and enough time. Inconsistent vulnerabilities — each with a different root cause — would have been far harder.

4. Documentation debt compounds like financial debt.

The hours spent reverse-engineering why the path routing worked the way it did, why certain cookies were set, why a particular environment flag existed — all of that was time not spent fixing things. Every future engineer who touches this system will either pay that cost again or benefit from the ADRs and notes I’m leaving behind.

5. Coming in as an outsider is a superpower.

I had no attachment to the original decisions. I could ask “why does this exist?” without anyone feeling criticized. The leader who brought me in had the same openness. That made hard conversations — “this needs to be deleted, not patched” — much easier than they might have been.


The Tool That Was Actually in the Room

I want to be honest about something that doesn’t get talked about enough in engineering write-ups: I didn’t do this alone, and the co-author wasn’t human.

I used an AI coding assistant — specifically Cursor with Claude — as a hands-on collaborator throughout this entire remediation. Not as an autocomplete tool. Not to generate boilerplate. As an actual working partner that wrote code, reasoned about trust models, drafted architecture decision records, and produced PR-ready diffs.

Here’s what the actual working loop looked like, as honestly as I can describe it:

  1. I’d share the problem — a security finding, a broken behaviour, a constraint I’d discovered in the codebase.
  2. The AI would propose a solution — sometimes code, sometimes a question back at me, sometimes a structured document laying out options.
  3. I’d push back. “This doesn’t work because the cookie isn’t scoped here.” “This will cause an infinite redirect.” “The infrastructure config is wrong — look at where the volume actually mounts.”
  4. The AI would revise. Not defensively. Just: absorb the feedback, update the model, try again.
  5. Repeat until I was confident enough to push.

Looking at the git log across both repos, 13 commits landed over the course of about a week. The path relocation, the session validation gate, the impersonation detection, the JWT entry gate on the second app, the infrastructure fix, the redirect loop fix — all of it went through this loop. The AI wrote most of the initial code. I caught the gaps. Together we got to something I was willing to put in production.

What surprised me wasn’t that the AI could write the code. That part I expected. What surprised me was how useful it was to have something that could hold the full context of the problem — multiple ADRs, a security finding write-up, a Node server file, a routing config — and reason across all of it at once. The kind of reasoning that would normally require a senior engineer who’d been on the codebase for months.

That said: the AI was wrong, regularly. It made assumptions about the environment. It occasionally proposed fixes that would have introduced new problems. It didn’t know things I knew from reading an infrastructure config at 11pm. The loop only worked because I knew enough to catch those gaps.

Reproducing production locally: the nginx proxy trick. Before any browser-driven testing could work, I had to solve a subtler problem: the cookie-scoped authentication flow could not be reproduced faithfully with each app on its own localhost port. Browsers do not isolate cookies by port, but apps on separate ports are separate origins and are not mounted under their production paths, so the path-scoped session cookie, CORS, and CSRF checks all behave differently from production. To reproduce the production cookie flow, CORS behaviour, and CSRF constraints on my laptop, I stood up a local nginx reverse proxy on a custom hostname with path-based routing, unifying the home app, the reports app, and the microservice backend under a single origin that mirrored production's path layout. Only then did the cookie handoff between apps behave the way it would in the real environment.

Testing without writing tests. One thing that genuinely surprised me: I never wrote a single line of Playwright test code. Cursor has an MCP integration that connects the AI agent directly to a live Chrome instance via Chrome DevTools Protocol. Combined with an agent browser tool, I could describe what I wanted to verify — “log in, navigate to the protected page, confirm the JWT gate blocks unauthenticated requests” — and the agent would drive the browser, execute the flow, take screenshots, and report back. End-to-end test coverage for security-critical flows, with zero test-authoring overhead on my end.

Attacking my own fixes. On the offensive side, I used Burp Suite Community Edition to validate the fixes before shipping. Replaying captured requests with tampered payloads, testing the SQL injection patterns against the parameterized endpoints, probing the JWT gate with malformed and expired tokens. If Burp could break it, the fix wasn’t done. If it couldn’t, I had reasonable confidence the patch held. Having both the AI writing the defence and a real attack tool probing it created a tighter feedback loop than code review alone would have.

Which is, I think, the actual lesson: AI assistance in this kind of work isn’t about replacing engineering judgment. It’s about removing the friction between having a judgment and turning it into working, tested, and validated code. That friction — the “I know what needs to happen but now I need to write 80 lines of middleware, a test suite, and then manually click through the app to verify it” friction — is where a lot of security work quietly stalls.

It didn’t stall here.


Where Things Stand

The auth fixes are in production. The SQL injection remediation is in progress. The codebase has more documentation today than it did three months ago. The security findings are being closed.

The original team who built this shipped something that ran reliably for five years and served a large internal user base. That’s genuinely hard to do. What I’m doing now isn’t a criticism of them — it’s just the next chapter.

Legacy systems don’t need heroes. They need patience, curiosity, and someone willing to read a five-year-old routing config until it makes sense.


Written by a developer who spent way too long reading cookie scope documentation and came out the other side with opinions.

From One Big App to Many Small Ones: A Developer’s Guide to Containers

Picture this: you’ve built a successful web application that started small but has grown into something amazing. Users love it, your team has expanded, and everything seems great. But there’s a problem lurking beneath the surface. Every time you want to add a new feature or fix a bug, it takes forever. Deploying updates feels like performing surgery on a patient who’s wide awake. Sound familiar?

If you’re nodding your head, you’re dealing with what developers call a “monolith” – an application where everything is bundled together in one massive codebase. While monoliths work great when you’re starting out, they can become a real headache as your app grows. The good news? There’s a proven way to solve this problem using something called containers.

What’s the Problem with Big Applications?

Think of a monolithic application like a huge department store where everything is connected. The clothing section shares the same checkout system as electronics, the inventory system controls everything from shoes to smartphones, and if you want to renovate the toy section, you might accidentally break the jewelry department.

In software terms, this means:

  • When one part of your app breaks, it can bring down everything else
  • Adding new features requires testing the entire application
  • Scaling becomes expensive because you have to scale everything, even if you only need more power for one feature
  • Different teams end up stepping on each other’s toes

Enter Containers: Your App’s New Best Friend

Containers are like moving each department of that massive store into its own building. Each department (or service) can operate independently, but they can still communicate with each other when needed. If the toy store needs renovation, the electronics store keeps running without interruption.

In technical terms, a container packages your application code along with everything it needs to run – like a lunch box that contains not just your sandwich, but also the plate, napkin, and utensil you need to eat it.

The Step-by-Step Journey: From Chaos to Order

1. Take a Good, Hard Look at What You Have

Before you start tearing apart your application, you need to understand what you’re working with. This is like creating a detailed floor plan of that massive department store before you start moving things around.

Spend time examining your codebase to identify different functional areas. Most applications naturally group into sections like:

  • User accounts and login systems
  • Payment processing
  • Email notifications
  • Data reporting
  • Content management

Draw these relationships out on paper or in a diagramming tool. You’ll be surprised how much this simple exercise reveals about your application’s structure.

2. Put Your Entire App in a Container First

Here’s where most people make a mistake: they immediately try to break everything apart. Don’t do that. Instead, take your entire monolithic application and put it in a container first.

This is like moving your entire department store into a standardized building before you start separating departments. It solves a huge problem called “environment inconsistency” – the dreaded “it works on my computer but not on yours” syndrome.

When your app runs the same way on your laptop, your colleague’s computer, and your production servers, you eliminate countless headaches and mysterious bugs.

3. Pick Your First Target Carefully

Now comes the fun part: choosing which piece to extract first. This decision is crucial and should be strategic, not random.

Look for parts of your application that are:

  • Self-contained (they don’t depend heavily on other parts)
  • Relatively simple
  • Not critical to your core business logic

Great first candidates include:

  • Authentication systems (login/logout functionality)
  • Email notification services
  • File upload handlers
  • Search functionality

Successfully extracting your first service is like winning your first game – it builds confidence and teaches you the process for future extractions.

4. Bring in the Orchestra Conductor

As you create more containers, managing them manually becomes like trying to conduct a symphony orchestra by shouting instructions. You need a proper conductor, and in the container world, that’s Kubernetes.

Kubernetes is a platform that automatically handles:

  • Starting and stopping your containers
  • Distributing traffic between multiple copies of the same service
  • Restarting failed containers
  • Scaling services up or down based on demand

For beginners, consider starting with simpler alternatives like Docker Compose for development, then moving to managed Kubernetes services offered by cloud providers like Google Cloud, AWS, or Microsoft Azure.

5. Automate Everything from Day One

One of the biggest mistakes teams make is leaving deployment as a manual process. This is like insisting that every product in your store be moved by hand instead of using conveyor belts and automated systems.

Set up automated pipelines that:

  • Test your code automatically when you make changes
  • Build container images without human intervention
  • Deploy to testing environments instantly
  • Notify you if anything goes wrong

This automation eliminates human error and makes deployments so routine that they become boring – which is exactly what you want.

6. Untangle the Database Web

Databases are often the trickiest part of breaking up a monolith. In our department store analogy, this is like having a single cash register system that every department has been modifying over the years.

The key principle is simple: each service should own its data. Instead of letting multiple services directly access the same database tables, establish clear boundaries. If Service A needs data from Service B, it should ask politely through an API rather than sneaking into Service B’s database.

You don’t necessarily need separate physical databases immediately, but you must enforce these ownership rules in your code. For shared data like user sessions, move them to dedicated systems like Redis that are designed for sharing.
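The ownership rule can be enforced in code before any database is physically split. The following is a minimal in-process sketch with hypothetical service names and fields. In a real system the call between the two would be an HTTP or RPC request rather than a method call.

```javascript
// Hypothetical services; names and fields are illustrative only.
class UserService {
  // Private store: nothing outside this class can reach the raw records.
  #users = new Map([[1, { id: 1, name: 'Ada', passwordHash: 'not-for-sharing' }]]);

  // Public API: exposes only the fields other services need.
  getProfile(userId) {
    const user = this.#users.get(userId);
    return user ? { id: user.id, name: user.name } : null;
  }
}

class OrderService {
  constructor(userApi) {
    this.userApi = userApi; // an API client, not a database handle
  }

  describeOrder(order) {
    const profile = this.userApi.getProfile(order.userId);
    return `${order.item} for ${profile ? profile.name : 'unknown user'}`;
  }
}

const orderService = new OrderService(new UserService());
console.log(orderService.describeOrder({ item: 'Keyboard', userId: 1 })); // Keyboard for Ada
```

The order code cannot read `passwordHash` even by accident, which is the kind of boundary that later makes a physical database split uneventful.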

7. Make Your System Observable

When you had one big application, finding problems was like debugging issues in a single room. With multiple services, it’s like troubleshooting problems across an entire shopping mall. You need security cameras, intercoms, and monitoring systems everywhere.

Implement comprehensive observability from the start:

  • Logging: Ensure every service writes detailed logs about what it’s doing
  • Monitoring: Track metrics like response times, error rates, and resource usage
  • Health checks: Each service should be able to report whether it’s healthy
  • Distributed tracing: Follow requests as they travel between services

Tools like Prometheus for metrics collection, Grafana for dashboards, and Jaeger for tracing make this much easier than building everything from scratch.
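Of the four items above, the health check is the quickest to add. Here is a minimal sketch using only Node's built-in `http` module. The `/healthz` path and the shape of the `checks` object are assumptions made for illustration.

```javascript
const http = require('node:http');

// Runs each named check and summarises the result.
// A check that throws is treated as failing rather than crashing the endpoint.
function healthReport(checks) {
  const results = Object.fromEntries(
    Object.entries(checks).map(([name, check]) => {
      let ok;
      try {
        ok = Boolean(check());
      } catch {
        ok = false;
      }
      return [name, ok ? 'ok' : 'failing'];
    })
  );
  const healthy = Object.values(results).every((state) => state === 'ok');
  return {
    statusCode: healthy ? 200 : 503,
    body: { status: healthy ? 'healthy' : 'unhealthy', checks: results },
  };
}

// Wraps the report in an HTTP endpoint; the caller decides when to listen.
function createHealthServer(checks) {
  return http.createServer((req, res) => {
    if (req.url !== '/healthz') {
      res.writeHead(404);
      res.end();
      return;
    }
    const { statusCode, body } = healthReport(checks);
    res.writeHead(statusCode, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(body));
  });
}
```

A service would start it with something like `createHealthServer({ database: () => true }).listen(8080)`, and an orchestrator can then probe `/healthz` to decide whether to route traffic to the container or restart it.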

8. Rinse and Repeat

With your first service successfully extracted and your tooling in place, you’re ready to continue the process. Each subsequent extraction becomes easier because:

  • You’ve learned the patterns and potential pitfalls
  • Your automation pipelines are already set up
  • Your team has gained confidence and experience
  • You have monitoring and observability systems in place

Gradually, your monolith shrinks while your collection of focused, independent services grows.

The Light at the End of the Tunnel

This transformation isn’t just about technology – it changes how your entire team works. Instead of everyone working on one massive codebase and stepping on each other’s toes, different teams can own different services. The frontend team can deploy their changes without waiting for the backend team to finish theirs. The payments team can scale their service during Black Friday without affecting the recommendation engine.

Common Pitfalls to Avoid

Don’t try to do everything at once: The temptation to rewrite everything from scratch is strong, but it’s usually a mistake. Incremental change is safer and more sustainable.

Don’t ignore the human element: This transformation affects your entire team’s workflow. Invest in training and make sure everyone understands the new processes.

Don’t forget about data: Plan your database separation strategy early. It’s often the most complex part of the entire process.

Don’t skip monitoring: In a distributed system, observability isn’t optional – it’s essential for maintaining sanity.

Is This Journey Worth It?

Absolutely, but only if you’re facing the problems that containers solve. If your current setup works fine and you’re not experiencing scaling or development velocity issues, there’s no rush to change.

However, if you’re struggling with slow deployments, difficulty scaling, or teams blocking each other’s progress, containers and microservices can be transformative. You’ll gain:

  • Faster development cycles
  • Independent scaling of different components
  • Better fault isolation (one broken service doesn’t kill everything)
  • Technology flexibility (different services can use different programming languages or databases)
  • Easier team organization and ownership

The journey from monolith to containers isn’t always smooth, but with careful planning and incremental execution, it’s entirely achievable. Every successful transformation starts with a single step, and every monolith has the potential to evolve into something better.

Your users will notice faster updates and more reliable service. Your developers will thank you for making their work more enjoyable and productive. And you’ll sleep better knowing that a problem in one part of your system won’t bring down the entire application.

The path forward is clear – it’s time to start containerizing.

Localhost Tunnels

I have started using ngrok to set up localhost tunnels directly from my laptop. The basic idea is to start a web server on localhost and use ngrok to open a tunnel to it from the internet. This is a very easy way to test local code and have teammates validate it. I will be looking out for an enterprise offering of this wonderful tool.

A very simple use case of ngrok is demonstrated in this video, which is a tutorial for webhooks.

Nginx gzip compression and load balancing

To tune the performance of my REST endpoints, I have in the past enabled gzip compression in my nginx server configuration. A large JSON response is then compressed before it is sent, so fewer bytes cross the network and transfer time goes down.
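To get a feel for how much repetitive JSON shrinks, the effect can be reproduced outside nginx with Node's built-in `zlib`. The payload below is made up for illustration, and the exact ratio will vary with real data and with the compression level nginx is configured to use.

```javascript
const zlib = require('node:zlib');

// Hypothetical API response: many rows with repeated keys and values.
const rows = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  status: 'active',
  region: 'eu-west',
}));

const rawJson = Buffer.from(JSON.stringify(rows));
const gzipped = zlib.gzipSync(rawJson);

console.log(`raw: ${rawJson.length} bytes, gzipped: ${gzipped.length} bytes`);

// Decompressing returns the original bytes, so the client sees the same JSON.
const restored = zlib.gunzipSync(gzipped).toString('utf8');
console.log(restored === rawJson.toString('utf8')); // true
```

JSON with repeated keys compresses well, which is why list-style REST responses tend to benefit the most.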

There is good documentation of this feature on the nginx website.

However, there is a catch that prevents this technique from working on a local development system, even though the same config works on a production Linux instance. I finally found an answer as to why this doesn’t work in some of my local environments.

For static load balancing I use nginx’s upstream concept, which is also documented on the nginx website. The performance is reasonable, and the implementation is quite simple when all that is needed is basic failover. For more advanced setups we can always turn to HAProxy, a very good open source load balancer.

Bye Bye Hystrix

Hystrix used to be my tool of choice for implementing circuit breakers in my Spring Boot applications. However, there have been literally no commits in the Hystrix GitHub repo for the past year. It seems Netflix has put Hystrix into maintenance mode and now points to Resilience4j, which is actively maintained, for new work.

The Resilience4j commit log is quite active, with a lot of ongoing work.

However, it should be noted that there is another mature library named Sentinel, which seems to be quite feature-rich and very well supported. It has been battle-tested by Alibaba, which operates at huge scale. In my next project I will consider both Sentinel and Resilience4j when choosing a reliable circuit breaker for my application.
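For readers who have not used one of these libraries, the pattern they implement can be sketched in a few lines. This is a generic, synchronous illustration written in JavaScript to match the other examples on this page. It is not the API of Hystrix, Resilience4j, or Sentinel, and the class name, option names, and thresholds are all made up.

```javascript
// Generic circuit-breaker sketch: closed -> open after repeated failures,
// then a single trial call is allowed once the cool-down has passed.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetAfterMs = 10000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;        // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null;  // null means the circuit is closed
  }

  call(fn) {
    if (this.openedAt !== null && this.now() - this.openedAt < this.resetAfterMs) {
      // Fail fast instead of hammering a dependency that is already struggling.
      throw new Error('circuit open');
    }
    try {
      const result = fn(); // closed, or a half-open trial call
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) this.openedAt = this.now();
      throw err;
    }
  }
}
```

The real libraries add what this sketch leaves out, such as sliding failure windows, async support, fallbacks, and metrics, which is most of the reason to adopt one rather than hand-roll it.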