Broke in Production

 

The Day I Couldn’t Answer the Simplest Production Question

There’s a question every developer asks instinctively when something breaks:

“Is this also broken in production?”

It’s not a sophisticated question. It doesn’t require deep observability, distributed tracing, or a fancy dashboard. It’s the most basic sanity check in software development.

And yet, in some environments, answering it is surprisingly hard.


The Setup

You’re working in a dev environment. You make a change—or maybe you just stumble into a bug—and something clearly isn’t working.

Your next step should be obvious:

“Let me check production and see if this already exists.”

In a healthy engineering environment, this is trivial:

  • You log into production (or a read-only version of it), or
  • You hit the same endpoint in prod, or
  • You use a staging environment that mirrors production and runs the exact same version

Within minutes, you know:

  • ✅ “Yes, this is already broken in prod”
  • ✅ “No, this is something I just introduced”

That answer drives everything that follows.
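
As a rough sketch of the "hit the same endpoint in prod" check, the snippet below probes one path against two environments and turns the pair of results into one of the answers above. The base URLs, the path, and the rule that any status below 400 counts as "working" are assumptions for illustration, not details of any real setup.

```python
from urllib.request import urlopen


def endpoint_works(base_url: str, path: str, timeout: float = 5.0) -> bool:
    """Return True if GET base_url + path answers with a status below 400."""
    try:
        with urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
            return resp.status < 400
    except OSError:
        # urllib raises HTTPError for 4xx/5xx and URLError for connection
        # failures; both (and socket timeouts) are OSError subclasses.
        return False


def classify(dev_works: bool, prod_works: bool) -> str:
    """Turn the two observations into an answer to the question."""
    if not dev_works and not prod_works:
        return "already broken in prod"
    if not dev_works and prod_works:
        return "introduced in dev"
    if dev_works and not prod_works:
        return "broken only in prod"
    return "working in both"


# Hypothetical usage; these hosts and this path are placeholders:
# dev = endpoint_works("https://dev.example.internal", "/api/orders")
# prod = endpoint_works("https://prod.example.internal", "/api/orders")
# print(classify(dev, prod))
```

A status code is a crude signal. A real check would likely compare response bodies or specific fields, but even this much is only possible when the production endpoint is reachable at all.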


The Reality

But instead of that simple flow, you hit something very different:

  • You don’t have access to production
  • There is no clearly defined production-equivalent environment
  • Environments are owned by different teams
  • Each environment may be running a different version
  • The configuration between environments is unknown or inconsistent

So now your simple question becomes an organizational problem instead of a technical one.

You start asking:

  • “Which environment is closest to production?”
  • “Who owns that environment?”
  • “What version is actually running there?”
  • “Can I get access?”
  • “Can I deploy my code there?”
  • “Will it even behave the same way?”

At some point, you may even consider rebuilding your code and pushing it into an environment you can access—just to approximate production behavior.

And then comes the worst part:

You don’t actually trust the answer you get.


When Environments Become Fiction

In theory, environments exist to increase confidence:

  • dev → test → staging → prod

In practice, when they diverge in code, configuration, ownership, and access, they stop being useful.

They become fictional representations of reality.

You’re no longer verifying behavior—you’re guessing.

  • “This should be like production…”
  • “This is probably close enough…”
  • “I think this is the same version…”

But “probably” is not good enough when debugging.
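
One way to replace "I think this is the same version" with a fact is to have every environment report a build identifier and compare it against production's. The sketch below assumes such identifiers (for example, commit hashes) have already been collected into a dictionary. How they are collected, and the environment names and hash values shown, are hypothetical.

```python
def environments_matching_prod(builds: dict[str, str], prod: str = "prod") -> list[str]:
    """Return the non-prod environments whose build ID equals production's."""
    prod_build = builds[prod]
    return sorted(
        env for env, build in builds.items()
        if env != prod and build == prod_build
    )


if __name__ == "__main__":
    # Made-up build IDs, purely for illustration.
    builds = {
        "dev": "a1b2c3d",
        "test": "9f8e7d6",
        "staging": "9f8e7d6",
        "prod": "9f8e7d6",
    }
    print(environments_matching_prod(builds))  # ['staging', 'test']
```

Matching build IDs only settle the code question. Configuration can still differ between environments that run the same build.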


The Missing Capability

What’s missing here isn’t some advanced tooling. It’s a basic capability:

The ability for a developer to independently run or access the exact version of the application that is currently in production.

That’s it.

No coordination. No deployment gymnastics. No approval loops.

Just:

“Give me the thing that’s in production, and let me see what it does.”


Why This Matters More Than It Seems

At first glance, this might feel like a minor inconvenience. It’s not.

When developers cannot quickly answer, “Is this broken in prod?”:

1. Debugging Slows Down Dramatically

Every issue requires:

  • coordination
  • validation across environments
  • second-guessing results

Minutes turn into hours.


2. Confidence Drops

If environments don’t match production:

  • Fixes feel risky
  • Testing feels unreliable
  • Developers lose trust in the system

3. Ownership Gets Blurred

When access is restricted and environments are siloed:

  • Developers depend on other teams for basic checks
  • Responsibility becomes fragmented
  • Progress stalls behind organizational boundaries

4. Incidents Become Harder to Reason About

During actual outages, the inability to quickly reproduce production behavior becomes a serious liability.


How This Happens

This situation usually isn’t intentional. It emerges over time:

  • Environments are created by different teams
  • Configuration differences accumulate
  • Access gets locked down “for safety”
  • Deployment pipelines diverge
  • Ownership boundaries harden

Eventually:

No single place reliably represents production, and no single person can easily access it.


The Simple Fix (Not Easy, But Simple)

You don’t need to rebuild your platform to address this issue.

You just need to restore a core invariant:

There must be a place where developers can access or run the exact code that is in production.

That can take a few forms:

  • Read-only access to the production system
  • A staging environment that is automatically deployed from the same artifact as production
  • The ability to spin up the production version on demand

The implementation may vary. The requirement does not.
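
To make the third option concrete, here is a minimal sketch that assumes the application ships as a container image and that the digest currently deployed to production is recorded somewhere a developer can read it. Neither assumption comes from a specific setup, and the image name, digest, and port below are placeholders. The function only builds the `docker run` command; it does not execute it.

```python
def run_prod_locally_command(image: str, digest: str, port: int = 8080) -> list[str]:
    """Build a `docker run` command that starts the exact image deployed to prod.

    Pinning by digest (image@sha256:...) rather than by tag means the bytes
    that run locally are the bytes that were deployed, even if the tag has
    since moved.
    """
    if not digest.startswith("sha256:"):
        raise ValueError("expected a digest of the form 'sha256:<hex>'")
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:{port}",   # expose the app on the same local port
        f"{image}@{digest}",
    ]


if __name__ == "__main__":
    # Placeholder values; a real digest would come from the deploy record.
    cmd = run_prod_locally_command(
        "registry.example.com/shop/api",
        "sha256:" + "0" * 64,
    )
    print(" ".join(cmd))
```

This reproduces the production code, not the production configuration or data, so it answers "is this version broken?" rather than every production question.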


The Bottom Line

If a developer cannot answer:

“Is this broken in production?”

quickly, independently, and with confidence, then the development environment is missing something fundamental.

This isn’t about maturity, scale, or tooling sophistication.

It’s about preserving a basic feedback loop:

  • observe behavior
  • compare environments
  • make informed decisions

When that loop breaks, everything downstream becomes slower, riskier, and more frustrating.

And all because a simple question became unnecessarily hard to answer.
