The Risk of Cleaning Up Someone Else’s Mess

Sometimes faulty data has become valid data — and fixing it makes things worse.
A data engineer told me a story the other day.
He was doing some routine cleanup on a core database: migrations, column checks, the usual. Nothing fancy.
While working, he spotted something: one column in one table contained text with broken encodings. You’ve seen this before in issue #006 of this newsletter:
“We meet at the cafÃ©.”
Annoying. But not hard to fix once you know what caused it. He traced the issue back to a CSV parser with the wrong encoding setting. Easy enough. His fix worked, and the text came through cleanly. Downstream, the code no longer had to convert the junk every time.
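To make the cause concrete, here is a minimal sketch of how a wrong encoding setting produces this kind of junk. The sample row, the column name `note`, and the helper `read_notes` are assumptions for illustration, not the engineer's actual parser or data.

```python
import csv
import io

# Hypothetical file content: a tiny CSV stored as UTF-8 bytes.
raw = "id,note\n1,We meet at the café.\n".encode("utf-8")


def read_notes(data: bytes, encoding: str) -> list[str]:
    """Decode CSV bytes with the given encoding and return the 'note' column."""
    text = io.StringIO(data.decode(encoding), newline="")
    return [row["note"] for row in csv.DictReader(text)]


# Wrong setting: each byte of the two-byte UTF-8 "é" becomes its own character.
broken = read_notes(raw, "latin-1")

# Correct setting: the two bytes are read as one character again.
clean = read_notes(raw, "utf-8")

print(broken)
print(clean)
```

The bytes in the file are the same in both calls. Only the reader's assumption about them changes, which is why a one-line setting was enough to fix the column.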
Job done. Everyone's happy, right?
Not quite.
Because his code wasn’t the only thing downstream. Other teams, other scripts, other dashboards were quietly depending on that very same broken data. Once the column was “fixed,” some of those flows broke.
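How can clean data break a consumer? One plausible way, sketched here as an assumption rather than a report of what those teams actually ran, is a downstream script that quietly repairs the broken text itself. The function `repair_mojibake` is hypothetical.

```python
def repair_mojibake(value: str) -> str:
    """Hypothetical downstream workaround: undo UTF-8 text that was read as Latin-1."""
    return value.encode("latin-1").decode("utf-8")


# On the old, broken column the workaround does its job.
print(repair_mojibake("We meet at the cafÃ©."))

# On the fixed column the same workaround fails, because the clean text
# does not form valid UTF-8 once it is re-encoded as Latin-1.
try:
    repair_mojibake("We meet at the café.")
    outcome = "ok"
except UnicodeDecodeError:
    outcome = "failed"

print(outcome)
```

The workaround was correct for as long as the data was wrong. Once the source was fixed, the workaround became the bug.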
The lesson was sharp:
You don’t just change a data source because you can. You change it only if you’re absolutely sure you know every place that data touches. And that’s almost never the case.
Why this matters beyond databases
This isn’t limited to database systems. I've seen many companies rely on complex and often fragile workflows. A CSV export is opened in Excel, tweaked by a macro, emailed around, and after some more changes, uploaded to a dashboard. And someone in management relies on this dashboard for crucial business decisions.
So when you “clean it up,” you may not be fixing the system. You may be breaking the only version of it that works.
Data cleaning isn’t an end in itself; it only matters in context.
Three takeaways
- Start clean. The cheapest time to get encodings, formats, and structures right is at the point of entry. Garbage in, garbage everywhere.
- Think twice before cleaning. If the messy state has been stable for a long time, odds are someone relies on it. Understand the flow before you touch it.
- Document the mess. If you leave it in place, don’t just shrug. Write down why it’s wrong and why it stays that way. That saves the next well-meaning fixer (or your future self) from repeating the same mistake.
Or, in the engineer’s own words: Don’t fix it if it isn’t your job to fix it.
Messy data is tempting to “tidy up.” But in real systems, correctness isn’t absolute. Sometimes the wrong state has become the working state. The smarter move is knowing when to fix it, when to leave it — and making that choice visible.
Thanks for reading,
Stefan
🧮 The Missing Number
0 – A leading zero in account numbers broke Deutsche Bank’s new online banking system this summer. Most likely, somewhere along the data flow, the number was treated as an integer — and the zero quietly disappeared.
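
As a rough illustration of that guess, here is a minimal sketch of a leading zero getting lost. The account number is made up, and nothing here reflects the bank's actual systems.

```python
account_number = "0123456789"  # hypothetical account number with a leading zero

as_integer = int(account_number)  # some step in the flow parses it as a number
round_tripped = str(as_integer)   # a later step turns it back into text

print(round_tripped)

# If the width is known to be fixed, padding can restore the zero.
# Keeping identifiers as text from the start avoids the problem entirely.
restored = round_tripped.zfill(len(account_number))

print(restored)
```

Padding only works when every identifier has the same known length. Treating account numbers as text, never as numbers, is the safer default.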