How to Confuse Everyone with Just One Word

This is issue #004 of The Missing Header, formerly known as the Tablecruncher Newsletter*

Recently, I got a CSV file with column names like state, amount, and the ever-popular date. Even after looking at the content for a while, their meaning wasn’t obvious.
What kind of “date” are we talking about? The creation date of a record? A last modified timestamp? Someone’s birthday? Without deep domain knowledge, it’s impossible to tell.

The same goes for state: Is it the status of a process? A U.S. state? The condition of a machine? And what does amount refer to: money, weight, or CO₂ emissions?

Here’s a simple rule: Use column names that explain themselves.
CSV files don’t support comments. So your best shot at communicating meaning is to use descriptive column names. Instead of date, go with something like last_modified_date, or even better, last_modified_date_of_contract. And instead of state, try status_of_quality_check or review_state.
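
If you inherit a file with vague headers, renaming them can be a one-off script. Here is a minimal sketch in Python using only the standard library. The RENAMES mapping and the rename_header helper are my own illustrative names, and amount_eur is a made-up example of putting the unit into the name. Adjust the mapping to whatever the columns in your file actually mean.

```python
import csv
import io

# Hypothetical mapping from vague names to self-explanatory ones.
# "amount_eur" is an invented example; use the unit your data really has.
RENAMES = {
    "date": "last_modified_date_of_contract",
    "state": "status_of_quality_check",
    "amount": "amount_eur",
}

def rename_header(csv_text, renames):
    """Return csv_text with the header row renamed; data rows are untouched."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text  # empty input: nothing to rename
    # Names missing from the mapping are kept as they are.
    rows[0] = [renames.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

renamed = rename_header("id,date,state\n1,2024-01-05,open\n", RENAMES)
```

Only the first row changes, so the data itself stays exactly as it was.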

While we’re at it:
Please don’t use duplicate column names.
Yes, CSV allows it. But many tools — from databases to import scripts — don’t handle duplicates well. You’ll get unexpected results or hard-to-track bugs. Worse, someone else might silently lose data when importing your file.
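
To see how quietly this can go wrong, here is a small sketch using Python's standard csv module. The sample data and the duplicate_headers helper are made up for illustration. csv.DictReader builds one dictionary per row, so when two columns share a name, only the last value survives and no warning is raised.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: two columns share the name "date".
sample = "id,date,date\n1,2024-01-05,2024-02-10\n"

def duplicate_headers(csv_text):
    """Return the header names that appear more than once, in first-seen order."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    counts = Counter(header)
    return [name for name in counts if counts[name] > 1]

# DictReader keeps only the last value for a repeated column name,
# so the first "date" value (2024-01-05) is dropped without any error.
row = next(csv.DictReader(io.StringIO(sample)))

dupes = duplicate_headers(sample)
```

Running a check like duplicate_headers before handing a file to someone else takes seconds and can save them a long debugging session.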

Sure, you could write a separate README explaining your columns. But those sidecar files tend to disappear over time. And then, years later, someone opens the file and finds a lone date column staring back at them, devoid of meaning.

This advice doesn’t just apply to CSV files. The same principles hold for Excel sheets or JSON files. Good naming is your best — and often your only — form of documentation.

🧮 The Missing Number

232 million: the largest number of lines in a CSV file opened with Tablecruncher (as far as I know)

Thanks for reading,
Stefan

* I’ve rebranded the newsletter to avoid confusion with my CSV app. It’s now The Missing Header. Same author, same focus — all about the weird and wonderful world of tabular data.