Typical Misconceptions About CSV (That Bite You Later)

Hey there – and welcome to the very first issue of The Missing Header!

The Missing Header was formerly known as the Tablecruncher Newsletter; I’ve rebranded it to avoid confusion with the software project. Same author, same scope: still all about solving messy data problems.

I’m really glad you’re here. We’ll dive into the quirks, pitfalls, and head-scratchers of working with everyday data – starting today with the humble, messy CSV.

CSV files are everywhere. They're the duct tape of data exchange—lightweight, readable, and seemingly simple.

But "simple" is not the same as "well-defined."

Over the years, Tablecruncher users have told me again and again: working with CSV files often feels like debugging someone else's assumptions. And most of those assumptions come from five common misconceptions.

Let’s bust them.

Myth 1: CSV is a well-defined standard

You might think there’s a proper standard for CSV files. And there is an RFC (number 4180, for the curious), but even that document admits: “There is no formal specification in existence.”

RFC 4180 is an informational document: it describes the format that most implementations seem to follow, not how a CSV must be written. That means you can’t assume a CSV file you receive follows any universal rulebook. Different tools, different creators, different results.

🧠 Takeaway: Always ask yourself: Who created this CSV, and what rules were they following (if any)? When in doubt, peek inside before loading.
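In Python, one minimal way to peek is to read the first few lines without parsing them and print each with `repr()`, so delimiters, quotes and line endings stay visible. This is a sketch, not a Tablecruncher feature; the helper name `peek` and the UTF-8 default are assumptions.

```python
from itertools import islice


def peek(path, n=5, encoding="utf-8"):
    """Return the first n raw lines of a file without parsing them as CSV."""
    # newline="" returns line endings untranslated, so \r\n vs \n stays visible.
    # errors="replace" shows undecodable bytes as U+FFFD instead of raising,
    # which is handy when you don't know the encoding yet.
    with open(path, encoding=encoding, errors="replace", newline="") as f:
        return list(islice(f, n))
```

Usage: `for line in peek("export.csv"): print(repr(line))`, where `export.csv` stands in for whatever file you just received.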

Myth 2: CSV files always have a header row

Headers? Sometimes. But definitely not always.

While it’s good practice to include a header row, especially for files shared between humans, not all CSVs follow this convention. Worse, some files stack several “header-like” rows on top of each other, while others have no header at all.

🧠 Takeaway: Never assume the first row is a header. And if you’re automating imports, always give users a choice to skip or define headers explicitly.
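Python’s standard `csv` module ships a heuristic for this guess, `csv.Sniffer().has_header()`. The sketch below (the helper `read_rows` is hypothetical, and it assumes comma-separated text already in memory) only falls back to that guess when the caller hasn’t decided, so an explicit user choice always wins.

```python
import csv
import io


def read_rows(text, has_header=None):
    """Parse comma-separated text and return (header, rows).

    has_header=None        -> guess with csv.Sniffer (a heuristic, not a guarantee)
    has_header=True/False  -> the caller decides explicitly
    """
    if has_header is None:
        # Sniffer compares the first row against the column types of later rows.
        has_header = csv.Sniffer().has_header(text[:4096])
    rows = list(csv.reader(io.StringIO(text)))
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows
```

Note that the heuristic can be fooled, for example by a file whose columns are all text; that is exactly why the explicit override matters.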

Myth 3: All CSV files use commas as separators

The “C” in CSV does stand for comma, but… not so fast.

In many parts of the world (think Germany, France, Spain), the comma is used as a decimal separator. In those contexts, commas as column delimiters clash with every decimal number, such as 3,49, unless each such value is quoted. That’s why semicolons, tabs, or even pipes (|) often sneak in as field separators.
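To see the clash, here is a minimal sketch using Python’s standard `csv` module: with a comma delimiter, the writer has to wrap a price written with a decimal comma in quotes. A quote-aware reader gets the value back intact; a naive split on commas does not.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)             # default "excel" dialect: comma delimiter
writer.writerow(["coffee", "3,49"])  # price written with a decimal comma
line = buf.getvalue()
print(repr(line))                    # 'coffee,"3,49"\r\n'

# A quote-aware parser recovers both fields ...
print(next(csv.reader(io.StringIO(line))))  # ['coffee', '3,49']
# ... a naive split on commas breaks the price in two.
print(line.strip().split(","))              # ['coffee', '"3', '49"']
```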

🧠 Takeaway: Just because it’s called a CSV doesn’t mean it uses commas. Always detect or confirm the delimiter before parsing.
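A minimal detection sketch in Python, using the standard library’s `csv.Sniffer().sniff()`. The helper `detect_delimiter` and the candidate list are assumptions. Restricting the candidates matters: without it, the sniffer may pick an ordinary letter that happens to appear equally often on every line.

```python
import csv
import io


def detect_delimiter(sample, candidates=",;\t|"):
    """Guess the field separator from a text sample (heuristic)."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer gives up on ambiguous samples; let the caller decide.
        return None


sample = "product;price\ncoffee;3,49\ntea;2,99\n"
delim = detect_delimiter(sample)  # ";" (the decimal commas are not consistent per line)
rows = list(csv.reader(io.StringIO(sample), delimiter=delim))
```

If `detect_delimiter` returns `None`, ask the user instead of silently defaulting to a comma.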

Myth 4: Columns have well-defined types

If you’re used to Excel, it’s tempting to think of each cell as having a type—number, date, currency, whatever. But CSV doesn’t work that way.

CSV files contain no metadata at all. A number like 2024 could be a year, a product ID, or just text. It’s all just strings—what you see is what you get.

🧠 Takeaway: Type awareness is your job, not the file’s. Be cautious when importing CSVs into databases or tools that auto-guess types—they often guess wrong.
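A minimal sketch of taking on that job explicitly in Python. The column names, the `SCHEMA` mapping and both helpers are hypothetical; `naive_guess` stands in for an over-eager importer. Converting the price with `replace(",", ".")` assumes the values contain no thousands separators.

```python
import csv
import io
from decimal import Decimal

# Hypothetical schema: the reader, not the file, decides each column's type.
SCHEMA = {
    "product_id": str,                                # keep leading zeros
    "year": int,
    "price": lambda s: Decimal(s.replace(",", ".")),  # decimal comma -> Decimal
}


def naive_guess(value):
    """What an auto-guessing importer might do: int if it parses, else str."""
    try:
        return int(value)
    except ValueError:
        return value


def typed_rows(text, schema, delimiter=","):
    """Apply the schema's converter to every field (every column must be in it)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{col: schema[col](val) for col, val in row.items()} for row in reader]


data = "product_id;year;price\n007;2024;3,49\n"
naive = [naive_guess(v) for v in ("007", "2024")]  # product ID 007 silently becomes 7
typed = typed_rows(data, SCHEMA, delimiter=";")
```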

Myth 5: A CSV can contain types other than strings

Let’s underline that last point: a CSV has only strings.

Yes, it might look like a number or a date. But under the hood, it's just characters in a row. "123" is not the number 123—it’s the characters 1, 2, and 3. Want to sort or filter? You need to tell your tool how to interpret the data.

🧠 Takeaway: Treat every value in a CSV as a string and tell your toolset how to interpret these strings. Otherwise, things like sorting 10 before 2 will ruin your day.
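The “10 before 2” problem takes two lines to reproduce in Python: values read from a CSV are strings, so `sorted()` compares them character by character until you pass a `key` that interprets them as numbers.

```python
values = ["2", "10", "1"]  # as read from a CSV: strings, not numbers

print(sorted(values))           # ['1', '10', '2']  (character-by-character)
print(sorted(values, key=int))  # ['1', '2', '10']  (interpreted as integers)
```

Using `key=int` keeps the original strings in the result and only changes how they are compared.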

🧮 The Missing Number

4180 – The number of the RFC that tries to define what a CSV is… and kind of gives up.

One more thing…

If you've ever had a "Wait, what?!" moment while opening a CSV file, you're not alone. The Missing Header is here to explore these everyday data messes—and how to handle them.

What’s your favorite data failure you’ve come across?
I’d love to hear your story—just hit reply.

Thanks for reading,
Stefan