The Otter Problem: When 350,000 Becomes Just 350

This is issue #005 of The Missing Header

About a year ago, I asked the AI search engine Perplexity a simple question:
“How many otters live in Europe?”

The answer?
350.

Now, the European otter is endangered in some regions — but not that endangered. There are far more than just 350 roaming rivers and wetlands across the continent.

So what happened?

A glance at the cited source revealed the problem immediately:
I had asked in German, so Perplexity pulled a German-language page that said:

„mehr als 350.000 Exemplare“
(“more than 350,000 specimens”)

Oops.

You see, in most of Europe, a period is used as a thousands separator, and a comma is used as a decimal separator. Perplexity stopped reading at the period and gave me 350 otters instead of the actual number: 350,000.
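To make the failure concrete, here is a minimal Python sketch. How Perplexity actually parses numbers is not known; the function `parse_naively` is a hypothetical stand-in that merely reproduces the symptom, and `parse_german_number` is an illustrative helper that assumes the input really is written German-style.

```python
def parse_german_number(text: str) -> float:
    """Parse a number written German-style: '.' groups thousands, ',' marks decimals."""
    # Drop the thousands separators first, then turn the decimal comma into a point.
    normalized = text.replace(".", "").replace(",", ".")
    return float(normalized)


def parse_naively(text: str) -> float:
    """Hypothetical illustration of the failure: stop reading at the first period."""
    return float(text.split(".")[0])


print(parse_german_number("350.000"))  # 350000.0
print(parse_naively("350.000"))        # 350.0
print(float("350.000"))                # 350.0, because float() expects a decimal point
```

Note that Python's built-in `float()` quietly produces the wrong otter count as well: it raises no error, because "350.000" is a perfectly valid number in the English convention.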

, . ; — Why We Can't Have Nice CSVs

This may sound like a harmless oddity, but these regional differences wreak havoc in the world of data — especially, but not only, when it comes to CSV files.

How would you read the number 1,234?

In English, that's one thousand two hundred thirty-four.
In German or French, that's one point two three four: a decimal number just above one.

Now imagine a CSV file containing European-style measurements, like this:

Object;Length (m)
Otter;1,234

In this case, 1,234 means one meter and 234 millimeters — not over a kilometer long!
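A sketch of how the sample above could be read with Python's standard `csv` module, assuming you already know the file uses semicolons between fields and a decimal comma. The variable names are made up for illustration.

```python
import csv
import io

# The sample from above, held in memory instead of a file.
data = "Object;Length (m)\nOtter;1,234\n"

# Tell the reader that fields are separated by semicolons, not commas.
reader = csv.DictReader(io.StringIO(data), delimiter=";")

rows = []
for row in reader:
    # Convert the decimal comma to a decimal point before calling float().
    length_m = float(row["Length (m)"].replace(",", "."))
    rows.append((row["Object"], length_m))

print(rows)  # [('Otter', 1.234)]
```

The conversion step is the part that depends on knowing the locale: the same code applied to an English-style file would turn 1,234 into 1.234 and shrink the value by a factor of a thousand.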

To avoid confusion, many European CSVs use a semicolon (;) or tab as the separator, because the comma is already used by the number format. If commas were used both as the field separator and as the decimal mark, you'd have to wrap all affected fields in quotes:

Object,Length (m)
Otter,"1,234"

This quickly becomes messy and hard to scan — especially when dealing with many columns and mixed formats.
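For what it's worth, CSV libraries usually add those quotes for you. A small sketch with Python's standard `csv` module, whose default writer uses a comma as the delimiter and quotes only the fields that need it:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)  # defaults: comma delimiter, minimal quoting
writer.writerow(["Object", "Length (m)"])
writer.writerow(["Otter", "1,234"])  # the value is a string containing a decimal comma

print(buffer.getvalue())
# Object,Length (m)
# Otter,"1,234"

# Reading it back yields the original field, comma included.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows[1])  # ['Otter', '1,234']
```

The quoting keeps the file parseable, but it does nothing about the ambiguity: whoever reads "1,234" still has to know which convention the writer had in mind.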

Not Just Europe

I said most of Europe uses the comma as its decimal separator, but even that’s too simple:

  • The UK uses periods as the decimal separator.
  • The German-speaking part of Switzerland uses periods, while the French-speaking part uses commas.
  • In Canada, it’s mixed, depending on region and language.

Don’t Let Numbers Perplex You

  1. Always inspect data carefully when it comes from another locale. Try to see how numbers are formatted before trusting them.
  2. Don’t use thousands separators in CSVs — they’re for human readability, not machine parsing.
  3. And most of all: Be aware. These formatting differences are subtle, but they matter.
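As one way to put the first tip into practice, here is a rough Python heuristic that flags values which cannot be interpreted without knowing the locale. The pattern and the name `is_ambiguous` are illustrative assumptions, not an established rule: a value counts as ambiguous if it has one to three digits (not starting with zero), a single period or comma, and then exactly three digits.

```python
import re

# One to three leading digits, a single '.' or ',', then exactly three digits.
# Such a value could be a grouped integer (1,234 = 1234) or a decimal (1,234 = 1.234).
AMBIGUOUS = re.compile(r"[1-9]\d{0,2}[.,]\d{3}")


def is_ambiguous(text: str) -> bool:
    """Return True if the value cannot be read safely without knowing its locale."""
    return AMBIGUOUS.fullmatch(text.strip()) is not None


for sample in ["1,234", "350.000", "1,5", "1234.56", "1.234,56"]:
    print(sample, is_ambiguous(sample))
# 1,234 True
# 350.000 True
# 1,5 False
# 1234.56 False
# 1.234,56 False
```

A column full of flagged values is a good reason to look at the source's locale before trusting the numbers. Values such as "1,5" or "1.234,56" give the convention away on their own.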

If Perplexity had been more aware of how numbers are written in different regions, it wouldn’t have stopped at 350. It would have given me the full picture — all 350,000 of those delightful European otters.

🧮 The Missing Number

Yes, this issue was unusually… numeric. I’ll try to calm down. ;-)

Thanks for reading,
Stefan

PS: You may be receiving this newsletter because you subscribed to the Tablecruncher Newsletter some time ago. I’ve rebranded it to avoid confusion with the software project. Same author, same scope: still all about solving messy data problems.
