Your Database is Blind to Short Words

Your Database is Blind to Short Words

MySQL’s hidden cutoff decides what’s visible — and what isn’t.

The other day I tried to find a souvenir online. I typed I love NY into the shop’s search field — and got a vast load of results for all kind of products with "love" printed on them, but none about New York…

Turns out, it wasn’t the city that was missing. It was the index.

The Hidden Cutoff

Most small or mid-sized websites use their database’s built-in full-text search instead of a real search engine like Elasticsearch.
It’s convenient — you can just add a FULLTEXT index to your MySQL table and you’re done.

But MySQL quietly decides what counts as a “word.”
On MyISAM tables, words shorter than four characters are ignored (ft_min_word_len = 4).
On InnoDB, the limit is three characters (innodb_ft_min_token_size = 3).

That means “mice” might be searchable, but “cat” might not — depending on which engine your table uses.
Short words simply never make it into the index, so when someone searches for them, the database has nothing to match against.
No warning. Just “no results.”

The good news: these limits can be tuned. You can lower the minimum token length to 2 or even 1 character if your content needs it.
The bad news: few people do.

Why? Because the people implementing the search often don’t control the database.
Developers test on their local machine, where they can tweak settings freely.
But production runs on a managed hosting service — a different MySQL server, with untouched defaults.
When the site goes live, searches suddenly break for short terms like NYUK, or AI, and no one knows why.

And fixing it isn’t trivial.
Changing the minimum token size means reconfiguring the database server — and restarting it.
Then every full-text index must be rebuilt.

MySQL’s documentation puts it casually:

“Then restart the server and rebuild your FULLTEXT indexes.”

In practice, that’s a nightmare.
The developers can’t restart the database; the admins don’t want downtime; and management wonders why anyone would risk production for two-letter words.

So the defaults stay.
And the search keeps pretending that “NY” doesn’t exist.

Beyond Websites

I once worked on a project where we had to match a table with millions of rows against itself based on text similarity.
A traditional SQL join would’ve meant trillions of comparisons — mathematically correct but practically impossible.
By introducing a MySQL full-text index and using MATCH ... AGAINST, we reduced the runtime from “never finishes” to a few minutes.
Suddenly, the task was not only possible but routine.

“Analyzing data” often really means “searching data.”
And for huge datasets, full-text indexing can turn an impossible search into an efficient one — if you know how it works and where its limits are.


Indexes define what’s visible.
If your index ignores short words, your search — and your analysis — will ignore them too.
Not because they’re gone, but because the system decided they don’t matter.

Full-text search is powerful, but only if you tame it.
Know its thresholds.
Know who controls the server.
And never assume that “no results” means “nothing there.”

Thanks for reading,
Stefan


🧮 The Missing Number

4 – Because MySQL thinks anything shorter isn’t worth indexing.