Hey! I recorded a video course!
If you like this article, you will also like the course! Check it out here!
Hey folks, welcome back to another blog post!
Did you know that:
When you go into the bathroom, you're American. When you come out of the bathroom, you're American. But what are you IN the bathroom? European!
🔗 The Problem
In previous articles, I wrote about how to do an efficient name search in Postgres using
GIN indexes. These work great, but once your user base becomes international, you have to deal with all the weird things other cultures add to their words. Like adding little ticks, arrows, dots, and waves above them, like
The problem with these special symbols is that they break your text search. For example, if I want to search for
José in my application, I might only type
Jose (without the accent). But Postgres treats
e differently and won’t return a user called
José. For Postgres,
José are not the same.
This is bad UX and not what we want. We want to return
José even though the search term was
Jose, so what can we do?
🔗 The Solution
Luckily, Postgres offers a function called unaccent for exactly this situation. It allows Postgres to treat words with and without accent equally (e.g.
e). Let’s see how we can use it in Ecto.
🔗 The Setup
This is our search function without
def search(name) do query = from( p in Person, where: ilike(p.name, ^search_term) ) Repo.all(query) end
This function won’t return a person with the name
José if we search for
Jose. Likewise, it won’t return a person with the name
Jose (no accent) if we search for
José (with an accent). Let’s fix the function.
🔗 The Migration
First, we need to add a new extension to Postgres. Add the following to a migration of your choosing:
def change do execute( "CREATE EXTENSION IF NOT EXISTS unaccent;", "DROP EXTENSION IF EXISTS unaccent;" ) end
If you run the migration, it will add the
unaccent extension to your Postgres instance. From now on, we can use the function
unaccent in our query.
🔗 The slow Solution
Now, let’s update our query function from above. We simply have to wrap the person’s name and the search term with
unaccent and Postgres will take care of the rest. This is how our search function looks now:
def search(name) do query = from( p in Person, where: fragment( "unaccent(?) ILIKE unaccent(?)", p.name, ^search_term ) ) Repo.all(query) end
From now on, our
search/1 function will also return a person called
José when we search for
josé. It will also return a person called
Jose (unaccented) if we search for
Josè (accented). Great success!
🔗 The Fast Solution
Sadly, there is one problem with our solution above: It’s slooooooooow 🐌.
If you have read my post about efficient name search, you know that we can speed up our query significantly by adding a
GIN index to it. Let’s add a GIN index but for the unaccented version of the names in our database.
First, we need to create an immutable version of Postgres’
unaccent function. If you use a function in a Postgres index, it must be IMMUTABLE. That means that the function cannot alter the database and will always return the same value given the same arguments.
unaccent function is not
SOLID. That means that it can’t change the database and will always return the same result given the same arguments, but its result might change across SQL statements. Imagine that I want to fetch the current time. That function would rely on the
timezone of my database. If I fetch the time once, change my database timezone, and fetch the time again, the results will change. The same goes for the
unaccent function because it depends on the dictionary that I use in my database. If I change my dictionary, the result of
unaccent might change, and therefore the function is only
SOLID and not
That means that we can’t use
unaccent directly when creating the index. However, we can create an
IMMUTABLE version of the function if we’re certain to never change the dictionary of our database (which you probably won’t anyways). Let’s do that now.
In our migration after your
CREATE EXTENSION ... call, add this:
execute( """ CREATE OR REPLACE FUNCTION f_unaccent(text) RETURNS text LANGUAGE sql IMMUTABLE PARALLEL SAFE AS $func$ SELECT public.unaccent('public.unaccent', $1) $func$; """, "DROP FUNCTION IF EXISTS f_unaccent(text);" )
We have now an immutable function called
f_unaccent which we can use to generate a
GIN index of our
person.name field. Add this after the snippet above:
execute( "CREATE EXTENSION IF NOT EXISTS pg_trgm;", "DROP EXTENSION IF EXISTS pg_trgm;" ) execute( """ CREATE INDEX persons_name_unaccented_gin_trgm_idx ON persons USING gin (f_unaccent(name) gin_trgm_ops) """, "DROP INDEX IF EXISTS persons_name_unaccented_gin_trgm_idx;" )
The first statement will add the Postgres Trigram extension to your Postgres database. We need this extension for creating the
GIN index in the second statement.
The second statement creates a
GIN index for the unaccented value in our
person.name field. That means that our
search/1 query above will now use a much faster
GIN index for searching for the correct person.
As a last step, we need to update our
search/1 function to use the new
f_unaccent(text) function instead of the built-in
unaccent function. This is the updated version of our function:
def search(name) do query = from( p in Person, where: fragment( "f_unaccent(?) ILIKE f_unaccent(?)", p.name, ^search_term ) ) Repo.all(query) end
If you want to double-check that your function uses the new index, add this code to the
def search(name) do query = # build your query # Add this 👇 Ecto.Adapters.SQL.explain(Repo, :all, query) |> IO.inspect() Repo.all(query) end
When you run the
search/1 function, you will see something like this in your terminal:
Bitmap Heap Scan on persons p0 Recheck Cond: (f_unaccent((name)::text) ~~* '%jose%'::text) -> Bitmap Index Scan on persons_name_unaccented_gin_trgm_idx Index Cond: (f_unaccent((name)::text) ~~* '%jose%'::text)
If you see this, congratulations! Your search function now uses the
GIN index for searching for the unaccented version of the name.
🔗 The Caveats
As always, there are caveats to this approach. If you’re interested into the nitty-gritty details of how this works and what other solutions exist, I can recommend the following resources:
- Does PostgreSQL support “accent insensitive” collations?
- Using ILIKE with unaccent and with only right end wildcard
- Putting unaccent in existing query
And that’s it! I hope you enjoyed this article! If you want to support me, you can buy my book or video course. Follow me on Twitter or subscribe to my newsletter below if you want to get notified when I publish the next blog post. Until the next time! Cheerio 👋