Faceted Search or how to build multi-level filtration on MySQL

Table of Content

When you browse through a shop it’s really useful to use this Faceted Search Filters. You just choose concrete filters and "vo-a-la", the shop returns desirable result. It works smoothly and fast – you’re happy.
Different system name this multi-level filtration in different ways. The most common names:

  • Faceted Search;
  • Layered Navigation;
  • Multiple filters.

But as developers, sometimes from a consumer, we turn into a creator and it’s our task to create that fast and smoothly Faceted Search. After several hours of googling your brain blows up trying to collect everything up. Here I’m going to sum up and provide my way of building such filtration system. I’m using MySQL, but actually it doesn’t matter you can simply convert it to any other RDB.

The task is to build Faceted Search through all company’s employees. There are tables for skills, occupations, locations, languages, etc. We need to count key metrics and return compact result.

The key moments I figured out while building my filters:

  • If your data is scattered through multiple tables you’ll end up with one MySQL query per table (filter) and then UNION all queries in one. If you have a product catalogue with well-organized attributes then your approach will be a bit different and faster.
  • Performance is the main trade-off. Apply as minimum JOINs as possible in getting your faceted filters.
  • JOIN all tables only when getting the main result.
  • If you operate tens of thousands records (or even hundreds of thousands), you don’t need an additional search engine like Solr.
  • Check each query performace separately before adding them to UNION.
  • Remember, you can ORDER only full result set of UNION, but no each subquery separately.
  • If you choose several values of one facet then use IN() function in your SQL query.
  • If you choose values from different facets then use AND condition in your query.
  • Use ID instead of text for filtering where it’s possible. It would be possible to apply indexes here.

Facets

Let’s jump to out tables and queries. For simplifying I get rid of intermediate tables, but in the real world situation you should have them.

We use INNER JOIN such as we’re interested only in facets that have at least one value.

skill

id user_id title
1 3 MS Word
2 4 MS Excel
3 5 PHP
4 5 JavaScript
5 5 MS Word
6 3 PHP
SELECT
    "1" AS `position`,
    "skill" AS `facet`,
    s.title AS `value`,
    COUNT(s.id) AS `count`
FROM `skill` AS `s`
INNER JOIN `user` AS `u` ON u.id = s.user_id
GROUP BY `s`.`title`
position facet value count
1 skill MS Word 2
1 skill MS Excel 1
1 skill PHP 2
1 skill JavaScript 1

occupation

We have occupation field in our user table, so there is no join here.

id name occupation
1 Elon Musk Entrepreneur
2 Cristiano Ronaldo Footballer
3 Lois Lane Journalist
4 John Smith Journalist
5 Clark Kent Hero
6 Bruce Wayne Hero
7 Peter Parker Hero
SELECT
    "2" AS `position`,
    "occupation" AS `facet`,
    u.occupation AS `value`,
    COUNT(u.id) AS `count`
FROM `user` AS `u`
GROUP BY `u`.`occupation`
position facet value count
2 occupation Entrepreneur 1
2 occupation Footballer 1
2 occupation Journalist 2
2 occupation Hero 3

language

id user_id title
1 1 English
2 2 Spanish
3 2 Italian
4 3 English
5 3 Japanese
6 4 English
7 5 English
8 6 English
9 7 Italian
10 7 English
SELECT
    "3" AS `position`,
    "language" AS `facet`,
    l.title AS `value`,
    COUNT(l.id) AS `count`
FROM `language` AS `l`
INNER JOIN `user` AS `u` ON u.id = l.user_id
GROUP BY `l`.`title`
position facet value count
3 language English 6
3 language Spanish 1
3 language Italian 2
3 language Japanese 1

Finally, wrap them up in a single UNION query and order by position and most frequent facets.

SELECT
    `sub` .*
FROM (
        (SELECT
            "1" AS `position`,
            "skill" AS `facet`,
            s.title AS `value`,
            COUNT(s.id) AS `count`
        FROM `skill` AS `s`
        INNER JOIN `user` AS `u` ON u.id = s.user_id
        GROUP BY `s`.`title`)
    UNION
        (SELECT
            "2" AS `position`,
            "occupation" AS `facet`,
            u.occupation AS `value`,
            COUNT(u.id) AS `count`
        FROM `user` AS `u`
        GROUP BY `u`.`occupation`)
    UNION
        (SELECT
            "3" AS `position`,
            "language" AS `facet`,
            l.title AS `value`,
            COUNT(l.id) AS `count`
        FROM `language` AS `l`
        INNER JOIN `user` AS `u` ON u.id = l.user_id
        GROUP BY `l`.`title`)
    ) AS `sub`
ORDER BY
    `position` ASC,
    `count` DESC
position facet value count
1 skill MS Word 2
1 skill PHP 2
1 skill MS Excel 1
2 occupation Hero 3
2 occupation Journalist 2
2 occupation Entrepreneur 1
2 occupation Footballer 1
3 language English 6
3 language Italian 2
3 language Spanish 1
3 language Japanese 1

Main query

The main query should join all tables, otherwise you wont be able to apply your filters and return correct result.

All written here make sense only in this example. If your situation is different and you use Solr, caching, conditional table joining, etc. the approach could be change. But I recommend to start from simpler solution and then improve it.

To get full result set of employees the SQL query would have the next structure:

SELECT u.*
FROM user AS u
LEFT JOIN skill AS s ON s.user_id = u.id
LEFT JOIN language AS l ON l.user_id = u.id

There is no join with the user table itself for occupation, because we’ll apply filters directly to user table.

When user select any facets through interface, for example "skill=PHP", you just apply required conditions and return the result.

SELECT u.*
FROM user AS u
LEFT JOIN skill AS s ON s.user_id = u.id
LEFT JOIN language AS l ON l.user_id = u.id
WHERE s.title = "PHP"
GROUP BY u.id

Facets filtering

It’s obvious when user apply any filters you need recalculate your facet filters. To do so, we can filter out only user.id from the main query above and pass them to each of our facet query separately. For example, the query to skill table would be:

SELECT
    "1" AS `position`,
    "skill" AS `facet`,
    s.title AS `value`,
    COUNT(s.id) AS `count`
FROM `skill` AS `s`
INNER JOIN `user` AS `u` ON u.id = s.user_id
WHERE u.id IN(5,3)
GROUP BY `s`.`title`
position facet value count
1 skill MS Word 2
1 skill PHP 2
1 skill JavaScript 1

The base implementation is ready and you can extend it as you want.

Useful links

Leave a Reply

Your email address will not be published. Required fields are marked *