When you browse through a shop it’s really useful to use this Faceted Search Filters. You just choose concrete filters and "vo-a-la", the shop returns desirable result. It works smoothly and fast – you’re happy.
Different system name this multi-level filtration in different ways. The most common names:
- Faceted Search;
- Layered Navigation;
- Multiple filters.
But as developers, sometimes from a consumer, we turn into a creator and it’s our task to create that fast and smoothly Faceted Search. After several hours of googling your brain blows up trying to collect everything up. Here I’m going to sum up and provide my way of building such filtration system. I’m using MySQL, but actually it doesn’t matter you can simply convert it to any other RDB.
The task is to build Faceted Search through all company’s employees. There are tables for skills, occupations, locations, languages, etc. We need to count key metrics and return compact result.
The key moments I figured out while building my filters:
- If your data is scattered through multiple tables you’ll end up with one MySQL query per table (filter) and then UNION all queries in one. If you have a product catalogue with well-organized attributes then your approach will be a bit different and faster.
- Performance is the main trade-off. Apply as minimum JOINs as possible in getting your faceted filters.
- JOIN all tables only when getting the main result.
- If you operate tens of thousands records (or even hundreds of thousands), you don’t need an additional search engine like Solr.
- Check each query performace separately before adding them to UNION.
- Remember, you can ORDER only full result set of UNION, but no each subquery separately.
- If you choose several values of one facet then use
IN()
function in your SQL query. - If you choose values from different facets then use
AND
condition in your query. - Use ID instead of text for filtering where it’s possible. It would be possible to apply indexes here.
Facets
Let’s jump to out tables and queries. For simplifying I get rid of intermediate tables, but in the real world situation you should have them.
We use INNER JOIN
such as we’re interested only in facets that have at least one value.
skill
id | user_id | title |
---|---|---|
1 | 3 | MS Word |
2 | 4 | MS Excel |
3 | 5 | PHP |
4 | 5 | JavaScript |
5 | 5 | MS Word |
6 | 3 | PHP |
SELECT
"1" AS `position`,
"skill" AS `facet`,
s.title AS `value`,
COUNT(s.id) AS `count`
FROM `skill` AS `s`
INNER JOIN `user` AS `u` ON u.id = s.user_id
GROUP BY `s`.`title`
position | facet | value | count |
---|---|---|---|
1 | skill | MS Word | 2 |
1 | skill | MS Excel | 1 |
1 | skill | PHP | 2 |
1 | skill | JavaScript | 1 |
occupation
We have occupation field in our user table, so there is no join here.
id | name | occupation |
---|---|---|
1 | Elon Musk | Entrepreneur |
2 | Cristiano Ronaldo | Footballer |
3 | Lois Lane | Journalist |
4 | John Smith | Journalist |
5 | Clark Kent | Hero |
6 | Bruce Wayne | Hero |
7 | Peter Parker | Hero |
SELECT
"2" AS `position`,
"occupation" AS `facet`,
u.occupation AS `value`,
COUNT(u.id) AS `count`
FROM `user` AS `u`
GROUP BY `u`.`occupation`
position | facet | value | count |
---|---|---|---|
2 | occupation | Entrepreneur | 1 |
2 | occupation | Footballer | 1 |
2 | occupation | Journalist | 2 |
2 | occupation | Hero | 3 |
language
id | user_id | title |
---|---|---|
1 | 1 | English |
2 | 2 | Spanish |
3 | 2 | Italian |
4 | 3 | English |
5 | 3 | Japanese |
6 | 4 | English |
7 | 5 | English |
8 | 6 | English |
9 | 7 | Italian |
10 | 7 | English |
SELECT
"3" AS `position`,
"language" AS `facet`,
l.title AS `value`,
COUNT(l.id) AS `count`
FROM `language` AS `l`
INNER JOIN `user` AS `u` ON u.id = l.user_id
GROUP BY `l`.`title`
position | facet | value | count |
---|---|---|---|
3 | language | English | 6 |
3 | language | Spanish | 1 |
3 | language | Italian | 2 |
3 | language | Japanese | 1 |
Finally, wrap them up in a single UNION query and order by position and most frequent facets.
SELECT
`sub` .*
FROM (
(SELECT
"1" AS `position`,
"skill" AS `facet`,
s.title AS `value`,
COUNT(s.id) AS `count`
FROM `skill` AS `s`
INNER JOIN `user` AS `u` ON u.id = s.user_id
GROUP BY `s`.`title`)
UNION
(SELECT
"2" AS `position`,
"occupation" AS `facet`,
u.occupation AS `value`,
COUNT(u.id) AS `count`
FROM `user` AS `u`
GROUP BY `u`.`occupation`)
UNION
(SELECT
"3" AS `position`,
"language" AS `facet`,
l.title AS `value`,
COUNT(l.id) AS `count`
FROM `language` AS `l`
INNER JOIN `user` AS `u` ON u.id = l.user_id
GROUP BY `l`.`title`)
) AS `sub`
ORDER BY
`position` ASC,
`count` DESC
position | facet | value | count |
---|---|---|---|
1 | skill | MS Word | 2 |
1 | skill | PHP | 2 |
1 | skill | MS Excel | 1 |
2 | occupation | Hero | 3 |
2 | occupation | Journalist | 2 |
2 | occupation | Entrepreneur | 1 |
2 | occupation | Footballer | 1 |
3 | language | English | 6 |
3 | language | Italian | 2 |
3 | language | Spanish | 1 |
3 | language | Japanese | 1 |
Main query
The main query should join all tables, otherwise you wont be able to apply your filters and return correct result.
All written here make sense only in this example. If your situation is different and you use Solr, caching, conditional table joining, etc. the approach could be change. But I recommend to start from simpler solution and then improve it.
To get full result set of employees the SQL query would have the next structure:
SELECT u.*
FROM user AS u
LEFT JOIN skill AS s ON s.user_id = u.id
LEFT JOIN language AS l ON l.user_id = u.id
There is no join with the user table itself for occupation, because we’ll apply filters directly to user table.
When user select any facets through interface, for example "skill=PHP", you just apply required conditions and return the result.
SELECT u.*
FROM user AS u
LEFT JOIN skill AS s ON s.user_id = u.id
LEFT JOIN language AS l ON l.user_id = u.id
WHERE s.title = "PHP"
GROUP BY u.id
Facets filtering
It’s obvious when user apply any filters you need recalculate your facet filters. To do so, we can filter out only user.id
from the main query above and pass them to each of our facet query separately. For example, the query to skill table would be:
SELECT
"1" AS `position`,
"skill" AS `facet`,
s.title AS `value`,
COUNT(s.id) AS `count`
FROM `skill` AS `s`
INNER JOIN `user` AS `u` ON u.id = s.user_id
WHERE u.id IN(5,3)
GROUP BY `s`.`title`
position | facet | value | count |
---|---|---|---|
1 | skill | MS Word | 2 |
1 | skill | PHP | 2 |
1 | skill | JavaScript | 1 |
The base implementation is ready and you can extend it as you want.
Useful links
- Faceted search, why the DevAPI could matter one day
- Multiple-filters real-time implementation with MySQL
- Faceted Search (solr) vs Good old filtering via PHP?
- Calculating product counts (efficiently) in faceted search with PHP, MySQL
- Doctrine and Generated Columns
- Can Doctrine define MySQL’s Generated columns?