September 07 2017, 23:02

Has anyone seen implementations of faceted search with dynamically calculated ranges? For instance, take a product price field and a million products. After a search, only products priced from 10 to 100 rubles remain. Can the system intelligently split this range into several sub-ranges and show them as facets? Has anyone seen this done online? Search engines usually require fixed, predefined ranges, and for something like prices that doesn’t work well. And is this more convenient than a slider in the facets, which could play much the same role?

UPDATE: moved from comments:

Here is a very raw idea. Make two additional queries, one sorted by price ascending and one descending, and take the first element of each. That gives the price range from minimum to maximum. Next, divide this range into N parts and make a separate query for each part to get its number of results (to display in parentheses). In the simplest case, N is the maximum number of price groups. If some of the middle groups come back empty (the edge groups can never be empty, since they contain the minimum and the maximum), we merge each empty group with its neighbor until no zeros remain. If performance allows, we can take a larger N and then, on the client side, combine the results into a small number of groups so that each group holds approximately the same number of elements. The resulting ranges are uneven in their boundaries but uniform in content.
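The steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: an in-memory sorted list stands in for the index, `count_between` stands in for one count query to the search engine, and all function names are hypothetical. Empty buckets are merged into the neighbor on the left, and the regrouping into similar-sized groups is a simple greedy pass, which is one of several possible choices.

```python
from bisect import bisect_left, bisect_right


def count_between(sorted_prices, lo, hi, include_hi=False):
    """Stand-in for one count query over [lo, hi), or [lo, hi] if include_hi."""
    left = bisect_left(sorted_prices, lo)
    if include_hi:
        right = bisect_right(sorted_prices, hi)
    else:
        right = bisect_left(sorted_prices, hi)
    return right - left


def equal_width_buckets(sorted_prices, n):
    """Split [min, max] into n equal parts and count the items in each part."""
    # In a real system these two values come from the two sorted queries.
    lo, hi = sorted_prices[0], sorted_prices[-1]
    if lo == hi:
        return [(lo, hi, len(sorted_prices))]
    step = (hi - lo) / n
    edges = [lo + i * step for i in range(n)] + [hi]
    return [
        (edges[i], edges[i + 1],
         count_between(sorted_prices, edges[i], edges[i + 1],
                       include_hi=(i == n - 1)))  # last bucket includes the max
        for i in range(n)
    ]


def merge_empty(buckets):
    """Fold every empty bucket into the bucket on its left."""
    merged = []
    for lo, hi, count in buckets:
        if count == 0 and merged:
            prev_lo, _, prev_count = merged[-1]
            merged[-1] = (prev_lo, hi, prev_count)
        else:
            merged.append((lo, hi, count))
    return merged


def regroup(buckets, groups):
    """Greedily merge adjacent buckets into at most `groups` similar-sized groups."""
    total = sum(count for _, _, count in buckets)
    target = total / groups
    result, cur_lo, cur_count = [], None, 0
    for lo, hi, count in buckets:
        if cur_lo is None:
            cur_lo = lo
        cur_count += count
        # Close the group once it reaches the target, keeping one slot for the tail.
        if cur_count >= target and len(result) < groups - 1:
            result.append((cur_lo, hi, cur_count))
            cur_lo, cur_count = None, 0
    if cur_lo is not None:
        result.append((cur_lo, buckets[-1][1], cur_count))
    return result


prices = sorted([10, 11, 12, 13, 14, 15, 40, 41, 90, 100])  # made-up sample data
buckets = merge_empty(equal_width_buckets(prices, 9))
print(buckets)
print(regroup(buckets, 2))
```

With a real index, every call to `count_between` would be a round trip to the search engine, so the cost grows linearly with N.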

A small update. If you simply generate equal ranges by dividing the overall spread by N, users get ugly ranges with fractional boundaries. So, if Solr’s performance allows, you can also send requests for rounded ranges and compare the results with the unrounded ones. If the maximum spread between groups has increased by more than X%, we make the rounding for this pair of groups weaker. Otherwise, we try a coarser rounding.
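One possible reading of this rounding step, sketched in Python. Several things here are assumptions rather than part of the idea as stated: the "spread" is measured as the share of items in a pair of neighboring groups that cross the boundary when it is rounded, candidate roundings are tried from coarsest to finest instead of being tightened step by step, and the names `round_edge`, `count_below`, `max_shift_pct` and `steps` are hypothetical. A sorted in-memory list again stands in for count queries to Solr.

```python
import math
from bisect import bisect_left


def count_below(sorted_prices, edge):
    """Stand-in for a count query: how many prices are strictly below `edge`."""
    return bisect_left(sorted_prices, edge)


def round_edge(sorted_prices, lo, edge, hi,
               max_shift_pct=5.0, steps=(100, 50, 10, 5, 1)):
    """Return the coarsest rounding of `edge` (kept strictly between lo and hi)
    that moves at most max_shift_pct percent of the pair's items across it."""
    base = count_below(sorted_prices, lo)
    # Items in the two neighboring groups [lo, edge) and [edge, hi).
    pair_total = count_below(sorted_prices, hi) - base
    old_left = count_below(sorted_prices, edge) - base
    for step in steps:  # coarsest rounding first
        candidate = math.floor(edge / step + 0.5) * step
        if not lo < candidate < hi:
            continue  # rounding would swallow a whole group
        new_left = count_below(sorted_prices, candidate) - base
        moved = abs(new_left - old_left)
        if pair_total == 0 or moved * 100 <= max_shift_pct * pair_total:
            return candidate
    return edge  # no acceptable rounding found, keep the fractional edge


# Made-up sample data; the raw edges come from splitting 12..95 into 3 parts.
prices = [12, 14, 19, 23, 31, 47, 52, 58, 64, 71, 88, 95]
print(round_edge(prices, 12, 39.67, 67.33))
print(round_edge(prices, 12, 39.67, 67.33, max_shift_pct=20))
print(round_edge(prices, 40, 67.33, 95))
```

A looser tolerance lets the boundary snap to a rounder number at the cost of shifting a few items into the neighboring group, which is the trade-off X% controls.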

For example, say we have 10 numbers in the range from 1 to 3 and another five in the range from 56 to 57. The full range is from 1 to 57. Let’s take N=20, so each part is 2.8 wide. We make 20 requests to Solr with ranges 1..3.8, 3.8..6.6, and so on. We get 10 numbers in the first range, five in the last, and zeros in the others. We round 3.8 to 4 and make two requests (from 0 to 4 and from 4 to 6.6). We again get 10 and 0 respectively. The counts haven’t changed from the previous ones, so we keep the rounded value (4). We do the same with 54.2..57: it rounds to 54..57. The result is three ranges: (0, 4), (4, 54), (54, 57). The first and last have non-zero counts, so we display them. The middle one is empty, so there is no need to display it. Thus, we display the facets 0..4 and 54..57. The downside is that we can’t turn 54 into 56 without increasing N. The question is whether we need to.

I have an idea to use a slider and show the ranges as a histogram. Everything described above then fits the histogram concept perfectly: N determines the number of columns in the histogram, and ugly, non-round range boundaries are no longer a problem.
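For the histogram variant, the N separate count queries could probably be replaced by a single Solr range-facet request. The sketch below only builds the query string and does not contact a server. The field name `price`, the query `*:*` and the function name are assumptions for illustration; `lo` and `hi` would come from the min/max lookup described earlier (Solr's stats component is another way to get them).

```python
from urllib.parse import urlencode


def histogram_facet_params(query, field, lo, hi, columns):
    """Build Solr range-facet parameters for a histogram with `columns` bars."""
    gap = (hi - lo) / columns  # an integer-typed field would need an integer gap
    return [
        ("q", query),
        ("rows", 0),                        # only the counts are needed
        ("facet", "true"),
        ("facet.range", field),
        ("facet.range.start", lo),
        ("facet.range.end", hi),
        ("facet.range.gap", gap),
        ("facet.range.hardend", "true"),    # do not let the last bar run past hi
        ("facet.range.include", "lower"),   # each bar includes its lower bound
        ("facet.range.include", "edge"),    # the last bar also includes hi itself
    ]


# Hypothetical field and bounds: 9 bars over prices from 10 to 100.
params = histogram_facet_params("*:*", "price", 10, 100, 9)
print(urlencode(params))
```

The response would then carry one count per bar, which maps directly onto the histogram columns drawn above the slider.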

It looks like this is the prototype I’ll build. The histogram seems like an excellent solution.
