Decoding Complex Queries: A Transformative Approach to Search Functionality | December 17 2025, 03:25

Oh, I just solved a really cool problem. It’s tricky to explain though. But I’ll try.

So, the client has 10 search websites. They all use one index but throw different queries at it. To what the user enters, a very long and complex query is added, generated by a module on Sitecore. It includes template and page IDs that need to be included or excluded. Ultimately, it’s impossible to understand what’s going on there. There could be ten opening brackets and some randomly closing ones, but it worked with Coveo. Reformatting helped, but not much.

And each site has its own version of this. Meanwhile, the same IDs appear periodically. I first tried to manually figure this out, but it was a nightmare. Nothing helped. There are also nested conditions. For example, “exclude this template” not globally, but only if that field equals one.

Here’s what I did:

I wrote a script that parses this textual “mess” into an abstract syntax tree (AST). This allowed to turn an unreadable string into a structured JSON object, where it’s clear: here’s AND, there’s OR, and here — a specific condition.

Then I turned these conditions into Boolean algebra formulas. Using the SymPy library, I “fed” these formulas to simplification algorithms. Mathematics itself eliminated duplicates, collapsed excessive nesting, and removed conditions that are logically absorbed by others. As a result, the “trees” became flat and understandable.

In the attachment — the original tree and the simplified one.

To be sure that I didn’t break anything during simplification, I wrote a test generator. It takes the simplified logic, puts it back into a working curl, and checks whether the number of found documents (totalCount) matches the original request. The numbers matched — meaning, the logic is preserved 100%.

Having simplified and standardized structures for each site in hand, I built a comparison matrix. The script analyzed them and highlighted Common Core — conditions that are guaranteed to be required (or prohibited) on all sites without exception, and Specifics — unique “tails” that distinguish one site from another.

In the attached screenshot: REQ means that the condition is guaranteed to be met for any document that goes through this request. NOT — definitely not met. OPT — the condition is present in the request, but it’s not strict by itself. It only works in conjunction with something else. “.” — the condition is not mentioned in the request at all.

For 3 sites it responds instantly, for 10 it takes about 30 minutes.

And of course, all data in all screenshots are thoroughly obfuscated.

Leave a comment