Programming – Page 5 – Hi, I'm Rauf Aliev.

From Vision to Bookshelf: Launching “Recommender Algorithms” | October 13 2025, 11:54

Finally, I have released a book! It is called Recommender Algorithms — it contains more than 50 recommendation algorithms with mathematical explanations, detailed descriptions, and code examples.

It all started in early spring in Germany, when I attended the ACM conference and, while going through the talks in the RecSys track, sketched the first outline of the book’s structure. Now, six months later, the book has been published.

Why this book? Because there is no single, accessible source, online or in print, that thoroughly examines recommendation algorithms of various types and purposes. There are articles on narrow aspects, but until now, it seems, no one has managed to collect and systematize the work, from the fundamentals to the most recent developments. Maybe no one needed to. Then I found that I did. I don’t know whether I succeeded, but I am eager for your feedback.

It is available on Amazon and Barnes & Noble. There is also a machine translation into Russian (surprisingly decent), but I don’t know how to sell it yet.

https://www.testmysearch.com/books/recommender-algorithms.html?FB

(This is not my only book, but today’s post is just about this one.)

Decoding Solr and Lucene: Engineering Insights and Algorithms | October 06 2025, 17:11

I’m preparing a book on Solr & Lucene for publication. What do you think about publishing this kind of translation on Amazon? 🙂

The book is about algorithms and under-the-hood engineering. I haven’t seen books written from this angle yet; maybe someone will find it interesting.

Introducing the AI-Powered Text-to-Diagram Generator | September 30 2025, 20:57

While working on a book, I realized which product I’m missing: an AI diagram generator that works from textual descriptions.

The idea is that the master document for the diagram is text. This description can (and should) be quite detailed, so that the generated diagram exactly matches the author’s vision. The diagram itself is not edited. Or rather, it can be edited by moving circles around, but ideally, after such changes, the system should update the text so that generating from it reproduces exactly what the user adjusted.

The result, the diagram, should match the description as closely as possible. If it cannot, for example because a triangle with three obtuse angles is impossible, the system should do its best and explain in words what didn’t work. The user can then adjust the task so that the system can comply and produce the diagram correctly.

But then the author might, by chance, get a result they like out of a flawed text, and regenerating it might produce something different, and not necessarily better. Therefore:

You could ask the system to generate a description from the diagram which, when fed back into the generator, reproduces exactly the diagram it was derived from. Yes, this description would be more verbose and complex, but it would describe the result more reliably.

So, from this point on, you are no longer working with the diagram; you are working with text. When you need the diagram, you simply compile the text and get exactly what you need. And you don’t even edit the text directly: you work with the description through an LLM, asking it, say, to add a block, and the text changes in a way that doesn’t suddenly shift everything else.

The final diagram should be in an object form, from which raster (PNG) or vector (SVG, EPS) images can be created.
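To make the round-trip idea concrete, here is a minimal sketch, assuming a made-up line-based description format (`node <id> "<label>" at x,y` and `edge a -> b`) and hypothetical names (`Diagram`, `to_text`, `from_text`, `to_svg`). It is not an existing product or format, just an illustration of "text is the master, the diagram is an object, and describing a diagram then compiling the description gives back the same object":

```python
import html
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    label: str
    x: int
    y: int


@dataclass
class Diagram:
    """Object form of the diagram: the thing rendered to SVG/PNG."""
    nodes: list[Node] = field(default_factory=list)
    edges: list[tuple[str, str]] = field(default_factory=list)


def to_text(d: Diagram) -> str:
    """Describe the diagram verbosely enough that compiling it gives the same object."""
    lines = [f'node {n.id} "{n.label}" at {n.x},{n.y}' for n in d.nodes]
    lines += [f"edge {a} -> {b}" for a, b in d.edges]
    return "\n".join(lines)


def from_text(text: str) -> Diagram:
    """Compile the (hypothetical) description format back into a Diagram."""
    d = Diagram()
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, rest = line.split(" ", 1)
        if kind == "node":
            node_id, rest = rest.split(" ", 1)
            label, pos = rest.rsplit(" at ", 1)
            x, y = pos.split(",")
            d.nodes.append(Node(node_id, label[1:-1], int(x), int(y)))
        elif kind == "edge":
            a, b = rest.split(" -> ")
            d.edges.append((a, b))
    return d


def to_svg(d: Diagram) -> str:
    """Render the object form as a simple vector image."""
    pos = {n.id: (n.x, n.y) for n in d.nodes}
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="400" height="300">']
    for a, b in d.edges:
        (x1, y1), (x2, y2) = pos[a], pos[b]
        parts.append(f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" stroke="black"/>')
    for n in d.nodes:
        parts.append(f'<circle cx="{n.x}" cy="{n.y}" r="20" fill="white" stroke="black"/>')
        parts.append(f'<text x="{n.x}" y="{n.y}" text-anchor="middle">{html.escape(n.label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Moving a circle by hand would just change a node's `x`/`y`; calling `to_text` afterwards yields the updated "master" description, and `from_text(to_text(d)) == d` is the property the post asks for.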

It would also be great if such a system could take existing diagrams or diagram templates and borrow their styles and conventions for how to display things.

So, these are my fantasies. If anyone has ideas on how to implement this — let’s discuss 🙂

Crafting the Future of Recommender Systems: A Deep Dive into Algorithms and Implementation | September 26 2025, 21:17

A while ago I decided to write a book on recommendation algorithms, with mathematics, code examples, a repository, etc. In English, of course.

Accordingly, I am looking for volunteer reviewers who are knowledgeable in the field, as well as people with experience in Amazon print-on-demand.

There are already about 200 pages of content, with about three months of work left. Working title: Recommender Algorithms in 2026: A Practitioner’s Guide. Roughly half of it is still in draft form, and the first 80 pages are about 80% complete.

I’ve built a mechanism to publish in HTML and PDF simultaneously. The HTML version is fully functional, with navigation: the navigation block highlights the current section and, as you scroll, follows the section in front of the reader. Clicking a section, of course, jumps straight to it. It’s all completely automatic.
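In a browser this "follow the current section" behavior is usually done in JavaScript (for example with `IntersectionObserver`), and I don't know how the book's HTML does it. The core lookup, though, is small; here is a sketch in Python, assuming you already have the sorted top offsets of the section headings (`section_tops`) and the current scroll position. The `offset` parameter is a hypothetical allowance for a fixed header:

```python
from bisect import bisect_right


def current_section(section_tops: list[int], scroll_y: int, offset: int = 0) -> int:
    """Return the index of the last section heading at or above the viewport top.

    section_tops must be sorted ascending (document order). `offset` shifts the
    reference line down, e.g. to account for a fixed page header.
    """
    i = bisect_right(section_tops, scroll_y + offset) - 1
    return max(i, 0)  # before the first heading, still highlight section 0
```

Binary search keeps this cheap even for a long book, which matters because it runs on every scroll event.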

Exploring AI Search Agent: Revolutionizing Automated Browsing and Task Completion | August 19 2025, 01:21

In addition to the main product for search testing, I am developing an AI Search Agent in my leisure time. You only need to provide it with two pieces of information: a website to visit and a goal (described in a short paragraph). In other words, this thing is smart enough to function without any setup – just the site and the goal, and then it’s on its own.

How it works: the virtual agent generates search queries on its own, refines them based on the results (for example, simplifies them), and analyzes how well the results match the intended goal. If suitable results are found, the agent can add items to the cart and place an order, if this is enabled in the settings.
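The refine-and-check loop can be sketched roughly like this. It is a toy under stated assumptions: the real agent asks an LLM to judge relevance and to rewrite queries, whereas here `goal_match` is a crude keyword-overlap stand-in, "simplifying" just drops the last word, and `search` is any function you pass in. All names are hypothetical:

```python
from typing import Callable


def goal_match(goal: str, title: str) -> float:
    """Crude relevance stand-in: share of goal words found in a result title."""
    goal_words = set(goal.lower().split())
    title_words = set(title.lower().split())
    return len(goal_words & title_words) / len(goal_words) if goal_words else 0.0


def run_agent(goal: str, query: str, search: Callable[[str], list[str]],
              threshold: float = 0.5, max_steps: int = 5) -> tuple[str, list[str]]:
    """Search, keep results that fit the goal, simplify the query if none do."""
    for _ in range(max_steps):
        results = search(query)
        good = [r for r in results if goal_match(goal, r) >= threshold]
        if good:
            return query, good
        words = query.split()
        if len(words) <= 1:
            break  # nothing left to simplify
        query = " ".join(words[:-1])  # simplify: drop the last word
    return query, []
```

For example, an over-specific query like "red yoga mat xl" that finds nothing gets simplified to "red yoga mat", which does.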

I wrote about this recently; today’s post is just a slightly nicer demo. It will get nicer still, since this is a snapshot from the middle of development, but you can already see how the page is analyzed, and there are first results that can already be used.

The agent can be used for several purposes. First, it’s an excellent way to create ground truth: a set of queries with ideal results. This data can then be used for search testing without involving large language models (LLMs), which are often slow and expensive. Second, it helps test search features before they are rolled out to users. Third, the agent generates realistic usage data for training recommendation models that require authentic interactions.
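Once such a ground truth exists, checking a search engine against it needs no LLM at all. A minimal sketch, assuming the ground truth is stored as a mapping from query to a set of relevant product IDs, and using precision@k as the metric (my choice for illustration; nDCG or recall would work the same way). Product IDs and function names are hypothetical:

```python
from typing import Callable


def precision_at_k(results: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k results that are in the ground-truth set (divides by k)."""
    if k <= 0:
        return 0.0
    return sum(1 for r in results[:k] if r in relevant) / k


def evaluate(search: Callable[[str], list[str]],
             ground_truth: dict[str, set[str]], k: int = 3) -> float:
    """Mean precision@k over all ground-truth queries."""
    scores = [precision_at_k(search(q), rel, k) for q, rel in ground_truth.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

Running `evaluate` before and after a change to the search configuration gives a quick, repeatable regression check.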

The colorful rectangles in the video are part of how the agent communicates with the AI (the LLM). To decide where to click, the system annotates the page and sends the AI a structured description of it, often along with a screenshot, so that the model can analyze everything and choose the next action.
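One common way to build such an annotation is to number the interactive elements, draw the numbers as boxes on the screenshot, and send the model a JSON list of the same numbered elements. The model then replies with a number instead of pixel coordinates. I don't know the agent's actual format; the sketch below assumes a hypothetical `Element` record with a bounding box and hypothetical `annotate`/`click_target` helpers:

```python
import json
from dataclasses import dataclass, asdict

INTERACTIVE_TAGS = {"a", "button", "input", "select"}


@dataclass
class Element:
    tag: str
    text: str
    x: int
    y: int
    w: int
    h: int


def annotate(elements: list[Element]) -> str:
    """Number the interactive elements and return a JSON page description for the LLM."""
    items = []
    for el in elements:
        if el.tag in INTERACTIVE_TAGS:
            items.append({"label": len(items) + 1, **asdict(el)})
    return json.dumps({"elements": items}, ensure_ascii=False)


def click_target(payload: str, label: int) -> tuple[int, int]:
    """Map the label chosen by the model back to the center of that element."""
    for el in json.loads(payload)["elements"]:
        if el["label"] == label:
            return el["x"] + el["w"] // 2, el["y"] + el["h"] // 2
    raise KeyError(label)
```

Referring to elements by label keeps the model's answer short and easy to validate, which is presumably one reason the rectangles show up in the video.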

Exploring TestMySearch.com’s Virtual Shopper System | August 15 2025, 04:27

As part of the TestMySearch.com project, I am building a “virtual shopper” system that simulates the behavior of a real user in an online store. It starts with an abstract goal (for example, “something bright and sexy for the gym”), turns it into a specific search query, and runs the search on the site. Depending on the results, it either keeps browsing or, with a certain probability, reformulates the query if the findings do not match the original goal. The system then evaluates the pages for how well they fit the initial idea, opens product cards, randomly changes parameters such as color or size, decides whether to add items to the cart and place an order, and may also simply leave the site. This makes it possible to generate many realistic sessions overnight for testing search, filters, and recommendations even before live users arrive.

The system is fully automatic: the browser in the video opens by itself, the search field is found automatically (whatever the site), and the system composes the query text from that initial goal. Then the facets and search results are displayed, possibly in a layout the system has never seen, but it still understands what is what and decides whether to rephrase the query, select a facet, or click a search result. There is a certain probability that the virtual user leaves the site. When reformulating, the virtual user does not repeat queries that have already led to empty or irrelevant results, so there is “memory” within a session.
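The session logic (abandonment probability, per-session memory of tried queries, random variant choice, cart/order decisions) can be sketched as a small probabilistic loop. This is a simplified illustration, not the TestMySearch.com implementation: the candidate queries are passed in rather than generated by an LLM, "irrelevant" is reduced to "empty results", and the sizes and probabilities are assumptions:

```python
import random
from typing import Callable


def shopper_session(queries: list[str], search: Callable[[str], list[str]],
                    rng: random.Random, leave_prob: float = 0.1,
                    cart_prob: float = 0.3, max_steps: int = 10) -> list[tuple]:
    """Simulate one shopping session and return the list of actions taken."""
    tried: set[str] = set()  # session memory: never reissue a failed query
    log: list[tuple] = []
    for _ in range(max_steps):
        if rng.random() < leave_prob:
            log.append(("leave",))
            return log
        fresh = [q for q in queries if q not in tried]
        if not fresh:
            log.append(("leave",))  # out of ideas
            return log
        query = rng.choice(fresh)
        tried.add(query)
        results = search(query)
        log.append(("search", query, len(results)))
        if not results:
            continue  # reformulate on the next step
        item = rng.choice(results)
        size = rng.choice(["S", "M", "L"])  # randomly pick a variant
        log.append(("open", item, size))
        if rng.random() < cart_prob:
            log.append(("add_to_cart", item, size))
            log.append(("order", item, size))
        else:
            log.append(("leave",))
        return log
    log.append(("leave",))
    return log
```

Seeding `random.Random` per session makes a generated session reproducible, which helps when a test run needs to be replayed.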

Navigating Code Generation with AI: Essential Skills for Programmers | August 04 2025, 14:28

I’m currently using Gemini extensively for code generation, and I see a skill programmers need in order to succeed with it: the ability to quickly read and understand someone else’s code, and to explain why an AI generation needs to be redone and how. For the former, you simply need to know the language very well and read code “at sight”, because there will be little time to ponder. For the latter, you need to know patterns well and understand where they apply and where they don’t. AI will keep misapplying patterns for a long time.

Moreover, a person will still need to understand 90% of AI-generated code “as a whole”, and also find time to comprehend each generated line. If you relax and let it slide, the system may produce code that works but is very hard to maintain. For instance, there is an unwritten rule that a single file should not contain too much code; when it grows, you refactor, splitting one large file into two or three. Sometimes this requires rewriting logic, but that rewriting always serves one goal: simpler maintenance. AI, however, also “improves” the code while rewriting it, and that is quite difficult to prohibit.

In addition, LLMs are inherently limited by their context window, which fills up with code very quickly. To give the user the illusion that everything works even with a large codebase, LLM tools can preprocess the input, extracting only the relevant pieces and setting aside the irrelevant ones, so that what matters fits into the actual context window. But this process is very unreliable: one time it works, the next time something important gets set aside. The system then doesn’t see the whole picture and generates a function very similar to the one it missed, and now there are two almost identical functions.
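A toy version of that preprocessing step shows how the failure happens. Assume (hypothetically) that each code chunk already has a relevance score and a token count, and that chunks are packed greedily by score into a fixed budget. Real tools use embeddings, ranking, and summaries, but the trade-off is the same:

```python
def pack_context(chunks: list[tuple[str, float, int]], budget: int) -> list[str]:
    """Greedily pick (name, relevance, tokens) chunks by relevance until the budget is used.

    A chunk that does not fit is skipped, even if it is highly relevant.
    """
    selected: list[str] = []
    used = 0
    for name, _score, tokens in sorted(chunks, key=lambda c: c[1], reverse=True):
        if used + tokens <= budget:
            selected.append(name)
            used += tokens
    return selected
```

With a 1000-token budget and chunks `utils.py` (score 0.9, 400 tokens), `models.py` (0.8, 700), and `helpers.py` (0.3, 200), the result is `utils.py` and `helpers.py`: the second most relevant file is dropped simply because it is large, and the model never sees whatever function it contains.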

Besides, today logic is often split between the database and the code; that is, data often controls the code. And the data often simply doesn’t fit into an LLM: there is too much of it. In the end, current LLM architectures cannot cope without programmers. But with LLMs, the requirements for programmers’ qualifications will only go up, not down. So yes, juniors should be worried, but leads not so much 🙂

DIY Wireless Reaction Game: Building Interactive Button-Based Activities | July 28 2025, 22:26

Who knows their way around electronics? Any recommendations?

I want to build something some weekend: a big bulbous button. It lights up, you smash it. The app records the time from when it lights up to when it’s smashed. There might be several buttons, scattered on walls or the floor. WIRELESS. They might light up randomly, controlled by the app (phone or computer). Metrics like average reaction time are calculated on the fly, for different meanings of the word ‘average’. For instance, you could place buttons on the ground a few meters apart and invent a running game for the kids, or attach them to a wall and hit them with a ball. So it’s basically a technical question.
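The “different meanings of average” part is the easy bit. Here is a sketch of what the app could compute from a list of reaction times in milliseconds; which statistics to show, and the 10% trimming, are my assumptions. The median and trimmed mean matter because one missed button (a 900 ms outlier) drags the plain mean a lot:

```python
import statistics


def reaction_stats(times_ms: list[float], trim: float = 0.1) -> dict[str, float]:
    """Several notions of 'average' reaction time (times_ms must be non-empty)."""
    data = sorted(times_ms)
    k = int(len(data) * trim)  # how many samples to cut from each end
    trimmed = data[k:len(data) - k] if len(data) > 2 * k else data
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "trimmed_mean": statistics.mean(trimmed),
        "best": data[0],
    }
```

For the times 250, 260, 270, 280, and 900 ms with `trim=0.2`, the mean is 392 ms, while the median and trimmed mean are both 270 ms.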

How would you do it: dumb buttons on an nRF24L01+ chip or smart buttons on an ESP32 microcontroller?

In the first case, each module listens to the radio: as soon as a command with its ID arrives from the central node, it turns on its light. When the button is pressed, it sends back a “pressed” message. The timer runs on the central node. Each button has an Arduino Pro Mini + nRF24L01+, and there is also a central hub (an nRF24L01+ with an Arduino Uno, Mega, or ESP32) that collects the data and connects to the computer (via Bluetooth or WiFi).
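The hub side of that protocol is simple enough to prototype before any hardware arrives. A sketch in Python, assuming a hypothetical 4-byte packet (message type, button ID, sequence number), which fits easily into the nRF24L01+'s 32-byte payload. The sequence number lets the hub ignore stale or duplicate “pressed” messages. Note that the measured time includes radio latency in both directions:

```python
import struct
import time

LIGHT, PRESSED = 1, 2
FMT = "<BBH"  # message type, button id, sequence number (little-endian, 4 bytes)


def encode(msg_type: int, button_id: int, seq: int) -> bytes:
    return struct.pack(FMT, msg_type, button_id, seq)


def decode(payload: bytes) -> tuple[int, int, int]:
    return struct.unpack(FMT, payload)


class Hub:
    """Central node: sends LIGHT commands and times the PRESSED replies."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.seq = 0
        self.pending: dict[tuple[int, int], float] = {}  # (button, seq) -> send time

    def light(self, button_id: int) -> bytes:
        self.seq = (self.seq + 1) % 65536
        self.pending[(button_id, self.seq)] = self.clock()
        return encode(LIGHT, button_id, self.seq)

    def on_packet(self, payload: bytes):
        """Return the reaction time in ms, or None for unknown/duplicate packets."""
        msg_type, button_id, seq = decode(payload)
        if msg_type != PRESSED:
            return None
        sent = self.pending.pop((button_id, seq), None)
        if sent is None:
            return None
        return (self.clock() - sent) * 1000.0
```

On the real hub the same logic would live in the Arduino or ESP32 firmware; the button itself only needs to echo the sequence number it received.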

In the second case, the buttons connect via Bluetooth (BLE) or WiFi. The brain of each button is an ESP32, which has to be flashed using a programmer.

Cost-wise, both approaches come out roughly the same: somewhere around $10-15 per button, not counting the arcade buttons and 3D printing.

Exploring the Rarity of Reboots on My Mac: A Yearly Overview | July 11 2025, 20:18

On my Mac, the command “last reboot” shows that I’ve rebooted the computer only four times over the past year.
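If you want to count them programmatically, a small sketch: `last reboot` prints one line per reboot record, each starting with the word `reboot`, followed by a footer line saying when the log begins. The exact column layout varies between systems, so the sample text in the usage note is illustrative, and how far back the log goes depends on the system:

```python
import subprocess


def count_reboots(last_output: str) -> int:
    """Count reboot records in the text printed by `last reboot`."""
    return sum(1 for line in last_output.splitlines() if line.startswith("reboot"))


def reboots_on_this_machine() -> int:
    """Run `last reboot` locally (needs the `last` command) and count the records."""
    out = subprocess.run(["last", "reboot"], capture_output=True, text=True).stdout
    return count_reboots(out)
```

For example, two `reboot ...` lines followed by a `wtmp begins ...` footer give a count of 2.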

Exploring the Technological Marvels of Tesla’s Full Self-Driving Capabilities | July 11 2025, 03:59

I’ve been reading various engineering blogs about Tesla’s autopilot (FSD), simply because for the last month and a half I’ve been riding almost constantly as if in a taxi: you set the destination and hardly ever need to intervene, and the car drives from point A to point B entirely on its own. This is certainly the future.

Tesla isn’t the only one with such systems; Mercedes, for example, has Drive Pilot. Others, at best, help in traffic jams. Still, Tesla seems to be the only one that works on all roads.

So, back to engineering curiosities. Tesla trains its AI models on a “farm” called Dojo, an exaFLOP supercomputer built on Tesla’s own chips. Camera video is fed into it, and it trains models that are then deployed for autonomous driving across the entire Tesla fleet.

The FSD architecture comprises about 48 specialized neural networks, trained on Dojo, which together produce about 1,000 different prediction tensors. Tesla is gradually moving from modular networks (object recognition + planning) to end-to-end training, mapping video frames directly to a steering trajectory and actions. This is something of a “black box”: the neural network learns directly from human behavior, without hand-tuned knobs. An extremely cool engineering solution, though, I suspect, hard to debug.

By the way, it is claimed that Tesla has switched from C++ to Python, and that the move to end-to-end training made 300,000 lines of C++ code unnecessary: code that handled various corner cases and rules for resolving different scenarios. Now all of that lives at the model level.

Tesla has abandoned radar and ultrasonic sensors, switching to a camera-only approach (Vision Only) with “Hardware 4” (HW4, FSD Computer 2): 16 GB RAM, 256 GB flash memory, and 3–8× the performance of HW3.

Consider the performance: 22 milliseconds to build a 3D scene of the surrounding cars, pedestrians, and cyclists, with data collected from 8 cameras 36 times per second.

85 ms for the entire cycle, from receiving an image to updating the plan and sending commands to the wheels. Fantastic!
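A quick sanity check of how these figures relate, using the 36 frames per second mentioned above. The 100 km/h speed in the last line is my assumption, chosen only to give a sense of scale:

```latex
\begin{aligned}
\Delta t_{\text{frame}} &= \frac{1000\,\text{ms}}{36} \approx 27.8\,\text{ms} \\
\frac{22\,\text{ms}}{27.8\,\text{ms}} &\approx 0.79 \quad \text{(the 3D scene is ready before the next frame arrives)} \\
\frac{85\,\text{ms}}{27.8\,\text{ms}} &\approx 3.1 \quad \text{(the full loop spans about three frames)} \\
27.8\,\tfrac{\text{m}}{\text{s}} \times 0.085\,\text{s} &\approx 2.4\,\text{m} \quad \text{(distance covered during one loop at an assumed 100 km/h)}
\end{aligned}
```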

More than 4 million Teslas on the roads collect data daily, and FSD Beta has logged more than a billion miles of autonomous driving. This “live” dataset is used to train the networks on the most realistic scenarios, including rare “edge-case” incidents (strange accidents, unusual road conditions, etc.).

In June 2025, Tesla for the first time delivered a Model Y from its Austin factory to a customer’s home fully autonomously, with no driver or remote operator. Very cool.

The Vision network not only analyzes the current frame but also stores features from previous ones, taken at intervals of roughly 1 m of travel. This lets it remember recently passed road markings and signs even after they have left the field of view, much like human memory.
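The “features every ≈1 m” idea can be illustrated with a distance-indexed queue: a new snapshot is pushed only after the car has moved a fixed distance, so a car waiting at a red light doesn't flush its memory of the markings it just passed. This is a toy sketch with hypothetical names, queue length, and feature type, not Tesla's code:

```python
from collections import deque
from typing import Any


class SpatialFeatureQueue:
    """Keep recent feature snapshots, pushing a new one every `step_m` meters traveled."""

    def __init__(self, step_m: float = 1.0, maxlen: int = 50):
        self.step_m = step_m
        self.queue: deque = deque(maxlen=maxlen)  # oldest snapshots fall off the front
        self.odometer = 0.0
        self.last_push: float | None = None

    def update(self, distance_m: float, features: Any) -> None:
        """Call once per frame with the distance moved since the previous frame."""
        self.odometer += distance_m
        if self.last_push is None or self.odometer - self.last_push >= self.step_m:
            self.queue.append((self.odometer, features))
            self.last_push = self.odometer
```

Because pushes depend on distance rather than time, a stationary car keeps the same snapshots indefinitely; a time-based queue alongside it would cover things that move while the car stands still.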