The Maddening Ambiguity of Mathematical Notation | December 02 2025, 15:30

If someone tells you that mathematics is an exact science, don’t believe them. I’m currently into data science as a hobby and studying all sorts of things from different books, and my brain is exploding: how can this happen in a science where every little detail has to fit into the system or be thrown out? And it does fit, until you get to notation. There it’s a complete mess. A set of dialects.

Take, for example, logarithms. The “standard” for how to denote a logarithm depends on which room of the university you are in. In calculus and number theory, log(x) almost always means the natural logarithm ln(x), with base e. The derivative of e^x equals e^x, so it’s “natural”, and they’re too lazy to write ln. Yet in fields where decimal logarithms are common (engineering, for instance), log(x) suddenly becomes decimal, and ln(x) is reserved for base e. And in computer science, log(x) often means base 2.
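Programming languages inherit the same ambiguity. Here is a quick check in Python, standard library only (the variable names are mine): `math.log` with one argument is the natural logarithm, while the decimal and binary versions get their own functions.

```python
import math

x = 1000.0

natural = math.log(x)      # one-argument log is base e, i.e. ln(x)
decimal = math.log10(x)    # base 10 has its own function
binary = math.log2(x)      # and so does base 2

# Any base can be recovered from the natural log: log_b(x) = ln(x) / ln(b)
decimal_via_ln = math.log(x) / math.log(10)

print(f"ln(x)    = {natural:.4f}")
print(f"log10(x) = {decimal:.4f}")
print(f"log2(x)  = {binary:.4f}")
```

So a formula copied from a paper with a bare log(x) can silently change meaning depending on which function you reach for.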

The expected value E takes its argument in square brackets: E[X]. Meanwhile, in computer science the same square brackets are used for a 0/1 indicator (the Iverson bracket): [P] is 1 if the statement P is true and 0 otherwise.

Or if you see a vector – is it a column or a row? In classical mathematics, a vector is a column by default. To multiply it by weights, we transpose it and write x^T w. But in many papers, vectors are treated as rows. If you see y = xW + b, then x cannot be a column, because otherwise the dimensions wouldn’t match up: x here is a row. But in the next paper they write Wx + b, and there x is a column 🙂
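The shape bookkeeping is easy to check in NumPy. This is a minimal sketch with made-up sizes (3 inputs, 2 outputs) and random numbers; the variable names are mine. It computes the same layer under both conventions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 3, 2

W = rng.normal(size=(n_in, n_out))   # weights, shape (3, 2)
b = rng.normal(size=n_out)           # bias, shape (2,)

# Row convention: x has shape (1, 3), so y = xW + b has shape (1, 2).
x_row = rng.normal(size=(1, n_in))
y_row = x_row @ W + b

# Column convention: x has shape (3, 1), so the weight matrix must be
# the transpose, shape (2, 3), and y = W^T x + b has shape (2, 1).
x_col = x_row.T
y_col = W.T @ x_col + b.reshape(-1, 1)

print(y_row.shape, y_col.shape)
print(np.allclose(y_row.T, y_col))   # same numbers, transposed layout
```

In other words, the W in a "Wx + b" paper is the transpose of the W in an "xW + b" paper, and nothing in the notation warns you about it.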

Angle brackets ⟨a, b⟩. For the dot product, the symbol “⋅” is used, but it is hard to see, especially on a whiteboard, so I very often see mathematicians use angle brackets for the dot product instead. In general, angle brackets denote the more general concept of an inner product, of which the scalar product is a special case: ⟨a, b⟩ signifies a certain abstract way to multiply a and b and get a number. Meanwhile, in quantum mechanics this would be written as ⟨a|b⟩. And for the scalar product, some use a dot in a circle or an x in a circle.
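The brackets also hide a convention once the vectors are complex: the physics bra-ket form conjugates the first argument, while many mathematics texts conjugate the second. A small NumPy sketch with two made-up vectors (`np.dot` does no conjugation, `np.vdot` conjugates its first argument):

```python
import numpy as np

a = np.array([1 + 2j, 3 - 1j])
b = np.array([2 - 1j, 1j])

# Plain dot product: sum of a_i * b_i, no complex conjugation.
plain = np.dot(a, b)

# Physics (bra-ket) convention: conjugate the FIRST argument.
braket = np.vdot(a, b)

# Convention common in mathematics texts: conjugate the SECOND argument.
math_style = np.sum(a * np.conj(b))

print(plain, braket, math_style)
```

For real vectors all three coincide, which is exactly why the difference goes unnoticed until it matters.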

And just for the sake of it: in Russia tangent is tg, while in the USA it’s tan. There’s also tan^-1 and arctan, which mean the same thing, even though x^-1 generally means 1/x.
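The two readings of the exponent give different numbers, which a couple of lines of Python (standard library only, variable names mine) make obvious:

```python
import math

x = 1.0

inverse_function = math.atan(x)   # arctan(x): what tan^-1(x) means
reciprocal = 1 / math.tan(x)      # 1/tan(x) = cot(x): what x^-1 would suggest

print(inverse_function)
print(reciprocal)
```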

Rediscovering the 1986 “Chemical Trainer”: A Pioneer in Interactive Learning | November 23 2025, 15:55

At my home in Kolomna, I have a book called “Chemical Trainer” from 1986. I have never seen anything like it before or since.

The material of each of the 54 programs is divided into many small, very short sections, or categories. At the end of each category, one or more questions are posed. This is done to check whether the content of the category is truly understood. For each answer, there is a place in the book to jump to in order to see if the answer is correct. If the answer is wrong, it describes why and asks a new question. If correct — you move further in this quest.
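In programming terms, the book is a small graph of sections with jumps. Here is a rough sketch of that structure; the frame ids, the chemistry text, and the function name `walk` are all made up for illustration and are not taken from the book.

```python
# Hypothetical frames: each answer points to the frame to jump to.
FRAMES = {
    "1": {
        "text": "Water consists of hydrogen and oxygen.",
        "question": "How many hydrogen atoms are in one water molecule?",
        "answers": {"1": "1a", "2": "2"},
    },
    "1a": {
        "text": "Not quite: the formula is H2O. Count the H atoms again.",
        "question": "How many hydrogen atoms are in one water molecule?",
        "answers": {"1": "1a", "2": "2"},
    },
    "2": {"text": "Correct. On to the next section.", "question": None, "answers": {}},
}


def walk(frames, start, replies):
    """Follow the reader's replies through the frames and return the path."""
    path = [start]
    current = start
    for reply in replies:
        frame = frames[current]
        if frame["question"] is None:   # nothing left to answer here
            break
        current = frame["answers"][reply]
        path.append(current)
    return path


print(walk(FRAMES, "1", ["1", "2"]))   # wrong answer first, then the right one
```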

These Germans in 1986 created an interactive textbook even before it became fashionable.

Data Science: The Modern Alchemy of the 21st Century | November 16 2025, 04:02

A cryptic post today. While writing a book on RecSys, I caught myself thinking that modern data science is essentially the alchemy of the 21st century. Half of the “best practices” in algorithms lack a solid mathematical framework. It’s a set of heuristics that “just work”. Much like in the 17th century, when alchemists mixed everything indiscriminately, people now try things more or less at random, and if something works better, everyone else starts doing the same. There’s just no answer to the question “why”.

Take, for example, the NCF/NeuMF (Neural Collaborative Filtering) algorithm. The logic goes like this. Say there are a million movie ratings given by users, and 100 million ratings not yet given, since users can’t watch every movie in the world. Out of these 100 million, you need to choose candidates to advertise to a particular user. The algorithm, of course, has a training phase, where the weights are calculated, and a prediction phase, where these weights are applied to the incoming data.

(What the algorithm does: essentially, it’s an ensemble of three sub-algorithms. Two of them generate their own conclusions, and their outputs then go to a new neural network, the third algorithm, which provides the final recommendation. In technical terms, it’s a hybrid of GMF (Generalized Matrix Factorization) and MLP (Multi-Layer Perceptron). The first is based on matrix decomposition, and the second is a neural network with multiple layers. The weights are adjusted on training data.)

For one positive example, it takes 4 negative ones. Why four? Just because it’s “not too many and not too few”. Would 8 be better? Unknown, but it would definitely take longer to learn.
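For reference, this is roughly what “4 negatives per positive” looks like in code. It is a minimal sketch, not the reference implementation: the function name `sample_negatives`, the toy data, and the choice to draw distinct negatives for each positive are my assumptions.

```python
import random


def sample_negatives(positives, n_items, n_neg=4, seed=0):
    """For each observed (user, item) pair, add n_neg items the user has not rated."""
    rng = random.Random(seed)
    seen = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)

    samples = []
    for user, item in positives:
        samples.append((user, item, 1))            # observed interaction
        # Cannot draw more negatives than there are unseen items.
        wanted = min(n_neg, n_items - len(seen[user]))
        negatives = set()
        while len(negatives) < wanted:
            candidate = rng.randrange(n_items)
            if candidate not in seen[user]:
                negatives.add(candidate)
        samples.extend((user, neg, 0) for neg in sorted(negatives))
    return samples


positives = [(0, 1), (0, 2), (1, 0)]               # toy (user, item) pairs
training_set = sample_negatives(positives, n_items=20)
print(len(training_set))                           # 3 positives + 12 negatives
```

Changing `n_neg` to 8 is a one-character edit, which is part of the problem: nothing in the code tells you which value is right.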

Why are the embedding dimensions 32? Or 64? There’s no formula. It’s the “golden mean” between a “dumb” model (k too small) and an “overfitted” one (k too large).

Now about the neural network. Why is the MLP block built as a “tower” (64 -> 32 -> 16)? Why not (50 -> 25 -> 10)? Why ReLU between them (and not tanh for example)? Pure empiricism. The number of layers in the tower is also adjusted.

Why do GMF and MLP parts have different embeddings at the input? Because the authors of the paper tried it, and it “worked out better”. No mathematical proof. Why do they go to the final layer with equal weights? Because they just do.

Why are the outputs of the two paths “concatenated” (concat), and not added or multiplied? “Experience showed that this way the result is more accurate.”
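To see how many arbitrary choices fit into one forward pass, here is a minimal NumPy sketch of the structure described above: separate embeddings for the two paths, an element-wise product for GMF, a 64 -> 32 -> 16 tower with ReLU for the MLP, concatenation, and a final sigmoid. The sizes, the random (untrained) weights, and the function name `predict` are my assumptions for illustration, not the paper’s reference code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 100, 200
k_gmf, k_mlp = 32, 32            # embedding sizes, chosen empirically

# Separate embedding tables for the two paths (random, i.e. untrained).
P_gmf = rng.normal(scale=0.1, size=(n_users, k_gmf))
Q_gmf = rng.normal(scale=0.1, size=(n_items, k_gmf))
P_mlp = rng.normal(scale=0.1, size=(n_users, k_mlp))
Q_mlp = rng.normal(scale=0.1, size=(n_items, k_mlp))

# MLP "tower": 64 inputs (two concatenated embeddings) -> 32 -> 16.
tower = [2 * k_mlp, 32, 16]
layers = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(tower[:-1], tower[1:])
]

# The final layer sees both paths side by side: 32 + 16 inputs.
h = rng.normal(scale=0.1, size=k_gmf + tower[-1])


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(user, item):
    # GMF path: element-wise product of the two embeddings.
    gmf_out = P_gmf[user] * Q_gmf[item]
    # MLP path: concatenate the embeddings, then pass them down the tower.
    z = np.concatenate([P_mlp[user], Q_mlp[item]])
    for W, b in layers:
        z = relu(z @ W + b)
    # Fusion: concatenate both paths, one linear layer, sigmoid.
    return sigmoid(np.concatenate([gmf_out, z]) @ h)


score = predict(user=3, item=7)
print(score)
```

Every number in the first dozen lines is a knob that someone picked because it worked on their data.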

And so it is with everything, up to the choice of optimizer Adam or the “magical” learning_rate=0.001, although at least these have some mathematical basis.

That is, at least a dozen parameters of one algorithm are chosen empirically, with no real confidence that they are independent of each other. And many of them depend on the dataset, but no one knows how 😉

In general, alchemy.

Exploring Recommender Algorithms Through Interactive Visualizations and Sandbox Simulations | November 11 2025, 05:23

I’ve launched an open-source companion application for my book Recommender Algorithms! It’s a “sandbox” where you can “run” various recommendation algorithms with different settings and view specific visualizations for each algorithm that help you understand how it works. For instance, for algorithms like ItemKNN, SLIM, or EASE, a key visualization is a heatmap of the learned item-item similarity matrix. This allows you to see which pairs of items the model considers “similar” (or “influencing” each other). For SLIM, for example, a useful “Sparsity Plot” shows that the similarity matrix indeed turned out to be sparse. For association rule algorithms (Apriori, FP-Growth, Eclat) the visualization is not a graph but interactive tables with the found “Frequent Itemsets” and generated “Association Rules,” which can be filtered and sorted.
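For a sense of what such a heatmap is built from, here is a minimal sketch of an ItemKNN-style cosine similarity matrix. The toy interaction matrix and variable names are made up, and this is not the application’s code.

```python
import numpy as np

# Toy user-item matrix (rows: users, columns: items); 1 = interaction.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=float)

norms = np.linalg.norm(X, axis=0)          # one norm per item column
S = (X.T @ X) / np.outer(norms, norms)     # cosine similarity, items x items
np.fill_diagonal(S, 0.0)                   # an item should not recommend itself

sparsity = np.mean(S == 0.0)               # share of zero entries

print(np.round(S, 2))
print(sparsity)
```

The matrix `S` is what ends up on the heatmap, and the share of zeros is the kind of quantity a sparsity plot summarizes.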

Additionally, there is a parametric mechanism for creating a “toy dataset”: the Dataset Wizard. It works like this: there are template datasets that describe items through characteristics. For example, recipes through flavors. Or movies through genres. The system generates random users with a random set of characteristics from the same set, and there are many sliders to make this distribution more contrasted or complex. Next, a matrix of user ratings of items is created. Roughly speaking, if the characteristics of the user and the item match, the rating will be higher because “tastes match”; conversely, if they differ, the rating will be lower. Here too, sliders add noise and sparsity (randomly removing part of the matrix). The characteristics of items and users are not fed into the recommendation algorithm; they are hidden, but they are used to visualize the results.
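The generation logic can be sketched in a few lines of NumPy. This is my simplified illustration of the idea, not the Dataset Wizard’s actual code: the sizes, the parameter names `noise_level` and `keep_share`, and the 1 to 5 rating scale are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, n_features = 50, 30, 5
noise_level = 0.5      # "noise" slider (hypothetical name)
keep_share = 0.3       # "sparsity" slider: share of ratings that survive

# Hidden characteristics: binary item features, random user tastes.
item_features = rng.integers(0, 2, size=(n_items, n_features)).astype(float)
user_tastes = rng.random(size=(n_users, n_features))

# Matching tastes give a higher score; then add noise.
affinity = user_tastes @ item_features.T
affinity += rng.normal(scale=noise_level, size=affinity.shape)

# Rescale the scores to whole ratings from 1 to 5.
lo, hi = affinity.min(), affinity.max()
ratings = np.rint(1 + 4 * (affinity - lo) / (hi - lo))

# Randomly remove part of the matrix; NaN marks "not rated".
mask = rng.random(size=ratings.shape) < keep_share
ratings = np.where(mask, ratings, np.nan)

print(ratings.shape, np.mean(mask))
```

Only `ratings` would be shown to a recommender; `item_features` and `user_tastes` stay hidden and serve to check afterwards whether the recommendations make sense.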

The third component of the application is hyperparameter tuning. Essentially, it’s an auto-configurator for a specific dataset. It uses an iterative approach, which is typically much more efficient than an exhaustive search (Grid Search) or a random search (Random Search). In short, the system analyzes the history of past runs (trials) and builds a probabilistic “map” (a surrogate model) of which parameters are likely to yield the best result. Then it uses this map to choose the next combination to test. This method is called Sequential Model-Based Optimization (SMBO).
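The loop itself is simple enough to sketch. Below is a deliberately crude one-dimensional illustration, not what the application does: the objective is a made-up, noise-free function of the log learning rate, the surrogate is just a parabola fitted with `np.polyfit` (real tuners use far richer models), and the exploration bonus of 0.5 is an arbitrary choice of mine.

```python
import numpy as np

rng = np.random.default_rng(0)


def objective(log_lr):
    """Hypothetical validation loss as a function of log10(learning rate)."""
    return (log_lr + 3.0) ** 2


low, high = -6.0, 0.0

# 1. Start with a few random trials.
xs = list(rng.uniform(low, high, size=4))
ys = [objective(x) for x in xs]

for _ in range(10):
    # 2. Fit the surrogate to the history of trials (here: a parabola).
    coeffs = np.polyfit(xs, ys, deg=2)
    # 3. Score random candidates with the surrogate, plus a small bonus
    #    for being far from points that were already tried.
    candidates = rng.uniform(low, high, size=200)
    predicted = np.polyval(coeffs, candidates)
    distance = np.min(np.abs(candidates[:, None] - np.array(xs)[None, :]), axis=1)
    acquisition = predicted - 0.5 * distance
    # 4. Run the real (expensive) objective only on the most promising one.
    x_next = candidates[np.argmin(acquisition)]
    xs.append(x_next)
    ys.append(objective(x_next))

best = xs[int(np.argmin(ys))]
print(best, min(ys))
```

The point is the division of labor: the cheap surrogate is queried hundreds of times per step, the expensive objective only once.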

The code is open source and will be further supplemented with new algorithms and new visualizations.

Link to the code in the comments.

Link to the site where the code is deployed and where you can check out the application is also in the comments.

Unveiling “Recommender Algorithms”: A Comprehensive Guide on Recommendation Systems | October 25 2025, 17:36

I finally released a book on #RecSys! It’s called Recommender Algorithms, where I’ve compiled over 50 recommendation algorithms with detailed mathematical derivations, thorough explanations, and code examples.

https://www.testmysearch.com/books/recommender-algorithms.html

It all started early this spring in Germany, when I attended an ACM conference and sketched out the first structure of the book while analyzing the talks from the RecSys track. And now, just six months later, it has come to life.

Why did I write it? Because neither online nor in print is there a single, accessible resource that deeply explores recommendation algorithms of various types and purposes. There are articles focused on small subsets, but collecting and systematizing approaches—from foundational methods to the very latest—seems to have never been done before. I don’t know if I succeeded, but I’d love to hear your feedback.

Please like & share!

P.S. Click on READ SAMPLE to see the first 40 pages. The table of contents is there as well.

https://www.testmysearch.com/books/recommender-algorithms.html

Haunting Tales of Hotel Room 441 | October 16 2025, 12:27

I am starting to like my hotel. I am in room 446.

“Items we have read report that if you do stay at this hotel, avoid the fourth floor, or at least Room 441. That is where a lady from the other side lurks at the end of the bed, kicking the feet of guests who attempt to sleep there. And those guests are attempting to sleep there because they want to have the haunted experience. Don’t ask. Ever read Stephen King’s horror short story ‘1408’? The Congress Plaza is said to be its inspiration and a portion of the source that has brought Mr. King’s net worth to $500 million.”

From Vision to Bookshelf: Launching “Recommender Algorithms” | October 13 2025, 11:54

Finally, I have released a book! It is called Recommender Algorithms — it contains more than 50 recommendation algorithms with mathematical explanations, detailed descriptions, and code examples.

It all started early in the spring in Germany, when I attended the ACM conference and made the first sketches of the book’s structure while analyzing the talks in the RecSys track. And now, six months later, the book has been published.

Why did it appear? Because there is no single, accessible source, either online or in print, where recommendation algorithms of various types and purposes are thoroughly examined. There are articles focused on narrow aspects, but it seems that until now no one has collected and systematized the developments, from the fundamental to the most recent. Maybe no one needed to. Suddenly, I found I needed to. I don’t know if I succeeded, but I am eager for your feedback.

Available on Amazon and Barnes and Noble. There is also an automatic Russian translation (surprisingly decent), but I do not know how to sell it yet.

https://www.testmysearch.com/books/recommender-algorithms.html?FB

(This is not my only book, but today — just about this one.)

Decoding Solr and Lucene: Engineering Insights and Algorithms | October 06 2025, 17:11

I’m preparing a book on Solr & Lucene for publication. What do you think about publishing such a translation on Amazon? 🙂

The book is about algorithms and under-the-hood engineering. I haven’t seen books from this angle yet, maybe someone will find it interesting.

Introducing the AI-Powered Text-to-Diagram Generator | September 30 2025, 20:57

While working on a book, I realized what kind of product I’m missing. It’s an AI diagram generator based on textual descriptions.

The idea is that the master document for the diagram is text. This textual description can be (and should be) quite detailed, so that the generated diagram exactly matches the author’s vision. The diagram itself is not edited. That is, it can be edited (moving circles around), but ideally, after such changes the system should update the text so that regenerating from it reproduces exactly what the user adjusted.

The result — the diagram — should correspond as closely as possible to the description. If it does not match the description because, for example, it’s impossible to make a triangle with three obtuse angles, the system should do its best and provide a verbal response about what didn’t work. The user can then modify the task so that the system complies and produces the diagram correctly.

But then we understand that the author might have randomly achieved something that they liked with their flawed text. And if regenerated, it might turn out differently, and not necessarily better. Therefore —

You could ask the system to generate a diagram description from the diagram, which, if inputted back into the diagram generator, would result exactly in what the description was generated from. Yes, this description would be more verbose and complex, but it would more reliably describe the result.

So, from this point, you are no longer working with the diagram. You are working with text. If a diagram is needed — you simply compile the text into a diagram and it turns out as needed. But you don’t even work directly with the text. You work with this diagram-description text through an LLM, asking it to add some block, and the text changes, but changes in a way that everything doesn’t suddenly shift.

The final diagram should be in an object form, from which raster (PNG) or vector (SVG, EPS) images can be created.
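By “object form” I mean something like the following. It is a toy sketch of one possible shape, with made-up class names (`Box`, `Arrow`, `Diagram`) and a fixed canvas size; it does no text escaping and says nothing about how the LLM would produce the objects.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    name: str
    x: int
    y: int
    w: int = 120
    h: int = 40


@dataclass
class Arrow:
    src: str
    dst: str


@dataclass
class Diagram:
    boxes: list = field(default_factory=list)
    arrows: list = field(default_factory=list)

    def to_svg(self):
        """Render the object model as an SVG string (vector output)."""
        by_name = {b.name: b for b in self.boxes}
        parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="400" height="200">']
        for b in self.boxes:
            parts.append(
                f'<rect x="{b.x}" y="{b.y}" width="{b.w}" height="{b.h}" '
                'fill="white" stroke="black"/>'
            )
            parts.append(
                f'<text x="{b.x + b.w // 2}" y="{b.y + b.h // 2}" '
                f'text-anchor="middle">{b.name}</text>'
            )
        for a in self.arrows:
            s, d = by_name[a.src], by_name[a.dst]
            # Straight line from the right edge of src to the left edge of dst.
            parts.append(
                f'<line x1="{s.x + s.w}" y1="{s.y + s.h // 2}" '
                f'x2="{d.x}" y2="{d.y + d.h // 2}" stroke="black"/>'
            )
        parts.append("</svg>")
        return "\n".join(parts)


diagram = Diagram(
    boxes=[Box("Text", 20, 80), Box("Diagram", 240, 80)],
    arrows=[Arrow("Text", "Diagram")],
)
svg = diagram.to_svg()
print(svg)
```

The same objects could be handed to a different renderer for PNG or EPS, which is the reason to keep the diagram as data and not as pixels.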

It would also be great if such a system could take existing diagrams or diagram templates so that it could borrow styles and existing conventions on how to display what.

So, these are my fantasies. If anyone has ideas on how to implement this — let’s discuss 🙂

Crafting the Future of Recommender Systems: A Deep Dive into Algorithms and Implementation | September 26 2025, 21:17

I decided a while ago to write a book on recommendation algorithms. With mathematics, code examples, a repository, etc. In English, of course.

Accordingly, I am looking for volunteer reviewers who are knowledgeable in the field. Also those who have experience with print-on-demand on Amazon.

There are already about 200 pages of content. About three months of work left. Working title: Recommender Algorithms in 2026: A Practitioner’s Guide. Roughly half of it is still in draft form, with the first 80 pages about 80% complete.

I’ve built a mechanism to publish in HTML and PDF simultaneously. The HTML version is fully functional, with navigation. The navigation block reflects the current section, and as you scroll, it shifts to the one in front of the reader. Clicking on a section, of course, teleports you to what you clicked on. It’s all completely automatic.