Navigating Complexity: The Challenge of Wikipedia’s Expert-Driven Content | November 26 2025, 01:06

Wikipedia has one big problem. Or rather, we have one with Wikipedia. Open almost any Wikipedia page about a relatively complex mathematical or physical concept, and you often lose the desire to read further almost immediately. Formally, everything there is correct, but the explanation relies on concepts that are often more complex than the one being explained. On top of that, there is often a lot of unnecessary information: material that formally, academically, or taxonomically belongs to the topic but essentially “pollutes” the first impression.

This problem arises because the authors of Wikipedia (often mathematicians) prioritize rigor and completeness rather than didactics and comprehensibility.

In the English-speaking environment, this is sometimes called “Drift into pedantry”. Articles are often written by experts for experts, not for those who are trying to learn the subject from scratch.

Let’s take, for example, a “tensor”. Imagine a student who has heard that tensors are used in machine learning (Google TensorFlow) or physics and wants to understand the essence.

What the reader expects (intuition): “A tensor is a table of numbers (or some sort of data container) that describes the properties of an object and correctly changes if we rotate the coordinate system”

What Wikipedia provides: “A tensor (from Latin tensus, ‘strained,’ as per the classical layout of mechanical stress at the sides of a deformable cube, see illustration) — is a layout (arrangement in space) of numbers (components), used in mathematics and physics as a special type of multi-index object, possessing mathematical properties.” The article immediately starts listing ranks, covariance and contravariance of indices. This is formally correct but it “pollutes” the first impression.

The illustration at the very top is captioned like this: “Mechanical stress, deforming a cube with faces perpendicular to the coordinate axes, in classic elasticity theory is described by the Cauchy stress tensor, which links 2 indices: the normal vector to the face with the stress vector T (force per unit area); there are 3 directions of normals and 3 directions of stress components, which gives a 2nd rank tensor 3×3 — consisting of 9 components.”

Formally — not a single error. In fact — it’s a wall of text that requires knowledge of linear algebra just to read the definition.

It’s as if you asked “What is an apple?” and got the reply: “An apple is a fruit of plants from the subfamily Amygdaloideae or Spiraeoideae, featuring an epicarp, mesocarp, and endocarp, often participating in Newton’s gravitational experiments.”

On one hand, it seems that with the emergence of LLMs, Wikipedia is no longer necessary. There are LLMs such as ChatGPT that essentially paraphrase whatever is in Wikipedia in the required form. But they can do that because they were trained on Wikipedia, and undoubtedly Wikipedia was given much more weight during training than other internet junk. Without Wikipedia in the training set, it would be much more difficult. Meanwhile, Wikipedia is constantly edited, and LLMs and Google draw on it directly when answering questions.

Therefore, on the one hand, it seems to me that it is high time for Wikipedia to move to generating articles from expert-curated data and packaging knowledge in whatever format the reader needs, for example as questions and answers. On the other hand, the whole idea of the encyclopedia as master data for LLMs and RAG would then be lost.

The paradox is that an LLM is, in essence, the only “interface” that has been able to read these pedantic Wikipedia definitions, “understand” them (through thousands of examples of code and articles), and translate them back into human language. Wikipedia has become an excellent database for robots, but a poor textbook for people.

Curiosity Click: How Facebook’s Ad Previews Captivate | November 21 2025, 21:51

Facebook keeps showing me ads (in this case – a vest) and occasionally chooses very “successful” spots for a freeze frame that serves as a video preview in the feed. But, I must say, it achieves its goal and I click to see what kind of madness this is.

Data Science: The Modern Alchemy of the 21st Century | November 16 2025, 04:02

A cryptic post today. While writing a book on RecSys, I caught myself thinking that modern data science is essentially the alchemy of the 21st century. Half of the “best practices” in algorithms lack a solid mathematical framework. It’s a set of heuristics that “just work”. Much like in the 17th century, when alchemists mixed everything indiscriminately, the same happens now: if something works better, everyone else starts doing it too. There’s just no answer to the question “why”.

Take, for example, the NCF/NeuMF (Neural Collaborative Filtering) algorithm. The logic goes like this. Say there are a million movie ratings given by users, and 100 million ratings not yet given, since users can’t watch every movie in the world. Out of those 100 million, you need to choose candidates to promote to a particular user. The algorithm, of course, has a training phase, where the weights are calculated, and a prediction phase, where these weights are applied to incoming data.

(What the algorithm does: essentially, it’s an ensemble of three sub-models. Two of them produce their own outputs, which are then passed to the third, a final neural layer that provides the recommendation. In technical terms, it’s a hybrid of GMF (Generalized Matrix Factorization) and MLP (Multi-Layer Perceptron). The first is based on matrix factorization, and the second is a neural network with multiple layers. The weights are fitted on training data.)
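To make the structure concrete, here is a minimal sketch of a NeuMF-style forward pass in plain Python. It is not the paper's reference implementation: the toy sizes, the random untrained weights, and the names `predict`, `dense_relu`, and `TOWER` are all assumptions made for illustration. It only shows how the two paths are computed and fused.

```python
import math
import random

rng = random.Random(0)

def rand_vec(n):
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]

def rand_matrix(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]

def dense_relu(x, weights, bias):
    """One fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

N_USERS, N_ITEMS, K = 5, 7, 8          # toy sizes, assumed for illustration
TOWER = [2 * K, 8, 4]                  # MLP "tower": 16 -> 8 -> 4

# Separate embedding tables for the GMF path and the MLP path.
gmf_user, gmf_item = rand_matrix(N_USERS, K), rand_matrix(N_ITEMS, K)
mlp_user, mlp_item = rand_matrix(N_USERS, K), rand_matrix(N_ITEMS, K)
mlp_layers = [(rand_matrix(out, inp), [0.0] * out)
              for inp, out in zip(TOWER, TOWER[1:])]
out_weights = rand_vec(K + TOWER[-1])  # final layer over the concatenation

def predict(user, item):
    # GMF path: element-wise product of the two embeddings.
    gmf = [u * i for u, i in zip(gmf_user[user], gmf_item[item])]
    # MLP path: concatenated embeddings pushed through the tower.
    h = mlp_user[user] + mlp_item[item]
    for weights, bias in mlp_layers:
        h = dense_relu(h, weights, bias)
    # Fusion: concatenate both paths, then a single sigmoid output unit.
    fused = gmf + h
    logit = sum(w * v for w, v in zip(out_weights, fused))
    return 1.0 / (1.0 + math.exp(-logit))

score = predict(user=2, item=4)
print(f"predicted interaction probability: {score:.4f}")
```

With untrained weights the score is meaningless. In a real setup all four embedding tables, the tower, and the output weights would be learned jointly from the observed interactions.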

For each positive example, training takes 4 negative ones. Why four? Just because it’s “not too many and not too few”. Would 8 be better? Unknown, but it would definitely take longer to train.
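The 4-to-1 ratio can be sketched as a small sampling routine. This is an illustrative version under simple assumptions (uniform sampling over item ids, negatives drawn with replacement); the function name `sample_training_pairs` and the toy data are made up for the example.

```python
import random

def sample_training_pairs(positives, n_items, n_neg=4, seed=0):
    """Return (user, item, label) triples: each positive plus n_neg sampled negatives."""
    rng = random.Random(seed)
    seen = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)

    triples = []
    for user, item in positives:
        triples.append((user, item, 1))
        drawn = 0
        while drawn < n_neg:
            candidate = rng.randrange(n_items)
            if candidate not in seen[user]:   # only unobserved items count as negatives
                triples.append((user, candidate, 0))
                drawn += 1
    return triples

positives = [(0, 1), (0, 3), (1, 2)]          # toy (user, item) interactions
triples = sample_training_pairs(positives, n_items=50)
print(len(triples), "training rows for", len(positives), "positives")
```

Changing `n_neg` to 8 roughly doubles the number of training rows per positive, which is where the extra training time comes from.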

Why is the embedding dimension 32? Or 64? There’s no formula. It’s the “golden mean” between a “dumb” model (small k) and an “overfitted” one (large k).

Now about the neural network. Why is the MLP block built as a “tower” (64 -> 32 -> 16)? Why not (50 -> 25 -> 10)? Why ReLU between them (and not tanh for example)? Pure empiricism. The number of layers in the tower is also adjusted.

Why do GMF and MLP parts have different embeddings at the input? Because the authors of the paper tried it, and it “worked out better”. No mathematical proof. Why do they go to the final layer with equal weights? Because they just do.

Why are the outputs of the two paths “concatenated” (concat), and not added or multiplied? “Experience showed that this way the result is more accurate.”

And so it is with everything, down to the choice of the Adam optimizer or the “magical” learning_rate=0.001, although at least these have some mathematical basis.

That is, at least a dozen parameters of one algorithm are chosen empirically, with no real confidence that they are independent of each other. Many of them also depend on the dataset, and no one knows how 😉

In general, alchemy.

Exploring Recommender Algorithms Through Interactive Visualizations and Sandbox Simulations | November 11 2025, 05:23

I’ve launched an open-source companion application for my book Recommender Algorithms! It’s a “sandbox” where you can “run” various recommendation algorithms with different settings and view visualizations specific to each algorithm that help you understand how it works. For instance, for algorithms like ItemKNN, SLIM, or EASE, a key visualization is a heatmap of the learned item-item similarity matrix. This lets you see which pairs of items the model considers “similar” (or “influencing” each other). For SLIM, for example, a useful “Sparsity Plot” shows that the similarity matrix indeed turned out to be sparse. For association rule algorithms (Apriori, FP-Growth, Eclat), the visualization is not a graph but interactive tables of the discovered “Frequent Itemsets” and generated “Association Rules,” which can be filtered and sorted.
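For a sense of what such a heatmap is drawn from, here is a minimal sketch of an ItemKNN-style item-item matrix computed with cosine similarity. It is not the application's code: the toy `interactions` matrix and the function name `item_similarity` are assumptions for illustration.

```python
import math

# Rows are users, columns are items; 1 means the user interacted with the item.
interactions = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]

def item_similarity(matrix):
    """Cosine similarity between item columns; the diagonal is left at zero."""
    n_items = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_items)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    sim = [[0.0] * n_items for _ in range(n_items)]
    for a in range(n_items):
        for b in range(n_items):
            if a != b and norms[a] and norms[b]:
                dot = sum(x * y for x, y in zip(columns[a], columns[b]))
                sim[a][b] = dot / (norms[a] * norms[b])
    return sim

sim = item_similarity(interactions)
for row in sim:
    print(" ".join(f"{v:.2f}" for v in row))
```

Each printed row corresponds to one row of the heatmap: larger values mean the two items tend to be consumed by the same users.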

Additionally, there is a parametric mechanism for creating a “toy dataset”: the Dataset Wizard. It works like this. There are template datasets that describe items through characteristics, for example recipes through flavors, or movies through genres. The system generates random users, each with a random set of characteristics drawn from the same set, and there are many sliders to make this distribution more contrasted or more complex. Next, a matrix of user ratings of items is created: roughly speaking, if the characteristics of the user and the item match, the rating is higher because “tastes match”; if they differ, the rating is lower. Here too, sliders add noise and sparsity by randomly removing part of the matrix. The characteristics of items and users are not fed into the recommendation algorithm; they stay hidden, but they are used to visualize the results.
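The generation idea can be sketched in a few lines. This is a simplified guess at the mechanism described above, not the Dataset Wizard's actual code: the binary trait vectors, the 1 to 5 rating scale, and the parameter names `noise` and `sparsity` are assumptions.

```python
import random

def make_toy_dataset(n_users=20, n_items=10, n_traits=5,
                     noise=0.5, sparsity=0.7, seed=0):
    """Return (ratings, user_traits, item_traits); ratings[u][i] is 1..5 or None."""
    rng = random.Random(seed)
    # Hidden characteristics: each user and item gets a random 0/1 trait vector.
    user_traits = [[rng.randint(0, 1) for _ in range(n_traits)] for _ in range(n_users)]
    item_traits = [[rng.randint(0, 1) for _ in range(n_traits)] for _ in range(n_items)]

    ratings = []
    for u in user_traits:
        row = []
        for it in item_traits:
            if rng.random() < sparsity:        # drop this cell: unobserved rating
                row.append(None)
                continue
            match = sum(a == b for a, b in zip(u, it)) / n_traits   # 0..1 taste overlap
            raw = 1 + 4 * match + rng.gauss(0, noise)               # map to 1..5, add noise
            row.append(min(5, max(1, round(raw))))
        ratings.append(row)
    return ratings, user_traits, item_traits

ratings, user_traits, item_traits = make_toy_dataset()
observed = sum(r is not None for row in ratings for r in row)
print(f"{observed} of {len(ratings) * len(ratings[0])} ratings observed")
```

Only `ratings` would be handed to a recommender; the trait vectors stay aside and can be used afterwards to check whether the recommendations line up with the hidden tastes.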

The third component of the application is hyperparameter tuning. Essentially, it’s an auto-configurator for a specific dataset. It uses an iterative approach, which is usually much more efficient than an exhaustive search (Grid Search) or a random search (Random Search). In short, the system analyzes the history of past runs (trials) and builds a probability “map” (a surrogate model) of which parameters are likely to yield the best result. Then it uses this map to choose the next combination to test. This method is called Sequential Model-Based Optimization (SMBO).
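The loop itself is short. Below is a deliberately simplified sketch of SMBO over a single hyperparameter: an inverse-distance-weighted average of past scores stands in for the probabilistic surrogate, and a distance bonus stands in for a proper acquisition function. The `objective` function and all names are hypothetical; the application's actual tuner is not shown here.

```python
import random

def objective(x):
    """Stand-in for 'train the model and return a validation score' (hypothetical)."""
    return -(x - 0.3) ** 2

def surrogate(x, history):
    """Cheap model of the objective: inverse-distance-weighted mean of past scores."""
    weights = [1.0 / (abs(x - xi) + 1e-6) for xi, _ in history]
    return sum(w * yi for w, (_, yi) in zip(weights, history)) / sum(weights)

def acquisition(x, history, explore=0.1):
    """Predicted score plus a bonus for being far from anything already tried."""
    nearest = min(abs(x - xi) for xi, _ in history)
    return surrogate(x, history) + explore * nearest

def smbo(n_init=3, n_iter=12, n_candidates=200, seed=0):
    rng = random.Random(seed)
    # Start from a few random trials so the surrogate has something to fit.
    history = [(x, objective(x)) for x in (rng.random() for _ in range(n_init))]
    for _ in range(n_iter):
        candidates = [rng.random() for _ in range(n_candidates)]
        x_next = max(candidates, key=lambda x: acquisition(x, history))
        history.append((x_next, objective(x_next)))   # the only expensive call per step
    return history

history = smbo()
best_x, best_score = max(history, key=lambda t: t[1])
print(f"best x = {best_x:.3f}, score = {best_score:.5f}")
```

The key design point is that the surrogate is cheap to query hundreds of times per step, while the real objective is evaluated only once per step.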

The code is open source and will be further supplemented with new algorithms and new visualizations.

Link to the code in the comments.

Link to the site where the code is deployed and where you can check out the application is also in the comments.

The Evolution of the Albanian Virus: From Joke to Cyberthreat | November 07 2025, 14:21

“Hello. I am an Albanian virus, but due to the low level of technology in my country, I cannot do anything to your computer. Please kindly delete one file on your computer and then forward me to other users.”

Here’s the 2025 version. The line they ask to insert into the terminal – echo “” | base64 -d | bash

Once decoded, this line contains a curl call to 217.119.139.117, whose output is piped to `nohup bash`. A script, obfuscated of course, is then downloaded from this address.
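If you ever want to see what such a one-liner hides, decode it without piping anything into a shell. A minimal sketch: the payload below is a harmless stand-in generated on the spot, not the real string, and `inspect_payload` is a name made up for the example.

```python
import base64

# Harmless stand-in payload; the real encoded string is not reproduced here.
encoded = base64.b64encode(b"echo 'this would have run in your shell'").decode("ascii")

def inspect_payload(b64_text):
    """Decode base64 to text for reading; nothing is executed."""
    raw = base64.b64decode(b64_text, validate=True)
    return raw.decode("utf-8", errors="replace")

decoded = inspect_payload(encoded)
print(decoded)
```

The point is the missing `| bash` at the end: the decoded text is only printed, so you can read the command before deciding what to do with it.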

Naturally, most available LLMs refuse to deobfuscate it. But Qwen didn’t mind.

Upon execution, the script gathers data from Chrome, Brave, Edge, Firefox, and other browsers: cookie files, autofill history, and login data. It collects crypto wallets such as Electrum, Coinomi, Exodus, Atomic, Wasabi, Ledger Live, and others. It also grabs the contents of the macOS “Notes” app together with attached media files, passwords from the Keychain, and scans the desktop and documents for files with certain extensions. The collected data is archived and sent to a remote server at the IP address 217.119.139.117.

To ensure persistent access, the script creates hidden launch services (LaunchDaemons) with random names, which makes them difficult to detect. It can also download a modified version of the Ledger Live application and replace the legitimate one with it.

Such is the Albanian virus)

Exploring SingleFile: The Chrome Extension for Easy Web Page Sharing | November 05 2025, 17:45

I found a useful Chrome extension: SingleFile. It solves the following problem: you need to share a browser page that is not public, for example via iMessage or Telegram. This is not so trivial. You can save an .mhtml file from the browser on your laptop and send it, but recipients on an iPhone cannot open it. Saving as a standard .html is also not an option, as images and styles are not preserved. Taking a screenshot only captures a small fragment. An extension that creates one long PNG of the entire page doesn’t help either: that PNG cannot be opened on an iPhone, at least from Telegram, where only the top renders. Printing to PDF is also not a solution: the result is very poor and depends heavily on whether the developers bothered to make a print-friendly version.

SingleFile allows you to create a snapshot of a page from the browser, a regular .html, which can be opened anywhere, with embedded styles and images. But what is especially convenient, before exporting, you can remove anything you don’t want to share through the WebInspector, and it won’t appear in the final .html. The extension is open source on GitHub, and it doesn’t send anything anywhere. Apparently, if there was dynamic loading through JS on the page, it saves not the JS, but the result of the loading, and the JS is cut out.

All in all, it’s convenient and worth using.

(I had an interview released on the internal portal today, and I needed to share it with my family in our family chat)

Smart Car Seat Selection: How My Tesla Knows the Driver | November 03 2025, 14:29

Incidentally, in my Tesla, there’s a very clever system for identifying the driver. If I enter the car first but sit in the passenger seat, placing my phone immediately in the central console for charging, and then Nadia enters second but sits in the driver’s seat and also places her phone there, her profile is selected automatically because she’s the driver, even though both phones are on charge under the central console.

So, there are two possibilities: either there is an antenna which can precisely detect that a phone has crossed the driver’s door rather than entering the car in any other way, or there is a camera focused on the driver. In any case, it’s very reassuring that it “just works”.