Stages of Understanding Scientific Papers | December 10 2025, 19:38

As I periodically read scientific papers in my field, I’ll try to spell out the levels of understanding the truth.

Level 0: “Read Later Folder” Downloaded the PDF, the title sounds genius, the abstract seems like the solution to all my problems. The file is forever buried in the ~/Downloads/Papers/ToRead folder.

Level 1: “Sumerian Cuneiform” I don’t understand anything at all. Random symbols; the authors have run out of Greek letters. “Orthogonal extrapolation of cognitive entropy within a quasi-stationary discourse inevitably induces a bifurcation of transcendental synergism.” Such materials really lower self-esteem. From this level you most often either fall back to zero or gradually move up to the second level.

Level 2: “Illusion of Competence” The Abstract is clear, the Introduction reads like a good detective story. But as soon as the main section starts, the text turns into a pumpkin. I can’t paraphrase it in my own words, only in general phrases: “Well, they trained a neural net… kind of.”

Level 3: “Formulas where needed and where not” The Abstract is clear, the first half of the article is also okay (architecture, pictures). But then comes formula (4), where “magic” happens. I take the authors’ word for it that equation (3) leads to (4) because, of course, I won’t check it. Beyond that — sheer horror and belief in a miracle.

Level 4: “Goldfish Effect” While reading — everything is crystal clear. The logic is solid, conclusions are obvious, the authors are smart. I close the tab, someone asks me, “What was the article about?” — and I freeze. My mind goes blank. If you take away the paper, I can’t reproduce even the idea because there essentially isn’t an idea, there is a process.

Level 5: “Armchair Expert” Everything’s clear, I can retell the essence over a beer. I know that Input transforms into Output, but the “black box” inside is still black. Give me a computer, I wouldn’t be able to reproduce even the skeleton because, it turns out, the article lacks half of the important stuff.

Level 6: “Critic-Practitioner” Everything is clear: I can retell it and I understand how to reproduce it (even without their code). I see where they cut corners. I know for certain that the “state-of-the-art” result is achieved only thanks to a lucky seed or dataset and that strange preprocessing trick mentioned in the footnote on page 12.

Level 7: “Deconstructor” Hooray, I’ve understood everything and implemented it myself. It works worse than in the article, but I know why. However, I understand this work better than the second author (who just made charts). I see that all this complex mathematics over 5 pages boils down to two paragraphs in the middle.

Level 8: “Nirvana” The article is trivial. The idea is derivative: it was all done in the ’90s by Schmidhuber, just under a different name. The formulas are overcomplicated to look important. I can write the same thing in 10 lines of code and it will run faster. Reject.

If anything — I’m stuck somewhere between 2 and 4.

Comparing US and Russian Higher Education Systems through Credit Hours | December 10 2025, 17:35

Regarding education in the USA and the USSR/Russia. My degree in the USA is evaluated as a Master of Science degree in Computer Science. My younger colleagues say that a Russian university degree is rarely recognized as a Master’s these days, and often hardly qualifies even for a Bachelor’s. I decided to look at the numbers and was very surprised.

To earn a bachelor’s degree in the USA, you need to spend about 2000 hours in classrooms and laboratories. In terms of credits, this is 120 credit hours. One credit usually equals 1 hour (50 minutes) of lectures per week for a semester (15 weeks), so 120 credits come to about 1800 lecture hours. Laboratory work has a different coefficient (often 2–3 hours in the lab count as 1 credit), so the actual number of classroom hours is somewhat higher (2000 or more).

My diploma, by contrast, states that I spent 7908 hours in classes over five years. That’s about four times more than a typical student in the USA. Based on the numbers, I spent about 2000 hours on math, physics, and English alone over those five years, across a total of 42 subjects.

A colleague shared that his Russian bachelor’s diploma lists 3140 academic hours, less than half of mine. How many hours are in your diploma?

Year of graduation, university, specialty, and the number of hours? I’m curious about the range of variation.
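To keep myself honest about the arithmetic, here is a quick sketch. It uses only the figures quoted above; the "1 contact hour per credit per week for 15 weeks" rule is the rule of thumb from this post, not an official conversion, and labs are ignored.

```python
# Figures quoted in the post. The conversion rule (1 lecture hour per
# credit per week, 15-week semester) is a rule of thumb, not a standard.
US_CREDITS = 120
WEEKS_PER_SEMESTER = 15
HOURS_PER_CREDIT_WEEK = 1

MY_DIPLOMA_HOURS = 7908
COLLEAGUE_DIPLOMA_HOURS = 3140


def us_contact_hours(credits=US_CREDITS):
    """Lecture-only contact hours implied by a credit total (labs ignored)."""
    return credits * WEEKS_PER_SEMESTER * HOURS_PER_CREDIT_WEEK


us_hours = us_contact_hours()
mine_vs_us = round(MY_DIPLOMA_HOURS / us_hours, 1)
colleague_vs_mine = round(COLLEAGUE_DIPLOMA_HOURS / MY_DIPLOMA_HOURS, 2)

print(us_hours)           # 1800
print(mine_vs_us)         # 4.4
print(colleague_vs_mine)  # 0.4
```

So "four times" and "less than half" both hold up, give or take the lab coefficient.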

The Maddening Ambiguity of Mathematical Notation | December 02 2025, 15:30

If someone tells you that mathematics is an exact science, don’t believe them. Since I’m currently into data science as a hobby, I’m studying all sorts of things from different books, and my brain is exploding at how this can happen in a science where every little detail has to fit into the system or be thrown out. That holds until it comes to notation. There it’s a complete mess. A set of dialects.

Take, for example, logarithms. The “standard” for how to denote a logarithm depends on which room of the university you are in. In calculus and number theory, log(x) almost always means the natural logarithm ln(x) with base e. The derivative of e^x equals e^x. It’s “natural”. They’re too lazy to write ln. Yet where decimal logarithms might appear (engineering, for instance), log(x) suddenly becomes decimal and ln(x) is the one with base e, while in computer science log(x) often means base 2.
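Programming languages had to pick a side too. A small sketch using Python's standard `math` module, where `log` with one argument is the natural logarithm and the other bases get their own names (the helper `log_readings` is just my illustration):

```python
import math


def log_readings(x):
    """Return the three common meanings of 'log(x)' side by side."""
    return {
        "natural (ln)": math.log(x),     # base e, what math.log means by default
        "decimal (lg)": math.log10(x),   # base 10
        "binary (lb)": math.log2(x),     # base 2
    }


for name, value in log_readings(100.0).items():
    print(f"{name}: {value:.4f}")
```

Same symbol, three different numbers, so it pays to check which dialect a book or library speaks.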

The expected value E takes its argument in square brackets: E[X]. Meanwhile, the same square brackets in computer science are used for the 0/1 indicator (the Iverson bracket): [P] is 1 if the condition P holds and 0 otherwise.

Or if you see a vector: is it a column or a row? In classical mathematics, a vector is always a column. To multiply it by weights, we put a T (transpose) after the vector and then w for the weights: x^T w. But in many papers, vectors are thought of as rows. If you see y = xW+b, then x is not a column, because otherwise the dimensions wouldn’t match; x here is a row. But the next paper writes Wx+b, and there x is a column 🙂
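The row/column business is easiest to see with shapes. Below is a minimal sketch in plain Python (no libraries, matrices as lists of rows); the helpers `matmul` and `transpose` and the numbers in `W` are made up for illustration, and the bias b is left out to keep it short.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows; fail on shape mismatch."""
    if len(a[0]) != len(b):
        raise ValueError(
            f"shape mismatch: {len(a)}x{len(a[0])} @ {len(b)}x{len(b[0])}"
        )
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


W = [[1, 2], [3, 4], [5, 6]]   # 3x2: three inputs, two outputs
x_row = [[1, 0, 2]]            # 1x3 row vector
x_col = transpose(x_row)       # 3x1 column vector

y_row = matmul(x_row, W)             # "y = xW": 1x3 @ 3x2 -> 1x2
y_col = matmul(transpose(W), x_col)  # "y = Wx" with W stored as 2x3 -> 2x1

print(y_row)  # [[11, 14]]
print(y_col)  # [[11], [14]]
```

Same numbers, different layout. Feeding the column into the row-style formula, `matmul(x_col, W)`, raises the shape-mismatch error, which is exactly the "dimensions wouldn’t match" argument from the text.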

Angle brackets ⟨a, b⟩. For the dot product, the symbol “⋅” is used, but it is hard to see, especially on a whiteboard, and I very often see mathematicians use angle brackets for the dot product instead. In general, angle brackets denote the generalized concept of an inner product, of which the scalar product is a special case: ⟨a, b⟩ signifies some abstract way to multiply a and b and get a number. Meanwhile, in quantum mechanics this would be written as ⟨a|b⟩. And for the scalar product, some use a circle with a dot or an x in a circle.

And just for the sake of it: in Russia tangent is tg, while in the USA it’s tan. There’s also tan^-1 and arctan, which are the same thing, even though x^-1 generally means 1/x.
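A two-line check of why the tan^-1 notation is a trap, using Python's standard `math` module (the value 0.5 is arbitrary):

```python
import math

x = 0.5
inverse_function = math.atan(x)  # what tan^-1(x) actually means: arctan
reciprocal = 1 / math.tan(x)     # what the exponent -1 means everywhere else: cot

print(inverse_function)
print(reciprocal)
```

The two values are nowhere near each other, and only the first one satisfies tan(result) = x.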

In-Flight French: Building a Language App on the Fly | December 01 2025, 15:45

By the way, yesterday morning, while waiting at the gate for my flight to Miami, I quickly wrote a French language learning app using Gemini based on an idea I sketched out to a friend while driving to the airport, and then used this app during the flight.

The idea is that in an unfamiliar foreign language text, the user first marks unknown words and then sees their translations — but without the original text, and then returns to the text itself — but no longer seeing the translations. It’s as if the “dictionary was in the next room.” The hypothesis is that this method helps better memorize than when the translation is shown immediately upon clicking on a word, and when no effort is needed.
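The flow can be sketched in a few lines of Python. This is not the app's actual code, just a rough illustration of the three views; the function names, the sample sentence, and the tiny dictionary are all made up.

```python
# Hypothetical data: in the real app the translations were prepared in advance.
TEXT = "Le chat dort sur le canapé"
TRANSLATIONS = {"chat": "cat", "dort": "sleeps", "canapé": "sofa"}


def mark_unknown(text, chosen):
    """Step 1: the reader marks unknown words in the text."""
    words = text.split()
    return [w for w in words if w in chosen]


def glossary_view(marked, translations):
    """Step 2: translations only, the original text is hidden."""
    return [(w, translations.get(w, "?")) for w in marked]


def reading_view(text):
    """Step 3: the text again, with no translations attached."""
    return text


marked = mark_unknown(TEXT, {"dort", "canapé"})
print(glossary_view(marked, TRANSLATIONS))  # [('dort', 'sleeps'), ('canapé', 'sofa')]
print(reading_view(TEXT))
```

The point of the design is that steps 2 and 3 never share a screen, so the reader has to carry the translation in their head from one "room" to the other.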

I am pleased that creating the app from scratch to the finished version took only about 35-40 minutes, and then I used it for some time during the flight, without the internet, since all translations of all words and phrases had been made in advance.

I just deployed it on Render. It’s also nice that demonstrating the code in action was free and took another 10 minutes.

https://readandlearn.onrender.com/

Navigating Complexity: The Challenge of Wikipedia’s Expert-Driven Content | November 26 2025, 01:06

Wikipedia has one big problem. Well, or we have it with Wikipedia. If you go to almost any Wikipedia page about a relatively complex mathematical or physical concept, you often suddenly don’t want to read it any further. Formally everything is correct there, but the explanation is given through concepts, often even more complex than the concept being explained. Besides, there is often a lot of unnecessary information — what is formally/academically/taxonomically part of the topic, but essentially “pollutes” the first impression.

This problem arises because the authors of Wikipedia (often mathematicians) prioritize rigor and completeness rather than didactics and comprehensibility.

In the English-speaking environment, this is sometimes called “Drift into pedantry”. Articles are often written by experts for experts, not for those who are trying to learn the subject from scratch.

Let’s take, for example, a “tensor”. Imagine a student who has heard that tensors are used in machine learning (Google TensorFlow) or physics and wants to understand the essence.

What the reader expects (intuition): “A tensor is a table of numbers (or some sort of data container) that describes the properties of an object and correctly changes if we rotate the coordinate system”

What Wikipedia provides: “A tensor (from Latin tensus, ‘strained,’ as per the classical layout of mechanical stress at the sides of a deformable cube, see illustration) — is a layout (arrangement in space) of numbers (components), used in mathematics and physics as a special type of multi-index object, possessing mathematical properties.” The article immediately starts listing ranks, covariance and contravariance of indices. This is formally correct but it “pollutes” the first impression.

The illustration at the very top is captioned like this: “Mechanical stress, deforming a cube with faces perpendicular to the coordinate axes, in classic elasticity theory is described by the Cauchy stress tensor, which links 2 indices: the normal vector to the face with the stress vector T (force per unit area); there are 3 directions of normals and 3 directions of stress components, which gives a 2nd rank tensor 3×3 — consisting of 9 components.”

Formally — not a single error. In fact — it’s a wall of text that requires knowledge of linear algebra just to read the definition.

It’s as if you asked “What is an apple?” and got the answer: “An apple is a fruit of plants from the subfamily Amygdaloideae or Spiraeoideae, featuring an epicarp, mesocarp, and endocarp, often participating in Newton’s gravitational experiments.”

On the one hand, it seems that with the emergence of LLMs, Wikipedia is no longer necessary. There are LLMs like, say, ChatGPT, which essentially paraphrase everything that is in Wikipedia in whatever form is required. But they can do it because they were trained on Wikipedia, and undoubtedly Wikipedia was given much more weight during training than other internet junk. If there were no Wikipedia in the training set, it would be much more difficult. Meanwhile, Wikipedia is constantly edited, and it is exactly what LLMs and Google draw on when answering questions.

Therefore, on the one hand, it seems to me that it is high time for Wikipedia to move to generating articles from expert-curated data and packaging knowledge in whatever format is needed, for example as questions and answers. On the other hand, that would lose the whole idea of the encyclopedia as master data for LLM/RAG.

The paradox is that the LLM is, in essence, the only “interface” that has been able to read these pedantic Wikipedia definitions, “understand” them (through thousands of examples of code and articles) and translate them back into human language. Wikipedia has become an excellent database for robots, but a poor textbook for people.

The Inner Mechanics of Old Rotary Phones | November 25 2025, 00:59

When I was little, I took apart old telephones many times, and only now, in my grey-haired years, have I realized that I never wondered how they worked. And they worked in a very interesting way.

Let’s start with the dial. The phone is connected to the network by two wires. The dial is a rotary one. While you wind up the disk, the pulse contacts are disabled, and when you release it, the disk returns and sends a series of interruptions (pulses) down the line. But how was it made to return at a constant speed (10 pulses per second)?

It relied on a centrifugal friction governor. A gearbox spun the governor’s axle up to thousands of revolutions per minute. Two weights with friction pads (think of them as brakes) sat on the axle. Centrifugal force pressed them against a stationary drum, creating a braking force. This is a direct descendant of Watt’s centrifugal governor, and it let the mechanism run steadily regardless of how sharply you released the disk.

Next. The Central Office connected you with a friend. You both speak at the same time, and sound is transmitted there and back through two wires—why two wires and not four, you understand? Well, okay, but why don’t you hear yourself too loudly, since the microphone sends the sound there, from where the “speaker” hears it?

I couldn’t answer quickly. Went googling. It turns out that a special differential transformer was responsible for this. The current from the microphone splits there: part goes into the line to the friend, and part goes into the “balance circuit” (a resistor and capacitor network inside the phone) that mimics the line impedance. The transformer coils are wound in opposition, so the magnetic fluxes from the line current and the balance-circuit current cancel each other in the coil that feeds the speaker. Engineers deliberately left the balance imperfect, keeping a “local effect” (sidetone), a quiet sound of one’s own voice, so the phone wouldn’t seem “dead.” The incoming signal from the friend, by contrast, has nothing to cancel it (your side is silent), so it passes freely to the speaker.

Now about the microphone. At that time there were no transistors in phones, but the signal was loud. The secret is in the design of the microphone: it’s a carbon microphone. Essentially, it is a box with carbon powder and a movable diaphragm. The sound from your mouth compresses and decompresses the powder, changing its resistance. The microphone does not generate current but modulates the powerful current coming from the Central Office. Essentially, it worked as an amplifier. Over time, the carbon powder compacted and the audibility dropped, hence the habit of tapping the handset to “shake up” the powder.

The speaker was an ordinary electromagnetic one. Although not quite. If there were only an electromagnet inside (without a permanent magnet), the phone would horribly distort the voice. An electromagnet attracts iron regardless of the polarity of the current. If you feed it a sine wave (voice), the diaphragm is attracted during both the positive and the negative half-waves. Result: the frequency of the sound doubles, and you hear not the voice of a friend but an unintelligible high-frequency buzzing. The permanent magnet solves this problem: it creates a “preload.” The diaphragm is always attracted to the magnet with medium force. When the “plus” of the signal arrives, the magnetic field strengthens and the diaphragm flexes more. When the “minus” arrives, the field weakens and the diaphragm springs back.
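The frequency doubling is easy to check numerically. The sketch below rests on one simplifying assumption of mine: the pull on the diaphragm is proportional to the square of the magnetic field, so without a magnet the force follows sin², and with a magnet it follows (bias + sin)². The bias value and the helper `dominant_bin` are made up for illustration.

```python
import cmath
import math

N = 64        # samples in the analysis window
CYCLES = 4    # the "voice" tone completes 4 cycles in the window
BIAS = 5.0    # assumed permanent-magnet field, in units of signal amplitude


def dominant_bin(samples):
    """Return the non-DC frequency bin with the largest DFT magnitude."""
    n = len(samples)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k


signal = [math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]

# Assumed model: force ~ field squared.
force_no_magnet = [b ** 2 for b in signal]
force_with_magnet = [(BIAS + b) ** 2 for b in signal]

print(dominant_bin(signal))             # 4: the original tone
print(dominant_bin(force_no_magnet))    # 8: twice the frequency
print(dominant_bin(force_with_magnet))  # 4: the magnet restores the tone
```

Algebraically it is the same story: sin² = (1 - cos 2ωt)/2 contains only the doubled frequency, while (B0 + sin)² has a dominant 2·B0·sin term at the original frequency as long as the bias B0 is large compared to the signal.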

In modern speakers, the force strictly depends on the direction of the current. Plus pushes, minus pulls. Therefore, the frequency doubling, which old phone engineers feared, physically cannot occur here. The diaphragm doesn’t need “preload” by a magnet, it just needs to hang in peace.

Interestingly, the principle of the old electromagnetic capsules (metal diaphragm plus armature) is used today in the most expensive in-ear headphones: google “balanced armature headphones” (prices around $500).

The voltage in the telephone network was negative – minus 48/60 volts. Plus was grounded, and the “live” wire was the minus. Why? It turns out, this is protection against electrochemical corrosion. The cables lie in moist earth. If there were a “plus” (anode) on the wire, upon insulation damage, copper would dissolve (electrolysis) and the cable would rot. With “minus” (cathode), metal ions, on the contrary, tend to settle on the conductor from the soil, which prolonged the cable’s life by decades.

Rediscovering the 1986 “Chemical Trainer”: A Pioneer in Interactive Learning | November 23 2025, 15:55

At my home in Kolomna, I have a book called “Chemical Trainer” from 1986. I have never seen anything like it before or since.

The material of each of the 54 programs is divided into many small, very short sections, or categories. At the end of each category, one or more questions are posed. This is done to check whether the content of the category is truly understood. For each answer, there is a place in the book to jump to in order to see if the answer is correct. If the answer is wrong, it describes why and asks a new question. If correct — you move further in this quest.

These Germans in 1986 created an interactive textbook even before it became fashionable.

Exploring the Fascinating Properties of Glass | November 21 2025, 23:58

I got carried away with the topic of glass and learned so many interesting things that I’m sharing them. It all started when I read about the supercritical state of matter: it turns out that the line separating the liquid and gaseous states on a pressure-temperature diagram breaks off at some point, and beyond that lies a state of matter that is neither one nor the other. I started reading about states (phases) of matter and stumbled upon the claim that glass is essentially a state between liquid and solid: it flows, just very slowly. This turns out to be a myth. It is popular thanks to observations of medieval windows, where the glass is often thicker at the bottom, which was attributed to “flowing” under gravity and was even mentioned in school textbooks. In reality, glass is an amorphous solid with extremely high viscosity at room temperature, and it does not flow noticeably even over billions of years; the uneven thickness of old panes is explained by the production technology of the time, and the thicker edge was installed at the bottom for stability.

I delved into the topic of glass further. It turned out that the reason why glass can be transparent is rooted in quantum mechanics, specifically in the electronic structure of the material, not because of the density of particles. The essence is that for an electron to absorb a photon, it must transition from one energy level to another, but in silicon dioxide, the width of the band gap is so large that the energy of visible light photons is physically insufficient to make this “jump.” As a result, light simply cannot interact with the electrons and goes straight through the material, while higher-energy ultraviolet radiation can overcome this barrier and is thus absorbed by glass.
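The band gap argument can be checked with one formula, E = hc/λ. A rough sketch: the physical constants are the standard SI values, while the band gap of about 9 eV for pure silicon dioxide and the 380–750 nm limits of visible light are approximate figures I am assuming for illustration.

```python
PLANCK = 6.62607015e-34        # J*s
LIGHT_SPEED = 299792458.0      # m/s
ELECTRON_VOLT = 1.602176634e-19  # J

SIO2_BAND_GAP_EV = 9.0         # assumed approximate value for pure SiO2
VISIBLE_NM = (380.0, 750.0)    # assumed approximate limits of visible light


def photon_energy_ev(wavelength_nm):
    """Photon energy in electronvolts for a wavelength in nanometers."""
    return PLANCK * LIGHT_SPEED / (wavelength_nm * 1e-9) / ELECTRON_VOLT


violet = photon_energy_ev(VISIBLE_NM[0])
red = photon_energy_ev(VISIBLE_NM[1])

print(f"violet: {violet:.2f} eV, red: {red:.2f} eV")
print("visible light can cross the gap:", violet >= SIO2_BAND_GAP_EV)
```

Even the most energetic visible photon carries only about a third of the assumed gap, so the electrons have nothing to "jump" with. Real window glass starts absorbing ultraviolet at lower energies than this pure-material estimate suggests, because it is not pure silicon dioxide.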

It also turned out that melted glass conducts electricity. Moreover, the mechanism of conductivity fundamentally differs from how metals conduct electricity. In a copper wire, current is a flow of free electrons. In cold glass (an insulator), electrons are tightly bound, and ions are locked in the solid lattice. But when you heat glass to the molten state (usually above 1000 degrees for silicates), thermal energy breaks the rigid bonds of the lattice, and glass becomes a liquid, with ions gaining freedom of movement. The current in molten glass is the physical movement of charged atoms (ionic conductivity), not just “flowing” electrons.

The green tint you see on the edge of regular glass (as seen in the attached picture) turns out to be caused by iron ions, present as impurities (~0.1%). Sand is a natural material, and removing all the iron from it is difficult and costly. Low-iron glass, which has tens of times fewer iron ions, is used in solar panels, not just because it is more transparent. Iron greedily absorbs the infrared spectrum (thermal energy), reducing the efficiency of the panel. By removing iron, we allow maximum energy to reach the silicon cells.

And finally, the most “mind-blowing” (literally). There are these things called “Prince Rupert’s drops.” If you drop molten glass into icy water, the outer shell of the drop cools and hardens instantly, while the inner part remains liquid. As it cools, the core tries to contract, but the hardened shell doesn’t allow it. As a result, the inside of the drop preserves colossal mechanical stress (up to 700 MPa).

The physics of this process creates a paradox: the “head” of such a drop can withstand being struck by a hammer because the compression of the surface makes it incredibly strong (the same principle is used in tempered glass for smartphones). But just nick the thin tail, and the balance of forces is disrupted, and a wave of destruction moves through the drop at the speed of a bullet (about 1.5 km/s), turning it into glass dust right in your hands.

There’s also something in physics called “metallic glasses” (amorphous metals). If you cool the molten metal at a rate of a million degrees per second, atoms do not have time to arrange into a crystalline lattice and freeze in chaos. Such “glassy metal” possesses unique magnetic permeability and is stronger than titanium, because it lacks crystal lattice defects, which are usually the points of destruction. So glass is a much broader concept than just transparent substance in our windows 🙂

The only object made from this material, amorphous metal, that I’ve come across is, believe it or not, the iPhone “paperclip” (the SIM eject pin).

By the way, that same amorphous structure of glass, which I mentioned earlier, gives it an unexpected advantage — supernatural sharpness. If you take a scalpel made of the best surgical steel and look at it under an electron microscope, its edge will look like a jagged saw. This is inevitable: steel is made up of crystalline grains, and it’s impossible to sharpen it any smoother than the grain size allows.

But obsidian (volcanic glass) when fractured provides an edge only about 3 nanometers thick (about 1/30000 the thickness of a human hair). There’s no magic here, just that glass lacks a crystalline lattice, which would otherwise prevent achieving a perfectly smooth fracture down to the molecular level. That’s why obsidian scalpels are still used in the most complex eye surgeries — the cut is so clean that tissue cells are minimally traumatized, and healing occurs faster.

And one more powerful engineering case: vitrification (glassification). Mankind has chosen glass as the most reliable “safe” for nuclear waste. Liquid radioactive waste is mixed with special additives, melted, and cooled into blocks. The trick is that the dangerous isotopes are not just poured inside, they are chemically embedded into the atomic network of the glass. Glass is chemically inert; it doesn’t rust like metal and doesn’t decompose for thousands of years. This is perhaps the only material that engineers trust to store hazardous substances on a geological time scale. Yes, it takes about a million years for a discarded bottle to decompose.

And finally. Digging into history, it turns out that the Romans were engaged in nanotechnology 1600 years before we even invented the word. In the British Museum stands the “Lycurgus Cup” (4th century AD). If you look at it under normal lighting, it’s greenish and opaque. But if you place a light source inside the cup, the glass flashes bright ruby red.

Until the 1990s, scientists could not understand how this was achieved. An electron microscope showed: Roman craftsmen added gold and silver, ground to nanoparticles about 50 nanometers in size (about 1000-1800 times thinner than a hair). This size of particles triggers a quantum effect known as surface plasmon resonance: electrons in the metal begin to oscillate such that they absorb some wavelengths of light and let others pass depending on the angle of incidence. The funniest thing is that the Romans did this empirically, “by eye,” and we’ve only just learned to replicate this consciously in photonics. It’s crazy to think you could handle 50 nm gold dust by eye. This moment required additional googling.

It’s unlikely the Romans mechanically crushed the metal to 50 nanometers — they had no such mills.

More likely, they added gold and silver in the form of salts or foil to the molten glass mass. The nanoparticles formed not by crushing, but by crystallization and precipitation from the melt under very precise temperature conditions (the “glass recipe”). This is even more complex chemistry than simple grinding.

The most astonishing thing is not that they did it, but that the ratio of gold to silver was maintained perfectly. Changing the concentration of gold by just 1% would alter the color to something other than pure ruby red. This indicates that the craftsmen mastered the technology incredibly accurately, although they likely did not understand the mechanism. And that they had a heck of a lot of time for all kinds of nonsense;) probably many generations dedicated their lives to experimenting. Because it’s hard to see why all this was necessary.

There’s a beautiful hypothesis (unproven, but popular) that the cup could have been used as a detector. If you pour a different liquid into it (for example, alcohol with impurities or poison), the refractive index changes, and the color of the “flash” might vary.

Data Science: The Modern Alchemy of the 21st Century | November 16 2025, 04:02

A cryptic post today. While writing a book on RecSys, I caught myself thinking that modern data science is essentially the alchemy of the 21st century. Half of the “best practices” in algorithms lack a solid mathematical framework. It’s a set of heuristics that “just work”. Just as in the 17th century, people mix everything together indiscriminately, and if something works better, everyone else starts doing the same. There’s just no answer to the question “why”.

Take, for example, the NCF/NeuMF (Neural Collaborative Filtering) algorithm. The logic goes like this. Say there are a million movie ratings by users. And 100 million ratings not yet given: users can’t watch every movie in the world. But out of these 100 million, you need to choose candidates to promote to a particular user. The algorithm, of course, has a training phase, where weights are calculated, and a prediction phase, where these weights are applied to incoming data.

(What the algorithm does. Essentially, it’s an ensemble of three sub-algorithms, two of which produce their own conclusions, and then their outputs go to a new neural network, the third algorithm, which provides the final recommendation. In technical terms, it’s a hybrid of GMF (Generalized Matrix Factorization) and MLP (Multi-Layer Perceptron). The first of these two is based on matrix decomposition, and the second is a neural network with multiple layers. Weights are adjusted on training data.)

For one positive example, it takes 4 negative ones. Why four? Just because it’s “not too many and not too few”. Would 8 be better? Unknown, but it would definitely take longer to learn.
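For the record, here is roughly what "4 negatives per positive" looks like in code. This is a minimal sketch, not the reference implementation: the function name, the toy data, and the choice to sample without replacement are all my assumptions.

```python
import random


def build_training_rows(user_items, all_items, num_negatives=4, seed=0):
    """For each observed (user, item) pair emit one positive row and
    up to num_negatives rows with items the user has not interacted with."""
    rng = random.Random(seed)
    rows = []
    for user, items in user_items.items():
        unseen = [i for i in all_items if i not in items]
        for item in sorted(items):
            rows.append((user, item, 1))  # observed interaction
            for neg in rng.sample(unseen, min(num_negatives, len(unseen))):
                rows.append((user, neg, 0))  # sampled "not interacted"
    return rows


# Toy data, purely illustrative.
user_items = {"u1": {0, 1}, "u2": {2}}
all_items = list(range(10))

rows = build_training_rows(user_items, all_items)
positives = sum(label for _, _, label in rows)
print(len(rows), positives)  # 15 3
```

Changing `num_negatives` to 8 is a one-character edit, which is exactly the problem: nothing in the code tells you whether it should be 4 or 8.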

Why are embedding dimensions 32? or 64? There’s no formula. It’s the “golden mean” between a “dumb” model (small k) and an “overfitted” one (large k), where k is the embedding dimension.

Now about the neural network. Why is the MLP block built as a “tower” (64 -> 32 -> 16)? Why not (50 -> 25 -> 10)? Why ReLU between them (and not tanh for example)? Pure empiricism. The number of layers in the tower is also adjusted.

Why do GMF and MLP parts have different embeddings at the input? Because the authors of the paper tried it, and it “worked out better”. No mathematical proof. Why do they go to the final layer with equal weights? Because they just do.

Why are the outputs of the two paths “concatenated” (concat), and not added or multiplied? “Experience showed that this way the result is more accurate.”
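To make all these "whys" concrete, here is a bare-bones forward pass of the two-path architecture in plain Python. It is a sketch under my own assumptions, not the paper's code: weights are random and untrained, biases are omitted, and I read the tower (64 -> 32 -> 16) as input width 64 followed by layers of 32 and 16 units. Every number in it is one of the empirically chosen knobs discussed above.

```python
import math
import random

rng = random.Random(0)


def rand_matrix(rows, cols):
    """Small random weights; stands in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def dense(vec, weights, relu=True):
    """vec (length n) times weights (n x m), optionally followed by ReLU."""
    out = [sum(v * w for v, w in zip(vec, col)) for col in zip(*weights)]
    return [max(0.0, o) for o in out] if relu else out


NUM_USERS, NUM_ITEMS = 5, 7
GMF_DIM = 16            # embedding size for the GMF path (assumed)
MLP_DIM = 32            # per-entity embedding, so the tower input is 64
TOWER = [64, 32, 16]

# Separate embedding tables for the two paths.
gmf_user, gmf_item = rand_matrix(NUM_USERS, GMF_DIM), rand_matrix(NUM_ITEMS, GMF_DIM)
mlp_user, mlp_item = rand_matrix(NUM_USERS, MLP_DIM), rand_matrix(NUM_ITEMS, MLP_DIM)
tower_weights = [rand_matrix(a, b) for a, b in zip(TOWER, TOWER[1:])]
final_weights = rand_matrix(GMF_DIM + TOWER[-1], 1)


def predict(user, item):
    """Score in (0, 1) for a (user, item) pair."""
    # GMF path: element-wise product of the two embeddings.
    gmf_out = [u * i for u, i in zip(gmf_user[user], gmf_item[item])]
    # MLP path: concatenate embeddings, then run the ReLU tower.
    hidden = mlp_user[user] + mlp_item[item]
    for weights in tower_weights:
        hidden = dense(hidden, weights)
    # Fusion: concatenate both paths, one linear layer, sigmoid.
    logit = dense(gmf_out + hidden, final_weights, relu=False)[0]
    return 1.0 / (1.0 + math.exp(-logit))


print(predict(0, 3))
```

With untrained weights the score hovers near 0.5; the interesting part is how many constants at the top of the file could be changed without any theory objecting.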

And so it is with everything, up to the choice of optimizer Adam or the “magical” learning_rate=0.001, although at least these have some mathematical basis.

That is, at least a dozen parameters of a single algorithm are chosen empirically, with no real confidence that they are independent of each other. And many of them depend on the dataset, but no one knows how 😉

In general, alchemy.

Metchnikoff: Beyond Science and Survival | November 13 2025, 04:53

I was reading Metchnikoff’s biography (don’t ask why I ended up there) and thought about how much can fit into one life. He wasn’t just a scientist; his life reads like a saga:

His elder brother Ivan was the prototype for Leo Tolstoy’s “The Death of Ivan Ilyich.” Another brother, Lev, was a prominent anarchist, sociologist and fought in Italy alongside Garibaldi. Metchnikoff himself tried to end his life twice: the first time after the death of his first wife (who, sick with tuberculosis, was carried to the church on a chair). He took morphine but survived. The second time was when his second wife Olga fell critically ill with typhus. He deliberately inoculated himself with relapsing fever. Fortunately, both survived. However, the Grim Reaper with his scythe only came after his third consecutive heart attack.

The dude graduated from university at 19 as an external student. I.M. Sechenov himself recommended him for a professorship. But Metchnikoff was “blackballed” (rejected) by one vote. In protest, Sechenov resigned along with him.

He founded the first bacteriological station in the country at that time in Odessa. But due to an employee’s mistake (they spoiled the anthrax vaccine) an entire flock of sheep died. After this scandal, he left Russia. The station is on Leo Tolstoy Street.

In Paris, he was immediately taken under the wing of Louis Pasteur (the father of pasteurized milk), who supported his theory and gave him a lab in his institute. There, Metchnikoff worked for 28 years, becoming the deputy director.

While studying cholera at the Pasteur Institute, Metchnikoff proposed a theory that not everyone who comes into contact with the pathogen gets sick. He suggested that it’s all about… (of course) the gut flora. To prove it, he deliberately drank a culture of cholera vibrios. Nothing happened (it would have surely happened to you, Metchnikoff thought).

In the end, he received the Nobel Prize for the discovery of phagocytosis (cellular immunity). He is also “the father of gerontology” — Metchnikoff was the one who proposed the theory that to achieve longevity, one must combat bad bacteria in the gut with probiotics. Now, they say, gerontologists around the world drink sour milk on May 15th remembering Metchnikoff.

He died in Paris, and his ashes are kept in the library of the Pasteur Institute.

Also, in the English Wikipedia he’s Élie Metchnikoff. Not easy to guess.

In the photo, Metchnikoff and Leo Tolstoy are discussing immunology.