Mathematics – Hi, I'm Rauf Aliev.

Navigating the Depths of High-Dimensional Spaces | April 13 2026, 23:17

I am now working a lot with high-dimensional vectors, and some things that I hadn’t fully realized before are really starting to tickle my brain. Our 3D intuition doesn’t just not work there—it lies.

It turns out that any two random vectors in high-dimensional space are almost certainly nearly perpendicular to each other. Almost all the space is one continuous “equator”.

Much of machine learning is built on exactly this. If your embeddings suddenly show high cosine similarity (for example, 0.8), this is not a statistical error but a powerful signal. It’s almost impossible for two vectors to line up like this by chance in a 1000-dimensional world.
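Here is a small sketch to check this numerically. It assumes "random" means independent Gaussian components (my assumption, any rotation-symmetric distribution would do), and the helper names `gaussian_vector`, `cosine` and `max_abs_cosine` are made up for the illustration. It reports the largest absolute cosine seen over a couple of hundred random pairs.

```python
import math
import random


def gaussian_vector(dim, rng):
    """A random direction: independent standard normal components (assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def cosine(u, v):
    """Cosine similarity of two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_abs_cosine(dim, pairs, seed=0):
    """Largest |cosine| observed over `pairs` independent random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(pairs):
        u = gaussian_vector(dim, rng)
        v = gaussian_vector(dim, rng)
        worst = max(worst, abs(cosine(u, v)))
    return worst


for dim in (3, 1000):
    print(f"{dim:>5}D: max |cos| over 200 pairs = {max_abs_cosine(dim, 200):.3f}")
```

In 3D the maximum should land close to 1, while in 1000D even the most extreme pair stays far below 0.8, since the typical cosine there is on the order of 1/√1000.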

In such spaces, almost all the mass of data is concentrated in an extremely thin surface layer. The “insides” of objects are mathematically empty.

This is easy to verify with a thought experiment. Take the “skin” of a multidimensional sphere with a thickness of just 1% of the radius. The volume of a sphere is proportional to the radius raised to the power of its dimensionality, so the “pulp” inside the skin takes up 0.99 raised to that power of the total volume.

• In three-dimensional space, the pulp (0.99 of the radius) occupies 97% of the volume: you raise 0.99 to the third power.

• In 1000D, the pulp occupies just 0.0043% (0.99 to the 1000th power is about 0.000043).
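The arithmetic behind these two bullets fits in a few lines. This is only a sketch; `core_fraction` is a name I made up, and the 1% shell thickness is the one from the example above.

```python
def core_fraction(dim, shell=0.01):
    """Fraction of a dim-dimensional ball's volume lying deeper than the shell.

    Volume scales as radius**dim, so the inner ball of radius (1 - shell)
    holds (1 - shell)**dim of the total volume.
    """
    return (1.0 - shell) ** dim


for dim in (3, 100, 1000):
    frac = core_fraction(dim)
    print(f"{dim:>5}D: pulp = {frac:.6%}, skin = {1 - frac:.6%}")
```

Note the difference between a fraction and a percentage here: 0.99 ** 1000 is roughly 4.3e-5 as a fraction of the volume.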

You can look at it another way. For a point to be close to the origin, its coordinates must be close to zero along all axes at once. If even one axis has a large value, that’s it, the point is gone. If you pick points at random, the probability that every coordinate falls below a given threshold simultaneously shrinks as the dimensionality grows, and shrinks quickly.

All the “meat” of the data always ends up in the skin. Any sample in High-D is essentially a set of boundary values.

For white noise in high dimensions, the distances to the closest and the farthest neighbor become almost the same. The concept of “closeness” simply degrades.
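A rough experiment shows the effect. I assume the "white noise" is independent standard normal coordinates, and `nearest_to_farthest_ratio` is a hypothetical helper: it draws one query point and a cloud of random points, then divides the smallest distance by the largest.

```python
import math
import random


def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio (nearest distance / farthest distance) from one random query point."""
    rng = random.Random(seed)
    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return min(dists) / max(dists)


for dim in (2, 10, 100, 1000):
    print(f"{dim:>5}D: nearest / farthest = {nearest_to_farthest_ratio(dim):.3f}")
```

The closer the ratio is to 1, the less the words "nearest neighbor" mean; it should climb steadily as the dimension grows.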

Exploring the Mystical Connection Between π² and g in Defining a Meter | March 01 2026, 17:11

It turns out that π² ≈ g is not some mystical coincidence. When the first scientists contemplated the definition of the meter, there was one elegant proposal: to make the meter equal to the length of a pendulum that takes exactly one second to swing from one side to the other.

For a mathematical pendulum, the period of oscillation is calculated by the formula: T = 2π √(L / g). If we take the length L = 1 meter and set the full period T = 2 seconds (so that it takes exactly one second for each half swing), the equation implies: g = π² (m/s²).
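The same formula can be turned around in code. This is a sketch with names of my own choosing (`g_from_pendulum`, `seconds_pendulum_length`); the value 9.80665 m/s² is the conventional standard gravity, which is my addition for comparison.

```python
import math

STANDARD_G = 9.80665  # m/s^2, conventional standard gravity (assumption for comparison)


def g_from_pendulum(length_m, period_s):
    """Solve T = 2*pi*sqrt(L/g) for g: g = 4*pi^2*L / T^2."""
    return 4 * math.pi ** 2 * length_m / period_s ** 2


def seconds_pendulum_length(g):
    """Length of a pendulum with full period T = 2 s: L = g*T^2 / (4*pi^2)."""
    period_s = 2.0
    return g * period_s ** 2 / (4 * math.pi ** 2)


print("g for L = 1 m, T = 2 s:", g_from_pendulum(1.0, 2.0))
print("pi squared:            ", math.pi ** 2)
print("seconds pendulum at standard gravity, m:", seconds_pendulum_length(STANDARD_G))
```

The last line shows how long the "pendulum meter" would be under standard gravity: a little under one modern meter.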

The definition of the meter was later changed: it was tied to one ten-millionth of the distance from the equator to the North Pole along the meridian passing through Paris. However, this geodetic definition was inspired by the earlier idea with the pendulum. Notably, the two definitions agree to within about 1%. Essentially, since the old “pendulum” definition was the main candidate for a long time, values were chosen so that the new meter was convenient and close to the measurements customary at that time.

It is also interesting that the number of seconds in a year is roughly π × 10⁷. Earth’s orbital speed is about v = 30 km/s, and the distance from the Sun to Earth is approximately r = 150,000,000 km. Over a year, Earth travels a path of about d = 2πr. The orbital period is then T = d/v = 2πr/v ≈ π × 10⁷ seconds.
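A quick sanity check of both numbers, as a sketch. The 365.25-day year is my assumption; the speed and distance are the rounded values from the paragraph above.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # Julian year (assumption)

seconds_per_year = SECONDS_PER_DAY * DAYS_PER_YEAR

# Orbit estimate with the rounded values from the text: r in km, v in km/s.
r_km = 150_000_000
v_km_s = 30
orbit_period_s = 2 * math.pi * r_km / v_km_s

pi_e7 = math.pi * 1e7
print("seconds in a year:     ", seconds_per_year)
print("pi * 10^7:             ", pi_e7)
print("orbit estimate, s:     ", orbit_period_s)
print("calendar year / pi*1e7:", seconds_per_year / pi_e7)
```

With these round inputs the orbit estimate is π × 10⁷ by construction, and the calendar year lands within about half a percent of it.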

The Maddening Ambiguity of Mathematical Notation | December 02 2025, 15:30

If someone tells you that mathematics is an exact science, don’t believe them. Since I’m currently into data science as a hobby, I’m studying all sorts of things from different books, and my brain is exploding at how this can happen in a science where every little detail has to fit into a system or be thrown out. That holds until you get to notation. It’s a complete mess there. A set of dialects.

Take, for example, logarithms. The “standard” for how to denote a logarithm depends on which room of the university you are in. In calculus and number theory, log(x) almost always means the natural logarithm ln(x) with base e. The derivative of e^x equals e^x, so it’s “natural”, and they’re too lazy to write ln. Yet where decimal logarithms appear (in engineering, for example), log(x) suddenly becomes decimal and ln(x) is the one with base e. In computer science, log(x) often means base 2.
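Programming languages pick a dialect too. As one illustration, Python's standard `math` module sides with the calculus room: `math.log` with one argument is the natural logarithm, and the other bases get their own names.

```python
import math

x = 1000.0
print("math.log(x)     =", math.log(x))      # natural logarithm, base e
print("math.log10(x)   =", math.log10(x))    # decimal logarithm
print("math.log2(x)    =", math.log2(x))     # binary logarithm
print("math.log(x, 10) =", math.log(x, 10))  # explicit base as the second argument
```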

The expected value E takes its argument in square brackets, as in E[X]. Meanwhile, the same square brackets are used in computer science for the 0/1 indicator (the Iverson bracket): [P] is 1 if the condition P holds and 0 otherwise.

Or if you see a vector – is it a column or a row? In classical mathematics, a vector is always a column. To multiply it by weights, we write T after the vector and then w for the weights: xᵀw. But in many papers, vectors are thought of as rows. And if you see y = xW+b, then x is not a column, because otherwise the dimensions wouldn’t match up. x here is a row. But in the next paper they write Wx+b. And there x is a column 🙂
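To make the two conventions concrete, here is a minimal pure-Python sketch. The helpers `matvec`, `vecmat` and `transpose`, and the numbers in `x`, `W_col` and `b`, are all made up for the illustration. The point is that the same layer is written Wx+b with a (2 × 3) matrix or xW+b with its (3 × 2) transpose.

```python
def matvec(W, x):
    """Column convention y = W x: W is (out x in), x has length in."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vecmat(x, W):
    """Row convention y = x W: x has length in, W is (in x out)."""
    n_out = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(n_out)]


def transpose(W):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*W)]


x = [1.0, 2.0, 3.0]
W_col = [[1.0, 0.0, 2.0],
         [0.0, 1.0, -1.0]]  # shape 2 x 3, used as W x
b = [0.5, -0.5]

y_col = [y + bi for y, bi in zip(matvec(W_col, x), b)]             # W x + b
y_row = [y + bi for y, bi in zip(vecmat(x, transpose(W_col)), b)]  # x W^T + b

print(y_col, y_row)
```

Both lines compute the same two numbers; only the shape you have to keep in your head for W changes.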

Angle brackets ⟨a, b⟩. For the dot product, the symbol “⋅” is used, but it is hard to see, especially on a whiteboard, and I very often see mathematicians use angle brackets for the dot product instead. In general, angle brackets denote the generalized concept of an inner product, of which the scalar product is a special case: ⟨a, b⟩ signifies a certain abstract way to multiply a and b and get a number. Meanwhile, in quantum mechanics this would be written as ⟨a|b⟩. And for the scalar product, some use a circle with a dot or an x in a circle.

And just for the sake of it, in Russia tangent is tg, while in the USA it’s tan. There’s also tan^-1 and arctan, which are the same thing, even though x^-1 generally means 1/x.
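The two readings of the "-1" really do give different numbers. A tiny check, with variable names of my own:

```python
import math

x = 1.0
arctan_x = math.atan(x)         # what tan^-1(x) means: the inverse function
reciprocal = 1.0 / math.tan(x)  # what the "x^-1 = 1/x" reading would give (cotangent)

print("arctan(1) =", arctan_x)
print("1/tan(1)  =", reciprocal)
```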

Exploring the Chaos Game: Creating Fractals From Randomness | October 04 2025, 15:32

I read something interesting today. About fractals. Take any three points that form a triangle, then a fourth point anywhere, and roll a die whose faces are assigned to the first three points. Move from the current point towards the vertex the die picked and place a new point halfway; this becomes the new current point. After many iterations, the points start to form the Sierpinski triangle, the one shown in the attached picture. Intuitively, you would think the triangle should be fully filled, because it involves random movements in three directions from a randomly chosen point, but no. Moreover, it works even if the starting point is inside the future empty triangle (yes, a few points will disrupt the picture, but that’s it). If you run the experiment with five or six vertices instead of three, different shapes will form; see the attached picture. This graphical method is called the Chaos Game.
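The procedure above is short enough to sketch in code. The names `chaos_game` and `in_middle_hole`, the particular triangle and the starting point are my own choices for the illustration. Instead of drawing a picture, the sketch counts how many generated points fall strictly inside the large central hole of the Sierpinski triangle.

```python
import random

# An arbitrary triangle (assumption): base on the x-axis, apex at (0.5, 1).
TRIANGLE = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]


def chaos_game(vertices, start, steps, seed=0):
    """Repeatedly jump halfway towards a randomly chosen vertex."""
    rng = random.Random(seed)
    x, y = start
    points = []
    for _ in range(steps):
        vx, vy = rng.choice(vertices)  # the "die roll"
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points


def in_middle_hole(point, eps=1e-9):
    """True if the point is strictly inside the central inverted triangle
    with corners (0.5, 0), (0.25, 0.5), (0.75, 0.5) of TRIANGLE."""
    x, y = point
    return (y < 0.5 - eps
            and x > 0.5 - y / 2 + eps
            and x < 0.5 + y / 2 - eps)


points = chaos_game(TRIANGLE, start=(0.3, 0.3), steps=20000)
hits = sum(in_middle_hole(p) for p in points)
print(f"{len(points)} points generated, {hits} inside the central hole")
```

To actually see the fractal, feed `points` to any scatter plot. Passing five or six vertices instead of `TRIANGLE` runs the other variants mentioned above.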

By the way, it may seem obvious, but in case you wondered — all the presented figures have zero area.

If you take two triangles and, with probability p, move towards a random vertex of the first and, with probability (1-p), towards a random vertex of the second, you end up forming a Barnsley fern (picture №2).

I love such things because they seem like magic at first glance 🙂

(It’s a kind of problem from the same class as the synchronization of metronomes)