Misadventures in Keyboard Layouts: Searching for Gremlin, Finding Surprises | April 28 2026, 20:33

This is what I get when I type the word “gremlin” without switching the keyboard layout. I wanted to read about the query language for graph databases, which I need for work. Google never fails to surprise.

Navigating the Depths of High-Dimensional Spaces | April 13 2026, 23:17

I now work a lot with high-dimensional vectors, and some things I hadn’t fully realized before are really starting to tickle my brain. Our 3D intuition doesn’t just fail there; it lies.

It turns out that any two random vectors in high-dimensional space are almost certainly nearly perpendicular to each other. Almost all the space is one continuous “equator”.

Much of machine learning is built on exactly this. If your embeddings suddenly show high cosine similarity (for example, 0.8), this is not a statistical fluke but a powerful signal. It’s almost impossible to end up that close by chance in a 1000-dimensional world.
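A quick way to check this numerically is to draw pairs of random Gaussian vectors and look at their cosine similarity. The sketch below is only an illustration: it uses the Python standard library, and the sample size, the seed and the helper names are arbitrary choices, not something from a real pipeline.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def random_cosines(dim, pairs=200, seed=0):
    """Cosine similarities for `pairs` pairs of random Gaussian vectors."""
    rng = random.Random(seed)
    result = []
    for _ in range(pairs):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        result.append(cosine(u, v))
    return result

for dim in (3, 1000):
    cosines = random_cosines(dim)
    largest = max(abs(c) for c in cosines)
    print(f"{dim}D: largest |cos| over {len(cosines)} pairs = {largest:.3f}")
```

The typical spread of the cosine between random vectors is about 1/sqrt(dim), so roughly 0.03 in 1000 dimensions. That is why a value like 0.8 sits far outside anything chance can produce.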

In such spaces, almost all the mass of data is concentrated in an extremely thin surface layer. The “insides” of objects are mathematically empty.

This is easy to verify with a thought experiment. Take the “skin” of a multidimensional sphere with a thickness of just 1% of the radius. The volume of the sphere is proportional to the radius raised to the power of its dimensionality.

• In three-dimensional space, the pulp (the inner 0.99 of the radius) occupies 97% of the volume: you raise 0.99 to the third power.

• In 1000D, the pulp occupies just 0.0043% (0.99 to the power of 1000 is about 0.000043).
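The arithmetic behind these two bullets fits in a few lines. This is a minimal sketch; the function name and the list of dimensions are just illustrative choices.

```python
def core_fraction(dim, skin=0.01):
    """Fraction of a dim-dimensional ball's volume lying deeper than the skin.

    Volume scales as radius**dim, so the inner ball of radius (1 - skin)
    holds (1 - skin)**dim of the total volume.
    """
    return (1.0 - skin) ** dim

for dim in (3, 10, 100, 1000):
    print(f"{dim:>5}D: pulp = {core_fraction(dim):.6%} of the volume")
```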

Here is another way to see it. For a point to be close to the origin, its coordinates must be close to zero along every axis. If even one axis has a large value, that’s it, the point is gone. If you pick points at random, the probability that all coordinates are below a given value at once decreases as the dimensionality grows, and it decreases quickly.

All the “meat” of the data always ends up in the skin. Any sample in High-D is essentially a set of boundary values.

For white noise in high dimensions, the distances to the closest and to the farthest neighbor become almost the same. The very concept of “closeness” degrades.
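This effect is also easy to see in a small experiment: take one random query point, a few hundred random points, and compare the nearest distance with the farthest. The sketch below assumes points drawn uniformly from the unit cube; the number of points and the seed are arbitrary.

```python
import math
import random

def nearest_to_farthest_ratio(dim, points=200, seed=0):
    """Ratio of the nearest to the farthest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return min(distances) / max(distances)

for dim in (2, 10, 100, 1000):
    print(f"{dim:>5}D: nearest / farthest = {nearest_to_farthest_ratio(dim):.3f}")
```

A ratio near 0 means the nearest neighbor is much closer than the farthest one. A ratio near 1 means they are practically at the same distance.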

CPU vs GPU: A Speed Challenge in Embedding Creation | April 11 2026, 18:08

For some tasks, the difference between a CPU and a GPU is simply astounding. For example, I need to create many (millions of) embeddings with the BGE M3 model. On my quite powerful 24-core Intel Core Ultra 9 285K processor, creating 500 embeddings takes 45.85 seconds, while an NVIDIA 5090 GPU completes the same task in just 0.36 seconds. It is so fast that I wrote this benchmark specifically to figure out whether my GPU was being used at all: in test mode, the program that sends requests to TEI sends them too rarely (roughly a couple of times per second), so the GPU load graphs stay practically at zero.

— Testing http://localhost:8080/embed — <– CPU version

Requests completed: 500

Total time: 45.85 sec

Throughput: 10.90 req/sec

Average latency (Avg Latency): 4386.11 ms

P95 latency: 5021.88 ms

— Testing http://localhost:8090/embed — <– GPU version (NVIDIA 5090)

Requests completed: 500

Total time: 0.36 sec

Throughput: 1398.69 req/sec

Average latency (Avg Latency): 31.38 ms

P95 latency: 53.18 ms

========================================

RESULT: http://localhost:8090/embed is 99.22% faster
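For reference, a benchmark of this kind can be sketched as follows. This is not the actual script, just a minimal reconstruction under some assumptions: the endpoints accept the TEI-style `{"inputs": ...}` JSON body, requests are sent concurrently from a thread pool, and P95 is computed by the nearest-rank method. The 48 workers are a guess (the CPU numbers, about 4.4 s average latency at about 10.9 req/sec, suggest roughly that many requests in flight), and all function names are hypothetical.

```python
import json
import math
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def embed_once(url, text):
    """POST one text to a TEI-style /embed endpoint; return the latency in ms."""
    body = json.dumps({"inputs": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=60) as response:
        response.read()
    return (time.perf_counter() - start) * 1000.0

def summarize(latencies_ms, total_sec):
    """Turn raw per-request latencies into the figures shown in the report."""
    ordered = sorted(latencies_ms)
    p95_index = math.ceil(0.95 * len(ordered)) - 1  # nearest-rank percentile
    return {
        "requests": len(ordered),
        "total_sec": total_sec,
        "throughput": len(ordered) / total_sec,
        "avg_ms": sum(ordered) / len(ordered),
        "p95_ms": ordered[p95_index],
    }

def benchmark(url, texts, workers=48):
    """Send all texts concurrently and summarize the run (workers is assumed)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(lambda text: embed_once(url, text), texts))
    return summarize(latencies, time.perf_counter() - start)

def percent_faster(slow_sec, fast_sec):
    """Share of the slower run's time that the faster run saves, in percent."""
    return (slow_sec - fast_sec) / slow_sec * 100.0

def main():
    # Placeholder inputs; real runs would use representative documents.
    texts = [f"sample sentence number {i}" for i in range(500)]
    cpu = benchmark("http://localhost:8080/embed", texts)
    gpu = benchmark("http://localhost:8090/embed", texts)
    for name, stats in (("CPU", cpu), ("GPU", gpu)):
        print(name, stats)
    print(f"GPU is {percent_faster(cpu['total_sec'], gpu['total_sec']):.2f}% faster")
```

Calling `main()` requires both servers to be running. Note that “99.22% faster” in this sense means the GPU run takes about 99.2% less time, which is a speedup of more than a hundred times.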

Smartfolio.me: Revolutionizing Knowledge Organization with Advanced Features | March 19 2026, 04:01

My creation – the knowledge organization tool Smartfolio.me – has gained new features. I’m attaching a five-minute video overview.

It’s like Google Docs, but you can embed documents within each other, creating a network of connected knowledge, and these documents can be PDFs and regular texts.

When you upload a PDF, the program converts it into images, and you can highlight any section right on the pages to leave a comment or ask a question.

If something in the text is unclear, you highlight the area and press “elaborate” — the LLM will detail everything thoroughly, taking into account the context of the entire document, and the explanation will stay linked to the highlighted fragment.

You can simply cut out a piece from a PDF, and the LLM extracts clean text or a ready-made formula from it.

In the PDF window, there is now a small panel — all comments and explanations are immediately visible there, so you can quickly jump to the necessary parts.

You can cut out a diagram or graph from a PDF, copy it as a picture, and paste it into your text. It is cropped on the fly and stored in the database not as a copy but as a link to the page plus crop parameters.

If you delete the page link in the text, it won’t disappear completely but will go into a special list, from where you can reattach it somewhere else or delete it permanently. The same document can be inserted in several places. If you add a comment to it, the comment shows up everywhere the document is linked.

Mathematics is fully supported — LaTeX formulas can be not only viewed but also clicked to adjust them in the editor.

You can generate formulas by description. Just write in words what formula you need (for example, “binomial distribution”), and the system itself outputs the ready formula code.

Now there is a system of plugins – essentially isolated experimental functions separate from the main program. For instance, there is a plugin that recursively collects all subpages into one long document — convenient if you need to read or print everything at once.

Or consider the “YouTube Transcript Cleaning” plugin. Given a messy lecture transcript from YouTube, the plugin adds punctuation, splits the text into paragraphs, and creates neat headings.

If you insert a link to a website, it opens in a column beside your text, so you can read the source and take notes at the same time. However, some websites do not allow embedding in third-party pages. The system recognizes such sites and opens them in a new tab.

The left panel with the list of pages can be hidden or resized with the mouse, so it doesn’t take up space on the screen.

You can simply copy and paste an image or screenshot, and it will be not just inserted but also uploaded to the database.

It supports working from a mobile phone. On the phone, the interface switches to a single-column mode for convenient reading and commenting on the go.

Multiple databases and multiple LLMs are supported: you can connect several of each and switch between them.

Exploring Multilingual Vocabulary in Nabokov’s Works with Apple Books | March 15 2026, 23:20

Man, it’s really convenient. Just sitting here reading.

The usage pattern is as follows: I hold the phone in my hands, with the dictionary open in Apple Books. You see an unfamiliar word, and it will likely be in the word list for the chapter. The definition takes into account the translation by Nabokov himself. Then you look a couple of words ahead, put the phone down, and continue reading. When you encounter those words, they are still in your short-term memory, and hooray, you understand. During a break, you load the next couple of words into your brain. You do have to hold the phone and flip through, since each page contains 4-5 definitions.

Now, every word has definitions in English (interpretation), French, and German. Consequently, I can publish four books.

Overall, my level of English matches what my app predicts about which words will be challenging. But someday I’ll need the same for French, and it will require an assessment of the difficulty level for each word because even some basic words will be unclear to me. I’m not sure that a book with basic words will be handy. With rare ones – definitely handy.

Crafting Nabokov’s Dictionary: A Multilingual Lexical Journey | March 15 2026, 18:30

I’m reading Nabokov and decided to take a break to build a convenient app, “Nabokov’s Dictionary”, which I’m considering selling on Amazon as a book. Essentially, it looks like this (see screenshot): definitions of difficult words in English, Russian, German, and French, in the same order they appear in the original book.

Would you buy such a book?

To make the definitions accurate, I also wrote an aligner, a program that matches English sentences and paragraphs with their Russian translations (Nabokov’s own). When a word’s definition is created, it draws not only on the LLM’s knowledge but also on the author’s Russian translation.

The algorithm is worth discussing separately (I invented it myself because nothing I found online worked the way I needed). It first finds the longest sentences and matches each one with its pair through cosine similarity of embedding vectors produced by the multilingual e5 model. These sentences become anchors. Then, assuming that for long sentences an error is almost impossible, it finds the longest sentence between two anchors, and everything repeats recursively.

There are many situations where a sentence in Russian has no equivalent in English and vice versa, where a sentence is split into two, or where two are merged into one. The algorithm handles these as best it can. The resulting alignment is quite good, to the extent that errors can hardly be found (although they are likely still there). Either way, the alignment is only needed as context for translating words, so rare errors are not a big deal.
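The recursive anchoring idea can be sketched roughly like this. It is not the real aligner, only a simplified illustration under several assumptions: each sentence is given as a (length, vector) pair, the vectors here are toy one-hot vectors standing in for real multilingual e5 embeddings, the 0.8 threshold is arbitrary, and split or merged sentences are simply left unaligned. The names `align` and `one_hot` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def align(src, tgt, threshold=0.8):
    """Return sorted (src_index, tgt_index) anchors.

    src and tgt are lists of (sentence_length, embedding_vector).
    """
    anchors = []

    def recurse(s_lo, s_hi, t_lo, t_hi):
        if s_lo >= s_hi or t_lo >= t_hi:
            return
        # Try source sentences in this span from longest to shortest.
        for i in sorted(range(s_lo, s_hi), key=lambda k: -src[k][0]):
            j = max(range(t_lo, t_hi), key=lambda k: cosine(src[i][1], tgt[k][1]))
            if cosine(src[i][1], tgt[j][1]) >= threshold:
                anchors.append((i, j))
                recurse(s_lo, i, t_lo, j)          # span before the anchor
                recurse(i + 1, s_hi, j + 1, t_hi)  # span after the anchor
                return
        # No confident match in this span: leave it unaligned.

    recurse(0, len(src), 0, len(tgt))
    return sorted(anchors)

def one_hot(k, size=4):
    """Toy stand-in for an embedding vector."""
    return [1.0 if n == k else 0.0 for n in range(size)]

# Four source sentences; the target has one extra sentence with no counterpart.
src = [(5, one_hot(0)), (20, one_hot(1)), (8, one_hot(2)), (12, one_hot(3))]
tgt = [(5, one_hot(0)), (21, one_hot(1)), (9, [0.5, 0.5, 0.5, 0.5]),
       (8, one_hot(2)), (13, one_hot(3))]
print(align(src, tgt))  # [(0, 0), (1, 1), (2, 3), (3, 4)]
```

Because each anchor splits both texts into a “before” and an “after” span, later matches can never cross an earlier one, which keeps the alignment monotonic.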


Mapping Global Friendships and Rivalries: A Color-Coded Matrix Analysis | March 12 2026, 03:29

For fun, I decided to make a matrix of who is friends with whom and who is enemies with whom. For each country-country pair, I asked Gemini which of the five categories the relations fall into: “at daggers drawn” (purple), “predominantly unfriendly” (red), “neutral” (yellow), “predominantly friendly” (blue), “friends” (green). Lisa said that “neutral” should be purple. Overall, the quality of Gemini’s assessments is quite good.

Among all countries, three red lines stand out. These are countries that are on very bad terms with many others. Well, you guessed Russia right. And the other two? Israel? No, it’s Belarus and Venezuela.

In the top five countries that everyone is friends with and that have many friends themselves, the LLM included the USA, United Kingdom, Canada, France, and Germany. There is also an anti-rating: countries that have very bad relations (“at daggers drawn”) with many others. In this rating, Russia is in first place with 21 enemies, and Israel is in second place with 18. Following them, with a significant gap, are Syria and the USA with 9 enemies each. There is also a separate Conflict Zone rating, which is the sum of red and purple. Its leaders: Russia, Venezuela, Belarus, Israel, USA, Iran, Ukraine.

There is a “pacifists’ club”. These are the ones who have no enemies at all, sorted by the number of friends. Rating: Bahamas, Vatican, Luxembourg, Angola, Singapore, Iceland, Jamaica, Tanzania, Zambia.

I was curious: what if I apply the formula “the enemy of my enemy is my friend”? What would change? This led to a new color on the matrix: “logical friends”.
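One simple way to apply this rule is to call two countries logical friends whenever they share at least one enemy. The sketch below shows that reading; how the matrix was actually computed may differ, and the country names A to D are placeholders, not real data.

```python
from itertools import combinations

def logical_friends(enemies):
    """Map each pair of countries to the set of enemies they have in common.

    enemies: dict country -> set of countries it is hostile to.
    Only pairs with at least one shared enemy are returned.
    """
    result = {}
    for a, b in combinations(sorted(enemies), 2):
        shared = enemies[a] & enemies[b]
        if shared:
            result[(a, b)] = shared
    return result

# Placeholder relations: A and B both dislike C; B and D dislike each other.
enemies = {
    "A": {"C"},
    "B": {"C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}
print(logical_friends(enemies))
```

Here A and B become logical friends through C, and C and D through B. Nothing in this rule stops a pair that is itself hostile from also sharing an enemy, which is how contradictory pairs can appear.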

The most unexpected leader of the Master Pragmatists ranking was Taiwan (25 logical connections). Why so? In the LLM’s logic, Taiwan is a country that is officially recognized by few, but because of its global opposition to China, it automatically becomes a “logical friend” for everyone who has strained relations with Beijing. This is confirmed in the Shadow Bridges section: Taiwan has 23 connections beyond its region. It literally “stitches” different parts of the world together through a common problem.

The “Secret Partners” report is a list of geopolitical oxymorons. These are pairs that are “at daggers drawn” in official news but are forced to be friends by Gemini’s calculation. For example, Afghanistan – USA/United Kingdom. Despite the status “rather bad relations”, Gemini’s logic sees them as “logical friends”, possibly due to common regional threats (like ISIS) or dependence on humanitarian and back channels. Or here’s a strange alliance: Belarus – Hungary. Nominally they are in different camps; in practice they share a similar style of rhetoric and common “enemies” in Brussels. Eritrea – Ethiopia: the status is “at daggers drawn”, yet they also became logical friends.

In the “Most Controversial” report, first place goes to the USA, followed at a significant gap by Russia, and at an even larger gap by the United Kingdom, Canada, and Ukraine. These are the countries with the highest Love x Hate product value, that is, countries that have many friends and many enemies at the same time.

Another report covers the indifferent ones. The LLM couldn’t say much about them, apparently because they bother no one (both literally and figuratively). Examples include Madagascar and Haiti.
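The Love x Hate score itself is just a product of two counts. A minimal sketch, with made-up counts for placeholder countries A, B and C (not the real numbers from the matrix):

```python
def controversy_ranking(friends, enemies):
    """Rank countries by (number of friends) * (number of enemies), highest first."""
    countries = set(friends) | set(enemies)
    scores = {c: friends.get(c, 0) * enemies.get(c, 0) for c in countries}
    # Sort by score descending, then by name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical counts, for illustration only.
friends = {"A": 30, "B": 10, "C": 40}
enemies = {"A": 10, "B": 20, "C": 0}
print(controversy_ranking(friends, enemies))  # [('A', 300), ('B', 200), ('C', 0)]
```

A product rewards countries that score high on both counts at once: C has the most friends but no enemies, so it lands at the bottom.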

I also tried to cluster by the strength of friends and got four groups of countries.

The largest cluster. Core: China, Russia, Iran, India, and BRICS+ countries, as well as almost the entire African continent (from Egypt to South Africa) and a significant part of the Middle East (UAE, Saudi Arabia, Qatar).

The second cluster mainly included European countries. Core: France, Germany, United Kingdom. The algorithm determined Ukraine and Israel to be here. Logically: their survival depends on “predominantly friendly relations” with the European core. In this same club are Armenia, Georgia, and Serbia. Apparently, despite all the political swings, Gemini considers their ties to Europe more fundamental than any others.

The third cluster included the USA, Canada, Brazil, Mexico, and, for example, Taiwan. Formally, Taiwan can be a “logical friend” to all of China’s enemies, but by “strength of friends” it is permanently sewn to the American bloc. The Vatican also ended up here, which makes this club not only economic but also somewhat “values-based.”

The fourth cluster, the most compact and specialized, included countries of Oceania and Southeast Asia. Leaders: Australia, Japan, New Zealand, Singapore. This turned out to be a club of countries trying to balance in the most complex region of the planet. Here are almost all island states (Fiji, Samoa, Tonga).

What else could we extract from this information?

Gravitational Mastery: Semikhatov’s Cinematic Triumph | March 09 2026, 14:56

Semikhatov’s movie about gravity turned out to be really cool. Of course, it’s pitched at a fairly popular level, but understandably so: they didn’t want to scare off the audience. It’s very cool and professionally made.

I have Semikhatov’s book on my shelf (“Everything That Moves”). It’s also popular science, but a bit more serious in its presentation, at times with formulas, and loaded with illustrations. Later, my opinion of him slightly soured because of his peculiar way of hosting podcasts: constantly interrupting guests and answering his own questions in a way that pointedly upstages the guest. But in the movie, he looks absolutely great. I recommend it.

The link is in the first comment.