CPU vs GPU: A Speed Challenge in Embedding Creation | April 11 2026, 18:08

For some tasks, the difference between a CPU and a GPU is simply astounding. For example, I need to create millions of embeddings with the BGE M3 model. On my quite powerful 24-core Intel Core Ultra 9 285K processor, creating 500 embeddings takes 45.85 seconds, while on an NVIDIA 5090 GPU the same task completes in just 0.36 seconds. It is so fast that I wrote this benchmark specifically to figure out whether my GPU was being utilized at all: in test mode, the program that sends requests to TEI sends them too rarely (roughly a couple of times per second), so the GPU load graphs stay practically at zero.

— Testing http://localhost:8080/embed — <– CPU version

Requests completed: 500

Total time: 45.85 sec

Throughput: 10.90 req/sec

Average latency (Avg Latency): 4386.11 ms

P95 latency: 5021.88 ms

— Testing http://localhost:8090/embed — <– GPU version (NVIDIA 5090)

Requests completed: 500

Total time: 0.36 sec

Throughput: 1398.69 req/sec

Average latency (Avg Latency): 31.38 ms

P95 latency: 53.18 ms

========================================

RESULT: http://localhost:8090/embed is 99.22% faster
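For reference, the summary figures in a report like this can be recomputed from raw per-request latencies and the total wall time. Below is a minimal Python sketch. The function names and the nearest-rank definition of P95 are assumptions for illustration, not necessarily what the benchmark script does.

```python
import math

def summarize(latencies_ms, total_time_s):
    """Summarize one run: request count, throughput, mean and P95 latency."""
    n = len(latencies_ms)
    ordered = sorted(latencies_ms)
    # Nearest-rank P95: the smallest sample with at least 95% of samples at or below it.
    p95 = ordered[max(0, math.ceil(0.95 * n) - 1)]
    return {
        "requests": n,
        "throughput_rps": n / total_time_s,
        "avg_latency_ms": sum(latencies_ms) / n,
        "p95_latency_ms": p95,
    }

def percent_faster(slow_total_s, fast_total_s):
    """Share of the slower run's wall time that the faster run saves, in percent."""
    return (slow_total_s - fast_total_s) / slow_total_s * 100

def speedup(slow_total_s, fast_total_s):
    """How many times shorter the faster run is."""
    return slow_total_s / fast_total_s

def implied_concurrency(throughput_rps, avg_latency_ms):
    """Little's law: requests in flight = throughput x average latency."""
    return throughput_rps * avg_latency_ms / 1000

# Plugging in the rounded totals from the report above:
print(round(percent_faster(45.85, 0.36), 2))  # about 99.21
print(round(speedup(45.85, 0.36)))            # about 127
```

With the rounded totals the formula gives about 99.21%, so the reported 99.22% presumably comes from the unrounded GPU time. By Little's law, the reported throughput and average latency also imply roughly 44 to 48 requests in flight in both runs, which would explain why the average latency is far larger than the total time divided by 500.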

Smartfolio.me: Revolutionizing Knowledge Organization with Advanced Features | March 19 2026, 04:01

My creation – the knowledge organization tool Smartfolio.me – has gained new features. I’m attaching a five-minute video overview.

It’s like Google Docs, but you can embed documents within each other, creating a network of connected knowledge, and these documents can be PDFs and regular texts.

Upload a PDF, and the program converts it into images; you can then highlight any section right on the pages to leave a comment or ask a question.

If something in the text is unclear, you highlight the area and press “elaborate” — the LLM will detail everything thoroughly, taking into account the context of the entire document, and the explanation will stay linked to the highlighted fragment.

You can simply cut out a piece from a PDF, and the LLM extracts clean text or a ready-made formula from it.

In the PDF window, there is now a small panel — all comments and explanations are immediately visible there, so you can quickly jump to the necessary parts.

You can cut out a diagram or graph from a PDF, copy it as a picture, and paste it into your text. It is cropped automatically “on the fly” and saved in the database not as a copy but as a link to the page with crop parameters.

If you delete the page link in the text, it won’t disappear completely but will go into a special list, from which you can reattach it somewhere else or delete it permanently. The same document can be inserted in several places. If you add a comment to it, the comment appears everywhere this document is linked.

Mathematics is fully supported — LaTeX formulas can be not only viewed but also clicked to adjust them in the editor.

You can generate formulas by description. Just write in words what formula you need (for example, “binomial distribution”), and the system itself outputs the ready formula code.

Now there is a system of plugins – essentially isolated experimental functions separate from the main program. For instance, there is a plugin that recursively collects all subpages into one long document — convenient if you need to read or print everything at once.

Or consider the “YouTube Transcript Cleaning” plugin. If you have a messy lecture transcript from YouTube, the plugin adds punctuation, splits the text into paragraphs, and creates neat headings.

If you insert a link to a website, it opens in a column next to your text, so you can read the source and take notes at the same time. However, some websites do not allow embedding in third-party pages. The system recognizes such sites and opens them in a new tab.

The left panel with the list of pages can be hidden or resized with the mouse, so it doesn’t take up space on the screen.

You can simply copy and paste an image or screenshot, and it is not just inserted but also uploaded to the database.

It supports working from a mobile phone. On the phone, the interface switches to a single-column mode for convenient reading and commenting on the go.

Multiple databases are supported: you can connect different databases and different LLMs and switch between them.

Exploring Multilingual Vocabulary in Nabokov’s Works with Apple Books | March 15 2026, 23:20

Man, it’s really convenient. Just sitting here reading.

The usage pattern is as follows. I hold the phone in my hand, with the dictionary open in Apple Books. When you see an unfamiliar word, it will most likely be in the word list for that chapter, and its definition takes Nabokov’s own translation into account. Then you look a couple of words ahead, put the phone down, and continue reading. When you encounter those words, they are still in your short-term memory, and hooray, you understand them. During a break, you load the next couple of words into your brain. You do have to keep the phone at hand and flip pages, since each page contains only 4-5 definitions.

Now, every word has definitions in English (interpretation), French, and German. Consequently, I can publish four books.

Overall, my level of English matches what my app predicts about which words will be challenging. But someday I’ll need the same for French, and it will require an assessment of the difficulty level for each word because even some basic words will be unclear to me. I’m not sure that a book with basic words will be handy. With rare ones – definitely handy.

Crafting Nabokov’s Dictionary: A Multilingual Lexical Journey | March 15 2026, 18:30

I’m reading Nabokov and took a break to build a convenient app, “Nabokov’s Dictionary”, which I am considering selling on Amazon as a book. Essentially, it looks like this (see screenshot): definitions of difficult words in English, Russian, German, and French, in the same order in which they appear in the original book.

Would you buy such a book?

To make the definitions accurate, I also wrote an aligner: a program that matches sentences and paragraphs in English with their Russian translations (Nabokov’s own). When a word’s definition is created, it draws not only on the LLM’s knowledge but also on the author’s Russian translation. How the algorithm works deserves a separate discussion (I invented it myself because nothing I found online worked the way I needed). It first takes the longest sentences and matches each with its counterpart by cosine similarity of embedding vectors produced by the multilingual e5 model. These sentences become anchors. Then, assuming that for long sentences a mismatch is almost impossible, it finds the longest sentence between two anchors, and everything repeats recursively. There are many cases where a Russian sentence has no equivalent in English or vice versa, where one sentence is split into two, or where two are merged into one. The algorithm handles these as best it can. The resulting alignment is quite good, to the point that errors are hard to find (though some are likely still there). Either way, the alignment is only needed as context for translating words, so rare errors are not a big deal.
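The recursive anchoring idea can be sketched roughly as follows. This is a simplified Python illustration, not the actual aligner: it only produces one-to-one anchors and ignores split and merged sentences, the `min_sim` threshold is an assumed parameter, and the hand-made vectors stand in for real multilingual e5 embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def align(src, tgt, src_vecs, tgt_vecs, s_range=None, t_range=None, min_sim=0.5):
    """Return (src_index, tgt_index) anchor pairs in document order.

    Inside the current window, try source sentences from longest to shortest;
    the first one whose best target match reaches min_sim (an assumed
    threshold) becomes an anchor, and the windows on both sides of it are
    aligned recursively.
    """
    s_lo, s_hi = s_range or (0, len(src))
    t_lo, t_hi = t_range or (0, len(tgt))
    if s_lo >= s_hi or t_lo >= t_hi:
        return []
    by_length = sorted(range(s_lo, s_hi), key=lambda k: len(src[k]), reverse=True)
    for i in by_length:
        j = max(range(t_lo, t_hi), key=lambda k: cosine(src_vecs[i], tgt_vecs[k]))
        if cosine(src_vecs[i], tgt_vecs[j]) >= min_sim:
            left = align(src, tgt, src_vecs, tgt_vecs, (s_lo, i), (t_lo, j), min_sim)
            right = align(src, tgt, src_vecs, tgt_vecs, (i + 1, s_hi), (j + 1, t_hi), min_sim)
            return left + [(i, j)] + right
    return []  # nothing in this window is similar enough to anchor

# Toy data: the target side has one extra sentence with no source counterpart.
src = ["Short one.", "This is by far the longest sentence in the passage.", "End."]
tgt = ["t0", "extra sentence", "t2", "t3"]
src_vecs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
tgt_vecs = [[1, 0, 0], [1, 1, 1], [0, 1, 0], [0, 0, 1]]
pairs = align(src, tgt, src_vecs, tgt_vecs)
print(pairs)  # [(0, 0), (1, 2), (2, 3)]
```

Because each anchor restricts both sides to the window between neighboring anchors, a wrong match for a short sentence can only shift things locally, which is presumably why starting from the longest sentences works well.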

Would you buy such a book?

Mapping Global Friendships and Rivalries: A Color-Coded Matrix Analysis | March 12 2026, 03:29

For fun, I decided to make a matrix of who is friends with whom and who is enemies with whom. For each country-country pair, I asked Gemini which of the five categories the relations fall into: “at daggers drawn” (purple), “predominantly unfriendly” (red), “neutral” (yellow), “predominantly friendly” (blue), “friends” (green). Lisa said that “neutral” should be purple. Overall, the quality of Gemini’s assessments is quite good.

Among all countries, three red lines stand out: countries that are on very bad terms with many others. You guessed Russia, of course. And the other two? Israel? No, they are Belarus and Venezuela.

In the top five countries that everyone is friends with and that have many friends themselves, the LLM included the USA, United Kingdom, Canada, France, and Germany. There is also an anti-rating: countries that have very bad relations (“at daggers drawn”) with many others. Here Russia is in first place with 21 enemies, and Israel is second with 18. Following them, with a significant gap, are Syria and the USA with 9 enemies each. There is also a separate Conflict Zone rating, which is the sum of red and purple. Its leaders: Russia, Venezuela, Belarus, Israel, USA, Iran, Ukraine.

There is a “pacifists’ club”. These are the ones who have no enemies at all, sorted by the number of friends. Rating: Bahamas, Vatican, Luxembourg, Angola, Singapore, Iceland, Jamaica, Tanzania, Zambia.

I was curious: what if I apply the formula “the enemy of my enemy is my friend”? What would change? This led to new colors on the matrix: logical friends.
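One possible reading of this rule, as a minimal Python sketch: two countries become logical friends if they share at least one enemy. The exact rule used for the matrix may differ, and the country names A to D and their relations are made-up placeholders, not data from the matrix.

```python
from itertools import combinations

def logical_friends(enemies):
    """Map each pair of countries to the set of enemies they have in common.

    `enemies` maps a country to the set of countries it is "at daggers drawn"
    with. Pairs without a shared enemy are left out. Note that this rule
    (an assumption) does not check the pair's own direct relationship.
    """
    pairs = {}
    for a, c in combinations(sorted(enemies), 2):
        shared = enemies[a] & enemies[c]
        if shared:
            pairs[(a, c)] = shared
    return pairs

# Placeholder data: A and C are both enemies of B; D has no enemies.
enemies = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
print(logical_friends(enemies))  # {('A', 'C'): {'B'}}
```

Counting the entries per country would give a ranking by number of logical connections.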

The most unexpected leader of the Master Pragmatists ranking was Taiwan (25 logical connections). Why so? In the LLM’s logic, Taiwan is a country that few officially recognize, but because of its global opposition to China, it automatically becomes a “logical friend” for everyone who has strained relations with Beijing. This is confirmed in the Shadow Bridges section: Taiwan has 23 connections beyond its region. It literally “stitches” different parts of the world together through a common problem.

The “Secret Partners” report is a list of geopolitical oxymorons: pairs that are “at daggers drawn” in official news but are forced into friendship by Gemini’s calculation. For example, Afghanistan and the USA/United Kingdom: despite the status “rather bad relations”, Gemini’s logic sees them as “logical friends”, possibly due to common regional threats (like ISIS) or dependence on humanitarian and back channels. Or take the strange alliance of Belarus and Hungary: nominally they are in different camps, but in practice they share a similar style of rhetoric and common “enemies” in Brussels. Eritrea and Ethiopia have the status “at daggers drawn”, yet they also became logical friends.

In the “Most Controversial” report, first place goes to the USA, followed with a significant gap by Russia, and with an even larger gap by the United Kingdom, Canada, and Ukraine. These are the countries with the highest Love x Hate product, that is, countries that have many friends and many enemies at the same time.
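Both the Love x Hate product and the Conflict Zone sum come down to counting categories per country. Here is a small Python sketch with made-up placeholder data. The category labels are hypothetical, and treating “Love” as the number of green cells and “Hate” as the number of purple cells is an assumption.

```python
from collections import Counter

def category_counts(relations):
    """Count, per country, how many of its relations fall into each category.

    `relations` maps an unordered pair (a, b) to a category label.
    """
    counts = {}
    for (a, b), category in relations.items():
        counts.setdefault(a, Counter())[category] += 1
        counts.setdefault(b, Counter())[category] += 1
    return counts

def love_hate(counts):
    """Controversy score: number of friends times number of sworn enemies."""
    return {c: k["friends"] * k["daggers"] for c, k in counts.items()}

def conflict_zone(counts):
    """Conflict Zone score: red ("unfriendly") plus purple ("daggers") cells."""
    return {c: k["unfriendly"] + k["daggers"] for c, k in counts.items()}

# Placeholder data, not taken from the real matrix.
relations = {
    ("A", "B"): "friends",
    ("A", "C"): "daggers",
    ("A", "D"): "daggers",
    ("B", "C"): "unfriendly",
    ("B", "D"): "neutral",
    ("C", "D"): "friends",
}
counts = category_counts(relations)
print(love_hate(counts))      # {'A': 2, 'B': 0, 'C': 1, 'D': 1}
print(conflict_zone(counts))  # {'A': 2, 'B': 1, 'C': 2, 'D': 1}
```

A product rewards having both friends and enemies: a country with many of one kind and none of the other scores zero, which fits the idea of “controversial”.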

Another report covers the indifferent ones. The LLM couldn’t say much about them, apparently because they bother no one (both literally and figuratively). Examples are Madagascar and Haiti.

I also tried clustering countries by the strength of their friendships and got four groups.

The largest cluster. Core: China, Russia, Iran, India, and BRICS+ countries, as well as almost the entire African continent (from Egypt to South Africa) and a significant part of the Middle East (UAE, Saudi Arabia, Qatar).

The second cluster mainly included European countries. Core: France, Germany, United Kingdom. The algorithm determined Ukraine and Israel to be here. Logically: their survival depends on “predominantly friendly relations” with the European core. In this same club are Armenia, Georgia, and Serbia. Apparently, despite all the political swings, Gemini considers their ties to Europe more fundamental than any others.

The third cluster included the USA, Canada, Brazil, Mexico, and, for example, Taiwan. Formally, Taiwan can be a “logical friend” to all of China’s enemies, but by “strength of friends” it is permanently sewn to the American bloc. The Vatican also ended up here, which makes this club not only economic but also somewhat “values-based.”

The fourth cluster, the most compact and specialized, included countries of Oceania and Southeast Asia. Leaders: Australia, Japan, New Zealand, Singapore. This turned out to be a club of countries trying to balance in the most complex region of the planet. Here are almost all island states (Fiji, Samoa, Tonga).

What else could we extract from this information?

Gravitational Mastery: Semikhatov’s Cinematic Triumph | March 09 2026, 14:56

Semikhatov’s movie about gravity turned out to be really cool. Of course, it is pitched at a fairly popular level, but understandably so: they didn’t want to scare off the audience. It’s very professionally made.

I have Semikhatov’s book on my shelf (“Everything That Moves”). It’s also popular science, but a bit more serious in its presentation, at times with formulas and loaded with illustrations. Later, my opinion of him slightly soured because of his peculiar way of hosting podcasts: constantly interrupting guests and answering his own questions in a way that demonstratively outshines the guest. But in the movie, he looks absolutely great. I recommend it.

The link is in the first comment.

Seeking Alpha Testers for a Revolutionary Text and PDF Management Tool | March 03 2026, 03:02

Looking for alpha-testers. As part of R&D and for my own tasks, I wrote a productivity tool (I actually wrote about this in my last post, but Facebook said that because I put a link in the post, only 12% saw it). Now I want to check if it will be useful to anyone else. If the idea resonates with you — let me know, and I will share access.

Website smartfolio dot me. What’s the main idea?

It’s an online notebook for working with text and PDFs, organized as a graph. It looks like Google Docs, but there’s an important difference: you can attach “child” documents to specific parts of the main text to expand on details or clarify concepts. These “comments” themselves are full documents and can have their own nested branches.

If there’s a fragment in the text that is unclear, you can ask the system to explain it (this will require your Google Gemini API key).

The system uses the full context of the document to generate a response.

Explanations are permanently attached to a specific place in the text.

This is super convenient when reading complex scientific articles. For instance, you can highlight the authors’ surnames in a PDF and instantly get a background on them — the information will be attached right to that fragment on the page.

Typical workflow

Upload a complex text and read it right in the app from either a mobile or a computer. As you go, add manual or AI-generated notes to important or unclear sections for future reference.

I do not store your documents, PDFs, images, or API keys on my servers. All data is stored in Turso DB (SaaS, free up to 5 GB).

Screenshots on the website’s main page best describe the project.

How to try?

To register in the app, you need an invite code. Just write me in the comments or in a private message, and I will send it.

Website smartfolio-dot-me

Navigating the Tricky Path of Online Donations: A User Experience Dilemma | February 20 2026, 19:02

Here we have the ultimate tricksters. If you accidentally choose an answer for “would you like to donate?”, getting to “oh, I don’t want to yet” takes about 10 minutes and is fraught with the risk of losing your seats. Because 1) there is no option for ‘don’t want to’ 2) any selection ranges from $5 to $9.60 3) refreshing the page results in an error, forcing you to reselect seats and try not to hit those radio buttons again. And by the way, these were the last two seats in the auditorium. They weren’t available yesterday, but showed up today.

Revolutionizing Research: Introducing a Web-Based Notebook Integrated with AI and PDF Support | February 19 2026, 16:19

I’ve further developed a new tool for myself for working with information and organizing it. The main idea is a web-based notebook for research, studying subjects, working on them, integrated with AI and PDF support.

The main problem with typical PDF readers and notes is that the context is lost as soon as you switch to a new tab. In my tool, each text fragment or PDF becomes a node in a “live” hypertext tree, which I can access from multiple computers at any time.

Work process:

– Contextual AI. I can ask the AI to clarify complex passages right within the document. The explanation stays right where the question was asked. Moreover, it is a separate document, linked to the specific spot in the source. When clicked, you see both the original and the explanation on the screen at the same time.

– Panels instead of windows. If the explanation itself requires clarification, a new panel opens to the right. This allows for an endless chain of queries, never losing the place in the original text. That is, you see several panels at once, and unnecessary ones can be closed.

– PDF support. I can upload a PDF, select an area on the page (e.g., a complex diagram or a list of authors), and the LLM instantly extracts data, supplements, or explains them. The explanation is attached to the spot where it was requested, just like with non-PDFs.

– Nested annotations. My comments are not just static text. They can contain their own PDFs, links, and further sub-tasks for AI, maintaining a depth of nesting that reflects how we actually think.

This is not just a file storage system, but an “engine” for building knowledge.

The tool suits me personally very well, but perhaps it only solves my specific tasks. What do you think, would something like this be useful to others? Would it be useful to you? Should I develop the project into a fully-fledged product and give it to other users for testing?

Exploring LLMs and AI: Connecting Neural Processors to Natural Language Learning | February 15 2026, 15:41

Some thoughts on LLMs and artificial intelligence in general, and at the end a bit about neuromorphic processors and Intel Loihi.

As you all know, fundamentally LLMs operate on the principle of “propose the likely next word using the context from the previous N words,” and then the word enters the context, and the process repeats all over again for the next word. Well, and the context is also processed considering the importance of words.
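As a toy illustration of this loop, here is a Python sketch in which the “model” is just a table of word-pair counts. That is a drastic simplification and an assumption made for brevity: the context is a single previous word, and the most frequent continuation is always chosen, whereas a real LLM conditions on a long context and weighs words with attention.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.split()
    table = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table

def generate(table, start, max_words=4):
    """Repeatedly propose the likeliest next word and append it to the context."""
    out = [start]
    for _ in range(max_words):
        candidates = table.get(out[-1])
        if not candidates:
            break  # the last word was never seen with a continuation
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

table = train_bigrams("the cat sat on the mat and the cat sat on the sofa")
print(generate(table, "the"))  # the cat sat on the
```

The structure is the same as in the description: predict a word, add it to the context, and repeat.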

Now let’s think about how children learned languages in primitive societies. There were no alphabets and no written grammars. Yet the grammar itself, judging by observations of the small languages of small peoples, was quite complex. Simple grammar is a modern phenomenon that appears once a language has spread to millions or billions of speakers.

That is, a child’s brain had to reconstruct grammar in its neurons simply from the flow of speech around it and by testing its understanding of what was said. The child was probably corrected when speaking incorrectly, but somehow this grammar and sound extraction had to settle in the brain, and here the same mechanism as in LLMs is at work: which words or sounds come next in which context is determined by latent, uninterpretable rules that each person builds in their own brain in childhood, in their own way. Roughly speaking, each brain trains an ML model from scratch on the flow of speech around it. A child does not know what a “case” is but feels which ending is statistically more likely in a given context.

Actually, modern cognitive science (Karl Friston’s theory) asserts that the brain is literally a “prediction machine.” We constantly generate hypotheses about the next sound or word and correct them when they don’t match (prediction error).

The peculiarity of LLMs is that for them, teachers are texts and images, but for a child’s brain, it’s the living world around, and if all the texts they hear were digitized, their volume wouldn’t even be enough to train a very weak model. LLM sees the word “apple” next to the word “red.” A child sees an apple, feels its smell, taste, weight, and simultaneously hears the sound. This “stitching” of different sensory channels allows building neural connections thousands of times faster than on plain text. That is, modern LLMs take a brute force approach—simply observing the speech of billions, not just their immediate environment. A good question is how the human brain manages to learn from a relatively small dataset. However, it’s a big question whether this dataset is small—for example, lip movements, facial expressions, context provide a lot for building this neural network in the biological brain.

About the context: unlike LLMs, a child understands the speaker’s intention. If mom looks at a cup and says “hot,” the child’s brain limits the search space of meanings to one cup. And if he didn’t understand, he’ll get burned and remember.

One might assume, of course, that the brain already has a ready network at birth. It’s true, but science can’t yet explain it properly. Our entire genetic program has about 20,000 genes encoding proteins, and these 20,000 are responsible for everything—where and how the lungs, heart, bones, blood should be built, and they themselves are of mind-boggling complexity, and somewhere among 3 billion nucleotides and 20,000 genes this information must be recorded.

Apparently, genes encode not a map but an algorithm of self-assembly. Essentially, the architecture of the neural network is built dynamically, and this process begins long before birth. Then it is calibrated by all the signals received by the unborn child, and by the time of birth, there is already a somewhat tuned network in the brain.

It’s likely that the child’s brain is millions of neural networks of different “architectures” that evolve and merge as they learn. Unlike in LLMs, learning and usage here are not strictly separated in time. But most importantly, the brain, although it is the most energy-hungry organ in the body, consumes very little energy in absolute terms, especially compared to the current hardware candidates to replace it.

In the last few years, there has been active development in the field of neuromorphic systems (for example, the old IBM TrueNorth processor and the actively developing Intel Loihi). In conventional AI, neurons transmit numbers (0.15, 0.88…). In neuromorphic systems, they transmit “spikes” (impulses), as in the living brain (this architecture is called a spiking neural network, SNN). A few years ago, Intel released Loihi 2. It is fully programmable. Neurons on Loihi can change their connections (synapses) during operation. It supports plasticity, the biological mechanism in which the connection between neurons is strengthened if they often “fire” together. But the main thing is that it consumes very little power.
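To make “spikes” and “fire together” a bit more concrete, here is a textbook-style Python sketch of a leaky integrate-and-fire neuron and a simple Hebbian weight update. It is a generic illustration with assumed parameter values (`threshold`, `leak`, `rate`), not Loihi’s actual neuron model and not the Lava API.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron: return a spike train (0 or 1 per step)."""
    v = 0.0
    spikes = []
    for current in input_current:
        v = leak * v + current  # keep part of the old potential, add new input
        if v >= threshold:
            spikes.append(1)    # emit a spike instead of a real-valued output
            v = 0.0             # reset the membrane potential after firing
        else:
            spikes.append(0)
    return spikes

def hebbian_update(weight, pre_spikes, post_spikes, rate=0.1):
    """Strengthen a synapse for every step in which both neurons fired."""
    together = sum(1 for p, q in zip(pre_spikes, post_spikes) if p and q)
    return weight + rate * together

pre = simulate_lif([0.6, 0.6, 0.0, 0.6, 0.6])
print(pre)  # [0, 1, 0, 0, 1]
print(hebbian_update(0.5, pre, [0, 1, 0, 1, 1]))  # about 0.7
```

The neuron stays silent most of the time and only communicates when its potential crosses the threshold, which is where the energy savings of spiking hardware are usually said to come from.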

In this architecture, the model can continue learning “on the fly” right during operation, without forgetting old data (Continual Learning). Besides that—extreme energy efficiency.

Loihi 2 cannot multiply matrices the way modern GPUs do, so completely new software has to be written for it (and this is moving very slowly). There is no PyTorch or TensorFlow: for Loihi, only the Lava framework is available today. And the 1 million neurons of a single Loihi 2 chip is very little for LLMs. Therefore, Intel builds systems like Hala Point, an array of 1152 Loihi 2 processors containing up to 1.15 billion neurons. Theoretically, in performance per watt, such a system can surpass traditional GPUs by 10–50 times when working with AI models.

Experimental LLMs are already being launched on Loihi 2 (for example, models with 370 million parameters). They are not yet going to replace ChatGPT in the cloud, but theoretically, they are the future for “smart” robots and gadgets that need to understand human speech while running off a small battery.

We’ll observe. It might turn out to be a dud, or it could be another major revolution.