Data – Page 2 – Hi, I'm Rauf Aliev.

CPU vs GPU: A Speed Challenge in Embedding Creation | April 11 2026, 18:08

When working with certain tasks, the difference between a CPU and a GPU is simply astounding. For example, I need to create millions of embeddings with the BGE M3 model. On my quite powerful 24-core Intel Core Ultra 9 285K processor, creating 500 embeddings takes 45.85 seconds, while an NVIDIA 5090 GPU completes the same task in just 0.36 seconds, roughly 127 times faster. It is so fast that I wrote this benchmark specifically to find out whether my GPU is being utilized at all: in test mode, the program that sends requests to TEI does so only a couple of times per second, and the GPU load graphs stay practically at zero.

--- Testing http://localhost:8080/embed --- <-- CPU version

Requests completed: 500

Total time: 45.85 sec

Throughput: 10.90 req/sec

Average latency: 4386.11 ms

P95 latency: 5021.88 ms

--- Testing http://localhost:8090/embed --- <-- GPU version (NVIDIA 5090)

Requests completed: 500

Total time: 0.36 sec

Throughput: 1398.69 req/sec

Average latency: 31.38 ms

P95 latency: 53.18 ms

========================================

RESULT: http://localhost:8090/embed is 99.22% faster
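For reference, the summary lines above can be derived from raw timings roughly like this. It is a minimal sketch, not the benchmark script itself: the function names are hypothetical, and the nearest-rank definition of P95 is an assumption, since the output does not say which percentile method was used.

```python
import math


def summarize(latencies_ms, total_time_s):
    """Summarize one run from per-request latencies (ms) and wall-clock time (s)."""
    n = len(latencies_ms)
    ordered = sorted(latencies_ms)
    # Nearest-rank P95: the smallest sample with at least 95% of samples at or below it.
    p95 = ordered[max(0, math.ceil(0.95 * n) - 1)]
    return {
        "requests": n,
        "throughput_rps": n / total_time_s,
        "avg_latency_ms": sum(ordered) / n,
        "p95_latency_ms": p95,
    }


def time_reduction_pct(slow_s, fast_s):
    """Percentage of wall-clock time saved by the faster endpoint."""
    return (slow_s - fast_s) / slow_s * 100.0


def speedup(slow_s, fast_s):
    """How many times faster the fast endpoint finished the same work."""
    return slow_s / fast_s
```

With the rounded totals from the output, `time_reduction_pct(45.85, 0.36)` comes to about 99.2%, which matches the RESULT line if "faster" is read as "takes that much less time".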

Navigating the Lexical Complexity of Nabokov’s “Lolita” | April 02 2026, 15:56

I’ve finished the first version of a dictionary-style book on Nabokov’s “Lolita”. The chart shows how vocabulary complexity is distributed across the pages of the book. The lower chart averages over 25 sentences and shows the number of complex words on the vertical axis, with colors indicating their complexity or rarity (purple for the most complex, red for less complex, yellow for even less so). I have already removed two levels, though, and overall, for a foreigner, all five levels are challenging. In the book, level 3 is marked with a dashed line, level 4 with a simple frame, and level 5 with a double frame.

Currently, there are 5794 words, of which 541 are fifth level, 1070 are fourth, 1883 are third, 1393 are second, and 54 are first (the simplest ones). Since the first version came to 1148 pages, the dictionary will need to be significantly streamlined by removing whatever can be dispensed with: mainly the first and second levels, plus some words from the third and fourth. Word rarity is calculated in three ways: with an LLM, and with two lists of word frequencies in the English language corpus (300K words).
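The 25-sentence averaging behind the lower chart can be sketched as follows. This is only an illustration under assumptions: the input structure (one list of rarity levels per sentence) and the use of non-overlapping windows are hypothetical, and the real chart may use a sliding window instead.

```python
from collections import Counter


def windowed_averages(sentence_levels, window=25, levels=(1, 2, 3, 4, 5)):
    """Average number of complex words per sentence, per rarity level.

    sentence_levels: one list per sentence, holding the rarity level (1..5)
    of every complex word found in that sentence.
    Returns one dict {level: average per sentence} per window of sentences.
    """
    rows = []
    for start in range(0, len(sentence_levels), window):
        chunk = sentence_levels[start:start + window]
        counts = Counter(level for sentence in chunk for level in sentence)
        rows.append({level: counts[level] / len(chunk) for level in levels})
    return rows
```

Each returned row corresponds to one bar position on the chart, and the per-level values can be stacked in the colors described above.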

Not all words are complex. For instance, in the sentence “With the ebb of lust, an ashen sense of awfulness, abetted by the realistic drabness of a gray neuralgic day, crept over me and hummed within my temples.” someone well acquainted with English might not know the words ebb, abet, and drabness, while everything else is familiar. Lower the requirements for the reader, though, and the dictionary might not be very useful in such cases.

Or consider the sentence:

Homo pollex of science, with all its many sub-species and forms; the modest soldier, spic and span, quietly waiting, quietly conscious of khaki’s viatric appeal; the schoolboy wishing to go two blocks; the killer wishing to go two thousand miles; the mysterious, nervous, elderly gent, with brand-new suitcase and clipped mustache; a trio of optimistic Mexicans; the college student displaying the grime of vacational outdoor work as proudly as the name of the famous college arching across the front of his sweatshirt; the desperate lady whose battery has just died on her; the clean-cut, glossy-haired, shifty-eyed, white-faced young beasts in loud shirts and coats, vigorously, almost priapically thrusting out tense thumbs to tempt lone women or sadsack salesmen with fancy cravings.

My browser even highlights four words here.

I have definitions of the words in English, German, French, and Russian. I’ve run into the problem that different words from the text count as complex in different languages, yet I keep them in one unified list. So I’ll have to mark, for example, French words in the English text separately, so that they are left out of the French version: a French reader already knows what, for instance, quel mot means.

Overall, this weekend I’ll be manually removing about half, and then I can make the cover and list it on Amazon.

| March 21 2026, 13:03

https://x.com/byajperez/status/2035200878198538270

Mapping Global Friendships and Rivalries: A Color-Coded Matrix Analysis | March 12 2026, 03:29

For fun, I decided to make a matrix of who is friends with whom and who is enemies with whom. For each country-country pair, I asked Gemini which of the five categories the relations fall into: “at daggers drawn” (purple), “predominantly unfriendly” (red), “neutral” (yellow), “predominantly friendly” (blue), “friends” (green). Lisa said that “neutral” should be purple. Overall, the quality of Gemini’s assessments is quite good.

Among all countries, three red lines stand out. These are countries that are on very bad terms with many others. Russia you guessed right. And the other two? Israel? No, they are Belarus and Venezuela.

In the top five countries that everyone is friends with and that have many friends themselves, the LLM included the USA, the United Kingdom, Canada, France, and Germany. There is also an anti-rating: countries that have very bad relations (“at daggers drawn”) with many others. In this rating, Russia is in first place with 21 enemies, and Israel is second with 18. Following them, with a significant gap, are Syria and the USA with 9 enemies each. There is also a separate Conflict Zone rating, which is the sum of red and purple; it is led by Russia, Venezuela, Belarus, Israel, the USA, Iran, and Ukraine.

There is a “pacifists’ club”. These are the ones who have no enemies at all, sorted by the number of friends. Rating: Bahamas, Vatican, Luxembourg, Angola, Singapore, Iceland, Jamaica, Tanzania, Zambia.

I was curious: what if I apply the formula “the enemy of my enemy is my friend”? What would change? This led to new colors on the matrix for “logical friends”.
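The rule can be sketched in a few lines. This is a minimal illustration, not the code used for the matrix: the data layout (a dict keyed by unordered country pairs) is hypothetical, and treating only “at daggers drawn” as “enemy” is an assumption. The countries in the usage note are placeholders.

```python
from itertools import combinations

ENEMY = "at daggers drawn"  # assumption: only this category counts as "enemy"


def logical_friends(relations):
    """relations maps frozenset({a, b}) -> category label.

    Returns the set of pairs that share at least one common enemy,
    regardless of their own direct relationship.
    """
    enemies = {}
    for pair, label in relations.items():
        if label == ENEMY:
            a, b = tuple(pair)
            enemies.setdefault(a, set()).add(b)
            enemies.setdefault(b, set()).add(a)

    result = set()
    for foes in enemies.values():
        # Every two countries hostile to the same third country become logical friends.
        for a, b in combinations(sorted(foes), 2):
            result.add(frozenset((a, b)))
    return result
```

For example, if A and B are both “at daggers drawn” with C, the function returns the pair {A, B} even when A and B are officially “neutral” or hostile to each other, which is how pairs with bad official relations can still show up as logical friends.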

The most unexpected leader of the Master Pragmatists ranking was Taiwan (25 logical connections). Why? By the LLM’s logic, Taiwan is a country that few officially recognize, but because of its global opposition to China it automatically becomes a “logical friend” of everyone who has strained relations with Beijing. This is confirmed in the Shadow Bridges section: Taiwan has 23 connections beyond its region. It literally “stitches” different parts of the world together through a common problem.

The “Secret Partners” report is a list of geopolitical oxymorons. These are pairs that are “at daggers drawn” in official news but are forced to be friends by Gemini’s calculation. For example, Afghanistan and the USA/United Kingdom: despite the status “rather bad relations”, Gemini’s logic sees them as “logical friends”, possibly due to common regional threats (like ISIS) or dependence on humanitarian and back channels. Or take the strange alliance of Belarus and Hungary: nominally they are in different camps, but in fact they share a similar style of rhetoric and common “enemies” in Brussels. Eritrea and Ethiopia have the status “at daggers drawn”, yet they too became logical friends.

In the “Most Controversial” report, first place goes to the USA, followed at a significant gap by Russia and, at an even larger gap, by the United Kingdom, Canada, and Ukraine. These are the countries with the highest Love x Hate product, that is, countries that have many friends and many enemies at the same time.
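The Love x Hate score can be computed along these lines. This is a sketch under assumptions: “Love” is taken as the number of “friends” relations and “Hate” as the number of “at daggers drawn” relations, and the dict layout is hypothetical; the original report may also count the “predominantly” categories.

```python
from collections import Counter


def controversy_scores(relations):
    """relations maps frozenset({a, b}) -> category label.

    Returns {country: friends_count * enemies_count}.
    """
    friends, enemies = Counter(), Counter()
    for pair, label in relations.items():
        for country in pair:
            if label == "friends":
                friends[country] += 1
            elif label == "at daggers drawn":
                enemies[country] += 1
    countries = set(friends) | set(enemies)
    return {c: friends[c] * enemies[c] for c in countries}
```

A product rather than a sum means a country scores high only if it has many of both; a country with many enemies and no friends scores zero.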

Another report covers the indifferent ones. The LLM couldn’t say much about them, apparently because they bother no one (both literally and figuratively). Examples are Madagascar and Haiti.

I also tried to cluster by the strength of friends and got four groups of countries.

The largest cluster. Core: China, Russia, Iran, India, and BRICS+ countries, as well as almost the entire African continent (from Egypt to South Africa) and a significant part of the Middle East (UAE, Saudi Arabia, Qatar).

The second cluster mainly included European countries. Core: France, Germany, United Kingdom. The algorithm determined Ukraine and Israel to be here. Logically: their survival depends on “predominantly friendly relations” with the European core. In this same club are Armenia, Georgia, and Serbia. Apparently, despite all the political swings, Gemini considers their ties to Europe more fundamental than any others.

The third cluster included the USA, Canada, Brazil, Mexico, and, for example, Taiwan. Officially, Taiwan can be a “logical friend” to all of China’s enemies, but by “strength of friends” it is permanently sewn to the American bloc. The Vatican also ended up here, which makes this club not only economic but also somewhat “values-based.”

The fourth cluster, the most compact and specialized, included countries of Oceania and Southeast Asia. Leaders: Australia, Japan, New Zealand, Singapore. This turned out to be a club of countries trying to balance in the most complex region of the planet. Here are almost all island states (Fiji, Samoa, Tonga).

What else could we extract from this information?

Seeking Alpha Testers for a Revolutionary Text and PDF Management Tool | March 03 2026, 03:02

Looking for alpha-testers. As part of R&D and for my own tasks, I wrote a productivity tool (I actually wrote about this in my last post, but Facebook said that because I put a link in the post, only 12% saw it). Now I want to check if it will be useful to anyone else. If the idea resonates with you — let me know, and I will share access.

Website smartfolio dot me. What’s the main idea?

It’s an online notebook for working with text and PDFs, organized as a graph. It looks like Google Docs, but there’s an important difference: you can attach “child” documents to specific parts of the main text to expand on details or clarify concepts. These “comments” themselves are full documents and can have their own nested branches.

If there’s a fragment in the text that is unclear, you can ask the system to explain it (this will require your Google Gemini API key).

The system uses the full context of the document to generate a response.

Explanations are permanently attached to a specific place in the text.

This is super convenient when reading complex scientific articles. For instance, you can highlight the authors’ surnames in a PDF and instantly get a background on them — the information will be attached right to that fragment on the page.

Typical workflow

Upload a complex text and read it right in the app on either a phone or a computer. As you go, add manual or AI-generated notes to important or unclear sections for future reference.

I do not store your documents, PDFs, images, or API keys on my servers. All data is stored in Turso DB (SaaS, free up to 5 GB).

Screenshots on the website’s main page best describe the project.

How to try?

To register in the app, you need an invite code. Just write to me in the comments or in a private message, and I will send it.

Website smartfolio-dot-me

| February 23 2026, 06:31

https://youtu.be/nciLt5Edsa8

Revolutionizing Research: Introducing a Web-Based Notebook Integrated with AI and PDF Support | February 19 2026, 16:19

I’ve further developed a new tool of mine for working with information and organizing it. The main idea is a web-based notebook for researching and studying subjects, integrated with AI and PDF support.

The main problem with typical PDF readers and notes is that the context is lost as soon as you switch to a new tab. In my tool, each text fragment or PDF becomes a node in a “live” hypertext tree, which I can access from multiple computers at any time.

Work process:

– Contextual AI. I can ask the AI to clarify complex passages right within the document. The explanation stays right where the question was asked. Moreover, it is a separate document, linked to the specific spot in the source. When clicked, you see both the original and the explanation on the screen at the same time.

– Panels instead of windows. If the explanation itself requires clarification, a new panel opens to the right. This allows for an endless chain of queries, never losing the place in the original text. That is, you see several panels at once, and unnecessary ones can be closed.

– PDF support. I can upload a PDF, select an area on the page (e.g., a complex diagram or a list of authors), and the LLM instantly extracts data, supplements, or explains them. The explanation is attached to the spot where it was requested, just like with non-PDFs.

– Nested annotations. My comments are not just static text. They can contain their own PDFs, links, and further sub-tasks for AI, maintaining a depth of nesting that reflects how we actually think.

This is not just a file storage system, but an “engine” for building knowledge.

The tool suits me personally very well, but perhaps it only solves my specific tasks. What do you think, would something like this be useful to others? Would it be useful to you? Should I develop the project into a fully-fledged product and give it to other users for testing?

Interactive Text Enhancer: A Tool for Embedding Clarifications | February 12 2026, 16:11

I whipped up this thing in just an hour. Do you think anyone besides me needs it?

Here’s the idea. Take any text, a Wikipedia article, for example. Highlight any segment, say, something unclear. The LLM produces an explanation and instantly inserts a box right in the text, which you can click to open the explanation. The explanation may contain something unclear too. We highlight that part with the mouse, and a box appears there as well. This continues until everything is clear. All the boxes remain in the text, so you can always return to them. So, if the idea was unclear to me, maybe it will be to others, and then a ready link with explanations will come in very handy. The result can be shared with colleagues.

Explanations use not just the highlighted fragment but also its context. Without the context, the highlighted word Terrier would yield text about a dog breed rather than about the search system.

Navigating the Future: Embracing Earth’s Magnetic Field as a GPS Alternative | January 10 2026, 17:41

I learned today that navigation by the Earth’s magnetic field is a real technology in active use, either as a replacement for GPS or as an extension of it.

For example, the Scandinavian ferry Express 5 of Bornholmslinjen guards against GPS problems (which do happen) by using MagNav navigation. Unlike GPS, the Earth’s magnetic field cannot be jammed or spoofed; it simply exists. The ferry always follows the same route, so in principle navigation could even be done with household fishing sonars.

There are also a few startups that use this technology for indoor navigation, where GPS signals cannot reach. The claimed navigation accuracy is within 1 meter. That’s more interesting.

Examples are GiPStech, Oriient, and Mapsted.

The basis of this technology is a process called magnetic fingerprinting. Engineers or mapping robots walk through a building with a smartphone, recording the unique distortions of the magnetic field at every point. These distortions are created by the steel frame of the building, rebar in the walls, and large electrical equipment. The result is a database in which each coordinate (x, y, z) corresponds to its own magnetic field vector (intensity, inclination, declination).

The collected data is uploaded to the provider’s cloud platform, where it is cleaned of noise and “stitched” together with the digital floor plan. When a user walks through a shopping center, their smartphone reads the built-in magnetometer in real time, and special software (an SDK) compares the current readings with those stored in the database. To reach an accuracy of 1–2 meters, the system relies on more than the magnetic field. It uses sensor fusion: magnetic data is combined with inertial sensors (the accelerometer counts steps, the gyroscope detects turns) and sometimes with Wi-Fi/Bluetooth signals for rough localization.
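The lookup step alone, without sensor fusion, can be sketched as a nearest-neighbor search over the fingerprint database. This is an illustration under assumptions, not any vendor’s algorithm: the field is stored here as Cartesian components (bx, by, bz) rather than intensity and angles, the function name and the k-nearest-neighbor averaging are hypothetical, and the sample values are made up.

```python
import math


def locate(reading, fingerprints, k=3):
    """Estimate position from one magnetometer reading.

    reading: (bx, by, bz) field vector measured by the phone.
    fingerprints: list of ((x, y, z), (bx, by, bz)) pairs from the survey.
    Returns the average position of the k fingerprints whose stored
    field vector is closest to the reading (Euclidean distance).
    """
    ranked = sorted(fingerprints, key=lambda fp: math.dist(reading, fp[1]))
    nearest = ranked[:k]
    n = len(nearest)
    return tuple(sum(fp[0][axis] for fp in nearest) / n for axis in range(3))


# Made-up survey points: position in meters, field in microtesla.
survey = [
    ((0.0, 0.0, 0.0), (10.0, 0.0, 40.0)),
    ((1.0, 0.0, 0.0), (12.0, 1.0, 41.0)),
    ((5.0, 5.0, 0.0), (30.0, -5.0, 20.0)),
]
position = locate((12.0, 1.0, 41.0), survey, k=1)
```

A single reading is often ambiguous, because different spots can have similar field vectors. That is one reason to match sequences of readings and to bring in the step and turn data mentioned above.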

This technology is certainly being actively adopted for drones. The main technical difficulties there are dealing with the drone’s own interference and accounting for the fact that the magnetic field changes, which requires constant map updates. Electrical systems and motors create strong magnetic fields that “drown out” the natural background of the Earth. However, various filtering algorithms (including neural networks) are used to “subtract” motor interference from the overall sensor readings in real time.

From what I understand, at high altitudes (kilometers) the magnetic field is “smoother”, so the accuracy is lower (about 1–5 km). But if several drones fly together and exchange signals, each of them can get very good accuracy. Additionally, a group of drones can measure the gradient (rate of change) of the magnetic field in space, tying location not to absolute values but to relative ones. Essentially, using a group of drones turns the navigation system from a set of individual receivers into a distributed phased array antenna, capable of filtering global interference and working with much weaker useful signals. Considering that small drones capable of staying airborne for long periods can be released by the hundreds (and cost pennies), this is quite a promising area for the military.

There’s an interesting startup, Zerokey. They make QUANTUM RTLS 2.0, a device that provides spatial accuracy down to 1.5 mm. It’s used in manufacturing, for example. Their video shows a “watch” on a worker’s wrist that monitors whether something is being assembled correctly on a table. Here the principle is ultrasonic, and evidently these “watches” are paired with stationary sensors, with the position then computed by multilateration.
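Multilateration itself is simple to sketch. The example below is a generic 2D least-squares version, not Zerokey’s actual method: the function names, the anchor layout, and the default speed of sound of 343 m/s (air at about 20 °C) are all assumptions made for illustration.

```python
def tof_to_distance(time_of_flight_s, speed_of_sound=343.0):
    """Convert an ultrasonic time of flight (s) to a distance (m)."""
    return time_of_flight_s * speed_of_sound


def multilaterate_2d(anchors, distances):
    """Least-squares 2D position from distances to three or more fixed anchors.

    Subtracting the first circle equation from the others cancels the
    quadratic terms and leaves a linear system in x and y, which is solved
    here through the 2x2 normal equations.
    """
    (x0, y0), d0 = anchors[0], distances[0]
    rows, rhs = [], []
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        rows.append((2 * (xi - x0), 2 * (yi - y0)))
        rhs.append(xi**2 + yi**2 - x0**2 - y0**2 - di**2 + d0**2)

    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * v for r, v in zip(rows, rhs))
    b2 = sum(r[1] * v for r, v in zip(rows, rhs))

    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear or too few; position is ambiguous")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

With more than three anchors the same code averages out measurement noise, which is presumably part of how millimeter-level figures become achievable. A 3D version adds a z column and needs at least four anchors.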
