In this video, nothing happens. It’s funny that YouTube sparks lively interest not only among the local foxes but also the rabbits.
Category: AI
Navigating Simple English in “Project Hail Mary” | May 10 2026, 15:30
I’ve read about a quarter of Project Hail Mary so far. The English is very simple, easy to read, and captivating. The movie follows the book closely so far, but reading is still quite interesting. However, I generally find fiction hard to read because I keep getting distracted and googling things. I reached the phrase “..I used the bathroom (or “head” I guess, because I was on the ship)…” and it got me thinking: it’s interesting to learn that Russian is not the only language with a special word for the toilet on a ship. And why “head”? Turns out that the Russian word for a ship’s toilet, “galyun”, means “head” (the bow of the ship) in Danish and German. Interestingly, the word is also used on airplanes, and historically these toilets were used only by sailors; officers did not use them.
The text is very childish, and understandably so – the main character is a school physics teacher, after all. It’s full of things like motherfluffer, dang it, gosh darn it, fudge, holy moly, and for cripes’ sake instead of for Christ’s sake; there’s even bull-puckey instead of bullshit. “To go wee” is how they say “to pee” in the book. I recall that the day before yesterday we went into a mattress store, and the consultant, while explaining that “if one of you goes to the toilet, the other won’t even notice that the first one got up” (because the mattresses are so soft), freely used the verb “to pee”. So what? 🙂
Update: when the physics teacher encounters an alien ship on page 120, the chapter ends with holy fucking shit! That’s what all the rest was leading to;)
Occasionally, there are quite funny expressions that can even be used in life 🙂 For instance, the main character asks, “Who pooped in your Rice Krispies?”, which is a variation of the idiom “to poop in someone’s cereal”, meaning roughly “who put you in a bad mood?”.
In conclusion, if you’re choosing your first book to read in English, this one is at the top of my list. Even something seemingly simple like “Harry Potter” is more sophisticated, in my opinion. Here, there’s a lot of dialogue, school-level and almost slang-free vocabulary, and a pretty interesting plot. Plus, it’s real science fiction, where the author teaches the reader about the scientific method, how the world works, and so on, all from the viewpoint of the hero, a physics teacher. He shares various facts and thoughts on how physics works and ties them to the plot, in conversations with other characters or in thoughts to himself (rather than addressing the reader directly). It’s middle school level so far, but maybe it’ll get more complex later on.

Shiba Inu Meets Its Match: A Book Encounter | May 07 2026, 19:28

Exploring Automated Documentation of Large Excel Datasets | May 06 2026, 22:28
I wonder if there exists an agent that takes an Excel file significantly larger than the context window and begins to document its essence. Here are several tabs. Here on tab 5, there is a table with a million rows and five columns. The columns are as follows. We take random data from the table: it looks like there are numbers here, and surnames there. We assume that the first column contains only numbers, so we write code that checks this assumption and at the same time calculates min/max and the set of unique values. It turns out there are few values, only five. We record it. Now we check the surnames. Yes, these are just strings, and a new sample confirmed that they are indeed surnames. Here’s a formula. We see where it points. And so on. And this column has an unclear purpose. We look at the data: these are some numbers from 0 to 1. We measure the average and the spread. We ask the user, maybe they’ll provide some comments. They did. It turned out to be a KPI issued to this user from an external system. We record it. And so on. Documentation emerges. Later, when there is documentation, one can ask the agent to perform operations on all this, since the LLM now more or less understands the purpose of the data and how they are connected, and can form hypotheses for detecting outliers and verify them.
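Here is a rough sketch of the "sample, guess, verify with code" step for a single column. It assumes the column has already been read into a Python list; the function name `profile_column` and its parameters are made up for illustration and do not refer to any existing tool. The point is that only the small report, not the million rows, would ever go into the LLM context.

```python
import random

def profile_column(values, sample_size=100, max_listed=20, seed=0):
    """Profile one column: guess its type from a sample, then verify on all rows."""
    if not values:
        return {"rows": 0}

    def is_number(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False

    # Step 1: form a hypothesis from a small random sample.
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    looks_numeric = all(is_number(v) for v in sample)
    report = {
        "rows": len(values),
        "sample": sample[:5],
        "hypothesis": "numeric" if looks_numeric else "text",
    }

    # Step 2: check the hypothesis on the full column with plain code.
    if looks_numeric:
        nums = [float(v) for v in values if is_number(v)]
        report["violations"] = len(values) - len(nums)
        report["min"], report["max"] = min(nums), max(nums)
        report["mean"] = sum(nums) / len(nums)

    # Step 3: if there are only a few distinct values, list them all.
    uniques = set(values)
    report["unique_count"] = len(uniques)
    if len(uniques) <= max_listed:
        report["unique_values"] = sorted(uniques, key=str)
    return report

# A stand-in for a much larger column with only five distinct values.
column = [1, 2, 3, 4, 5] * 2000
print(profile_column(column))
```

An agent would call something like this per column, write the report into the documentation, and only ask the user about the columns whose purpose stays unclear.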
The Crucial Role of Data Quality Oversight in Development Projects | May 06 2026, 16:07
Almost every development project features a dedicated functional testing automation team, yet surprisingly, a similar emphasis on Data Quality is rarely found. Regardless of whether data comes from external integrations, from users, or is generated by the system itself, it often goes without proper control simply because no one seems to consider it important. Later the team struggles with the consequences, which grow like a snowball. The longer such issues persist, the harder they are to resolve, eventually leading to a situation where people just resign themselves to the “irreparable” state of the database. It is much better to identify these problems at the moment they arise, while the technical debt has not yet become insurmountable, rather than later figuring out how to prevent them from causing everything to crash.
In essence, there needs to be a constant “supervisor” over all types of databases used by the system (relational, NoSQL, search indexes, or graph databases) — essentially, this is a layer of data quality checking over processes. Of course, there must be clear rules – specifically what to check and which flags to use to mark specific anomalies.
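To make "clear rules and flags" a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: the `Rule` class, the flag names, and the record fields are invented for illustration, and a real supervisor would read from the actual databases rather than from a list of dicts.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    flag: str                      # label attached to a record that breaks the rule
    check: Callable[[dict], bool]  # returns True when the record is OK

# Hypothetical rules for a hypothetical "users" record.
RULES = [
    Rule("MISSING_EMAIL", lambda r: bool(r.get("email"))),
    Rule("NEGATIVE_BALANCE", lambda r: r.get("balance", 0) >= 0),
    Rule("UNKNOWN_STATUS", lambda r: r.get("status") in {"active", "blocked"}),
]

def audit(records, rules=RULES):
    """Return {record id: [flags]} for every record that breaks at least one rule."""
    report = {}
    for rec in records:
        flags = [rule.flag for rule in rules if not rule.check(rec)]
        if flags:
            report[rec["id"]] = flags
    return report

records = [
    {"id": 1, "email": "a@example.com", "balance": 10, "status": "active"},
    {"id": 2, "email": "", "balance": -5, "status": "active"},
    {"id": 3, "email": "c@example.com", "balance": 0, "status": "zombie"},
]
print(audit(records))
```

Keeping the rules as data rather than as scattered if-statements makes it easier to review them and to report which flag fired for which record.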
There must be a responsible party for the process (a human, not AI), who will integrate these reports into the development and support workflows. Many data integrity issues cannot just be resolved through an interface — they require the engineering team to develop scripts for mass correction and data cleansing.
Incidentally, this also transitions into the realm of anomaly detection (outlier detection): machine learning and LLMs can identify subtle “bad” patterns that traditional rule-based systems might miss.
What do you think about this? Are similar mechanisms implemented in your processes?

Harnessing Chat Data for Semantic Q&A Search | April 30 2026, 04:05
In one evening, I created a simple utility that exports a year and a half of the Natural Language Processing chat (65,000 messages) and converts it into question-answer pairs with semantic search. Clicking on a search result (on the left) opens the dialogue in the chat. The messages that answer the question are highlighted, and at the top, the original phrasing of the question is highlighted as well.
How it works: the system assumes that people mainly reply to messages that are relatively recent. If several replies are made to one message, it is likely useful and caught the interest of others in the chat. The system takes the messages starting from the one many have replied to and ending with the last one in the reply-to chain, and it keeps only the pieces where the original question has at least 3 reply-tos. In essence, it cuts a piece from the chat starting with a popular question, so that whatever follows the bottom cut is most likely irrelevant. Such blocks can overlap, for example, if someone asked a question while others were replying to something else.
So, if user A asked what the weather was like and received the answers “good,” “bad,” and “rain,” then five messages without a reply-to followed, then someone replied to “rain” with the question “why rain”, and five more people replied to that question, then the first question about the weather makes it into the system, and the piece ends up with 15 messages (1 question + 3 answers + 5 messages without a reply-to + 1 follow-up question + 5 replies to it).
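The cutting rule can be sketched roughly like this. It is a simplified reconstruction and not the utility's actual code: it assumes messages arrive as a chronological list of dicts with an `id` and an optional `reply_to`, and the name `cut_blocks` is made up.

```python
from collections import defaultdict

def cut_blocks(messages, min_replies=3):
    """Cut chat pieces that start at a popular question and end at the
    last message of its reply-to tree. Pieces may overlap."""
    position = {m["id"]: i for i, m in enumerate(messages)}
    children = defaultdict(list)
    for m in messages:
        parent = m.get("reply_to")
        if parent in position:
            children[parent].append(m["id"])

    blocks = []
    for m in messages:
        if len(children[m["id"]]) < min_replies:
            continue  # not enough direct replies, not a "popular" question
        # Walk the whole reply tree under the question to find its last message.
        last, stack = position[m["id"]], [m["id"]]
        while stack:
            current = stack.pop()
            last = max(last, position[current])
            stack.extend(children[current])
        blocks.append(messages[position[m["id"]]: last + 1])
    return blocks

chat = [
    {"id": 1, "text": "a popular question"},
    {"id": 2, "reply_to": 1, "text": "answer"},
    {"id": 3, "text": "unrelated message in between"},
    {"id": 4, "reply_to": 1, "text": "answer"},
    {"id": 5, "reply_to": 1, "text": "answer"},
    {"id": 6, "reply_to": 5, "text": "follow-up to an answer"},
    {"id": 7, "text": "unrelated message after the cut"},
]
for block in cut_blocks(chat):
    print([m["id"] for m in block])  # [1, 2, 3, 4, 5, 6]
```

Note that the unrelated message 3 stays inside the piece because it falls between the question and the end of the chain, while message 7 is cut off.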
Afterwards, these pieces are summarized into question-answer pairs.
It turned out quite cool.
P.S. In the screenshot, the search query has nothing to do with the search result because I foolishly took the screenshot after I changed the query but before I hit send.

Misadventures in Keyboard Layouts: Searching for Gremlin, Finding Surprises | April 28 2026, 20:33
This is me typing the word gremlin without switching the keyboard layout. I wanted to read about the query language for graph databases, I need it for work. Google sure knows how to surprise.

Tesla Robots Gradually Taking to the Streets | April 25 2026, 05:37
Tesla robots are slowly being kicked out onto the street. I rode by on my bike today. Too bad they’re not turned on
Shiba Inu at Work: Turning Daily Moments into Cozy Companionship | April 23 2026, 01:49
How to occupy a dog

Navigating the Depths of High-Dimensional Spaces | April 13 2026, 23:17
I am now working a lot with high-dimensional vectors, and some things that I hadn’t fully realized before are really starting to tickle my brain. Our 3D intuition doesn’t just not work there—it lies.
It turns out that any two random vectors in high-dimensional space are almost certainly nearly perpendicular to each other. Almost all the space is one continuous “equator”.
Much of machine learning is built on exactly this. If your embeddings suddenly show high cosine similarity (for example, 0.8), this is not a statistical error but a powerful signal. It’s almost impossible to converge like this by chance in a 1000-dimensional world.
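This is easy to check numerically. The sketch below assumes "random vectors" means vectors with independent Gaussian coordinates (my choice for the experiment, real embeddings are not distributed like this) and measures the cosine between random pairs. For dimension d, the typical absolute cosine is on the order of 1/sqrt(d), so roughly 0.03 for d = 1000.

```python
import math
import random

def mean_and_max_abs_cosine(dim, trials=200, seed=0):
    """Mean and max |cosine| over pairs of independent Gaussian random vectors."""
    rng = random.Random(seed)
    cosines = []
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(dim)]
        b = [rng.gauss(0, 1) for _ in range(dim)]
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        cosines.append(abs(dot / (norm_a * norm_b)))
    return sum(cosines) / len(cosines), max(cosines)

for dim in (3, 1000):
    mean_cos, max_cos = mean_and_max_abs_cosine(dim)
    print(f"{dim}D: mean |cos| = {mean_cos:.3f}, max |cos| = {max_cos:.3f}")
```

In 3D, large cosines between random pairs are common; in 1000D, even the largest cosine over hundreds of pairs stays far below 0.8.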
In such spaces, almost all the mass of data is concentrated in an extremely thin surface layer. The “insides” of objects are mathematically empty.
This is easy to verify with a thought experiment. Take the “skin” of a multidimensional sphere with a thickness of just 1% of the radius. The volume of a sphere is proportional to its radius raised to the power of its dimensionality.
• In three-dimensional space, the pulp (the inner 0.99 of the radius) occupies 97% of the volume: you raise 0.99 to the third power.
• In 1000D, the pulp occupies just 0.0043% (0.99 to the 1000th power is about 0.000043).
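The arithmetic behind these two bullets fits in a few lines. The function name `inner_fraction` is mine; the only fact used is that volume scales as radius to the power of the dimension, so the pulp's share is (1 - skin) ** dim.

```python
def inner_fraction(dim, skin=0.01):
    """Share of a ball's volume lying deeper than `skin` (as a fraction of
    the radius) below the surface: (1 - skin) ** dim."""
    return (1 - skin) ** dim

for dim in (3, 100, 1000):
    # 0.99**3 is about 0.9703, 0.99**1000 is about 4.3e-5
    print(f"{dim}D: pulp = {inner_fraction(dim):.6f}, skin = {1 - inner_fraction(dim):.6f}")
```

Already at 100 dimensions the 1% skin holds most of the volume, and at 1000 dimensions it holds practically all of it.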
You can look at it differently. For a point to be close to the origin, its coordinates must be close to the origin along all axes. If even one axis has a large value, that’s it, the point is gone. If you pick points at random, the probability that all coordinates are below a given value at once decreases as the dimensionality grows, and decreases quickly.
All the “meat” of the data always ends up in the skin. Any sample in High-D is essentially a set of boundary values.
For white noise in high dimensions, the distances to the closest and the farthest neighbor become almost the same. The concept of “closeness” simply degrades.
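A small experiment shows this effect. It assumes "white noise" means points with independent standard Gaussian coordinates; the function name and the number of points are arbitrary choices for the illustration. It computes the ratio of the nearest to the farthest distance from one query point: close to 0 means neighbors differ a lot, close to 1 means "closeness" carries almost no information.

```python
import math
import random

def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Nearest distance divided by farthest distance from a random query
    point to n_points Gaussian white-noise points."""
    rng = random.Random(seed)
    query = [rng.gauss(0, 1) for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.gauss(0, 1) for _ in range(dim)]
        dists.append(math.dist(query, point))
    return min(dists) / max(dists)

for dim in (2, 10, 1000):
    print(f"{dim}D: nearest/farthest = {nearest_to_farthest_ratio(dim):.3f}")
```

The ratio climbs toward 1 as the dimension grows, which is why nearest-neighbor search on pure noise becomes meaningless, while structured data such as embeddings still separates.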

