Helicopter Installs Anti-Drone System on Moscow Residential Building | June 06 2026, 16:26

I read the news that a “Pantsir” anti-drone system was installed by helicopter onto the roof of a residential high-rise (“House in Sokolniki”) in Moscow. Yes, it’s a full-fledged Pantsir, specifically its anti-drone modification (SMD-E), but I couldn’t resist making this AI photo.

The Mystery of Tal’s Havana Incident: Chess, Reality, and AI | June 02 2026, 00:50

Someone posted this photo in one of the chess communities.

It seems to fit the story. The Olympiad, Havana: Tal really did get hit on the head with a bottle in one of the bars and was out of action for several days, but then he returned to the board.

But there are seven obvious differences from reality. The most noticeable one is that there are too many pens and too many fingers on the right hand. The most interesting one, though, is something AI would never portray correctly, no matter how hard it tries.

Mastering Cross-Posting: From Facebook Frustrations to Dual Blogging Excellence | May 23 2026, 14:28

I have perfected cross-posting from Facebook to my two blog sites [which almost no one visits], beinginamerica.com and raufaliev.com. When a new post is published on Facebook, a mechanism is triggered that translates the post into English, processes the attached images and generates descriptions for them, creates a title from the post text and the image descriptions, generates tags from the same material, and records the post in Turso DB (a cloud database, free up to certain limits). It then creates embeddings via OpenAI and records them in Qdrant Cloud (also a cloud database, but a vector one), and finally uploads the images to WordPress via the API and publishes the post in English and Russian via the API.

All would be well, but of all the APIs, the silliest one is Facebook’s. Firstly, for pages like mine that have been transitioned to New Experience, it’s almost impossible to use most of this API. Well, it’s possible, but you have to spend a long time proving to Facebook that you really need it, by showing startup documents, demonstrating the application, etc. Obviously, they are reluctant to deal with something that takes content out of their system. In addition, the token that gives access to the latest messages is relatively short-lived (possibly a few weeks), and it can only be renewed through a browser. So any automation requires regular attention, otherwise it breaks.

If you mess up and don’t offload the latest posts through this Facebook Graph API in time, they just disappear from the list of recent ones and that’s it, no more API access to them. The only way is to request an archive download from Facebook. This download is also rather silly – it requires a lot of transformations and removing unnecessary stuff. For example, in the file containing posts, which I process, for some reason there are links that I sent in comments without accompanying text. And the comments are in a separate file!

To assign tags, I had to solve a separate challenge. Here’s the thing: there are about 10,000 posts over all time. That’s a big chunk, and you can’t build tags from it directly because it doesn’t fit into the LLM’s context window. But you need to. So I did this: a script takes random posts from the 10,000 in such a volume that their total size is just below a specified token limit, and at the end of this block it adds the prompt “generate the most common tags for me, 30 pieces” (I’m simplifying the actual prompt). I ran this 10 times and got 10 sets of 30 tags each, generated for different slices of the database. That made 300 tags, some of which are exact duplicates, while others are synonyms or closely related in meaning. All of this is fed into the LLM, which returns a list of tags and a tag hierarchy. Now we have a limited set of tags that reflects the 10,000 posts as closely as possible. It turns out that in almost 20 years on Facebook, my breakdown is as follows:

Tag            Posts
====================
#Russia        3412
#Thoughts      3146
#Tech          3105
#Culture       2765
#Hobbies       2726
#AI            1603
#Science       1367
#Software      1358
#Travel        1298
#Learning      1138
#Society       1050
#Nature        958
#Education     915
#Business      902
#Art           894
#Programming   889
#Humor         840
#History       807
#Gadgets       750
#Moscow        713
#USA           614
#Cinema        567
#Webdev        493
#Music         476
#Sports        473
#Mindset       443
#Auto          400
#Books         386

and so on. This list includes both tags from the limited list and tags that the LLM assigned to a post simply because it didn’t find anything suitable in the limited one.

Tags from the limited list became categories on the site. All tags, both these and the rest, also became regular WordPress tags.
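The sampling step behind the tag list can be sketched roughly like this. It is a minimal sketch, not the actual script: the 4-characters-per-token estimate, the prompt wording, and the function names are all assumptions made for illustration, and the final merging of synonyms is left to the LLM, as described above.

```python
import random
from collections import Counter

# Assumption: about 4 characters per token. A real run would use the
# model's own tokenizer instead of this rough estimate.
def estimate_tokens(text):
    return -(-len(text) // 4)  # ceiling division

# Simplified stand-in for the real prompt.
PROMPT = "Generate the 30 most common tags for the posts above."

def build_batch(posts, token_limit, rng):
    """Pick random posts whose total size stays within the token limit."""
    budget = token_limit - estimate_tokens(PROMPT)
    picked, used = [], 0
    for post in rng.sample(posts, len(posts)):  # random order, no repeats
        cost = estimate_tokens(post + "\n\n")
        if used + cost > budget:
            continue  # this post doesn't fit, a shorter one still might
        picked.append(post)
        used += cost
    return "".join(post + "\n\n" for post in picked) + PROMPT

def merge_tag_sets(tag_sets):
    """Count in how many runs each tag appeared (case-insensitive)."""
    return Counter(
        tag for tags in tag_sets for tag in {t.lower() for t in tags}
    )
```

Calling `build_batch` ten times with different random states gives ten slices of the archive. `merge_tag_sets` only collapses exact duplicates, so synonyms and near-duplicates still have to go through the LLM pass.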

As for image search, I had two ideas on how to do it. The first was OpenCLIP. It’s pretty straightforward but requires hosting the model somewhere. That’s easy on my machine, but it’s inconvenient to start it each time, plus I planned to move the migrator to a cheap server on Amazon. Computing it with cloud models is also an option, but you have to pay a bit, and it’s yet another dependency. The main thing, though, is that search works quite well without it, which is the second idea: I generate descriptions for images using OpenAI, which is used for translating into English anyway, and then create embeddings of those descriptions using a large model. So far, all search tests have been a great success, especially when there’s text in the image, where it’s a big question whether OpenCLIP would have interpreted it successfully.
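To make the idea concrete, here is a small sketch of searching images through the embeddings of their text descriptions. The three-dimensional vectors are toy values standing in for real embeddings, and the in-memory list stands in for the vector database; the function names and sample data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, index, top_k=3):
    """index: list of (image_id, description, vector) tuples."""
    scored = [
        (cosine(query_vec, vec), image_id, desc)
        for image_id, desc, vec in index
    ]
    return sorted(scored, reverse=True)[:top_k]

# Toy index: in a real setup each vector would be the embedding of the
# generated image description.
index = [
    ("img-1", "A chessboard with two players at a tournament", [0.9, 0.1, 0.0]),
    ("img-2", "Screenshot of a dashboard with charts", [0.1, 0.9, 0.1]),
    ("img-3", "A helicopter above a high-rise roof", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # stand-in for the embedding of "chess game"
results = search(query_vec, index, top_k=2)
for score, image_id, desc in results:
    print(f"{score:.2f}  {image_id}  {desc}")
```

Because the query is matched against descriptions rather than pixels, any text that the description model read off the image becomes searchable too.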

In the end:

1) wordpress raufaliev.com – free

2) wordpress beinginamerica.com – free

3) turso db where all posts are stored – free

4) qdrant cloud where embeddings are stored – free

5) openai for translation and image descriptions – not free, but inexpensive (cost $30 for post processing over a year).

I’m attaching two screenshots showing how search works by images and by text, as well as the migrator dashboard.

Navigating Simple English in “Project Hail Mary” | May 10 2026, 15:30

I’ve read about a quarter of Project Hail Mary so far. The English is very simple, easy to read, and captivating; the movie has followed the book closely so far, but the book is still quite interesting to read. However, I generally find it hard to read fiction because I keep getting distracted to google stuff. I reached the phrase “..I used the bathroom (or “head” I guess, because I was on the ship)…” and it got me thinking: it’s interesting to learn that a ship’s toilet has its own name not just in Russian. And why “head”? Turns out that “galley” in Danish and German is “head”. Interestingly, galleys are also found on airplanes, and historically, galleys were used only by sailors; officers did not use them.

The text is very childish, and understandably so – the main character is a physics teacher at a school after all. All these motherfluffer and dang it, gosh darn it, fudge, holy moly, for cripes’ sake instead of for Christ’s sake, there’s even bull-puckey instead of bullshit. “To go wee” is how they say “to pee” in the book. I recall, the day before yesterday we entered a mattress store, and the consultant, while discussing the topic “if one of you goes to the toilet, the other won’t even notice that the first one got up” – well, because the mattresses are so soft – actively used the verb “to pee”. So what? 🙂

Update: when the physics teacher encounters an alien ship on page 120, the chapter ends with holy fucking shit! That’s what all the rest was leading to;)

Occasionally, there are quite funny expressions that can even be used in life 🙂 For instance, the main character asks, “Who pooped in your Rice Krispies?” which is the idiom “to poop in someone’s cereal” – “who messed up your meal”.

In conclusion, if you’re choosing your first book to read in English – this one is at the top of my list. Even something seemingly simple like “Harry Potter” is more sophisticated, in my opinion. Here, there’s a lot of dialogue, school level but almost slang-free vocabulary, and a pretty interesting plot. Plus, it’s real science fiction, where the author educates the reader about the scientific method, how the world works, etc., all from the viewpoint of the hero, a physics teacher, who shares various facts and thoughts on how physics works, relating it to the plot in his interactions with other characters or thoughts to himself (rather than directly to the reader). It’s middle school level so far, but maybe it’ll get more complex later on.

Exploring Automated Documentation of Large Excel Datasets | May 06 2026, 22:28

I wonder if there is an agent that takes an Excel workbook significantly larger than the context window and begins to document what it contains. Here are several tabs. Here on tab 5, there is a table with a million rows and five columns. The columns are as follows. We take random data from the table: this column looks like numbers, and that one like surnames. We assume that the first one contains numbers everywhere, so we write code that checks this assumption and at the same time calculates min/max and the set of unique values. So, few values, only five. We record it. Now we check the surnames. Yes, these are just strings, and a new sample showed that they are indeed surnames. Here’s a formula. We see where it points. And so on. And this column has an unclear purpose. We look at the data: these are some numbers from 0 to 1. We measure the average and the spread. We ask the user, maybe they’ll provide some comments. They did. It turned out to be a KPI issued to this user from an external system. We record it. And so on. Documentation emerges. Later, when there is documentation, one can ask to perform some operations with all this, since the LLM now more or less understands the purpose of the data and how they are connected, and can build some hypotheses about detecting outliers and verify them.
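One step of such an agent, guessing a column’s type from a random sample and then verifying the guess on the full column, might look roughly like this. It is only a sketch using the standard library: the function names, the report fields, and the thresholds are made up for illustration, and reading the workbook itself is left out.

```python
import random
import statistics

def looks_numeric(value):
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

def profile_column(name, values, sample_size=100, max_unique=20, seed=0):
    """Guess the column type from a random sample, then verify on all rows."""
    report = {"column": name, "rows": len(values)}
    if not values:
        return report
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    guess = "number" if all(looks_numeric(v) for v in sample) else "text"
    report["guess"] = guess

    if guess == "number":
        # The sample may have missed bad cells, so check every row.
        nums = [float(v) for v in values if looks_numeric(v)]
        report["confirmed"] = len(nums) == len(values)
        report.update(
            min=min(nums),
            max=max(nums),
            mean=statistics.fmean(nums),
            stdev=statistics.pstdev(nums),
        )
    # A small set of distinct values is worth recording in the documentation.
    unique = set(values)
    if len(unique) <= max_unique:
        report["unique_values"] = sorted(unique, key=str)
    return report
```

The resulting dictionary is small enough to hand back to the LLM, which can then decide whether to record the finding, sample again, or ask the user about the column.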

The Crucial Role of Data Quality Oversight in Development Projects | May 06 2026, 16:07

Almost every development project features a dedicated functional testing automation team, yet surprisingly, a similar emphasis on Data Quality is rarely found. Regardless of whether data comes from external integrations, from users, or is generated by the system itself, it often remains without proper control simply because no one seems to consider it important, and later the team struggles with the consequences, which grow like a snowball. The longer such issues persist, the harder they are to resolve, eventually leading to a situation where people just resign themselves to the “irreparable” state of the database. It is much better to identify these problems at the moment they arise, while the technical debt has not yet become insurmountable, rather than later figuring out how to prevent them from causing everything to crash.

In essence, there needs to be a constant “supervisor” over all types of databases used by the system (relational, NoSQL, search indexes, or graph databases): a data quality checking layer that sits over the processes. Of course, there must be clear rules: what exactly to check and which flags to use to mark specific anomalies.
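As an illustration of “clear rules and flags”, a rule can be a named check that marks a record as anomalous. The sketch below is deliberately minimal; the rule names, the record fields, and the `audit` function are hypothetical examples, not a reference to any particular tool.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    flag: str                            # label attached to an anomalous record
    is_violated: Callable[[dict], bool]  # True when the record breaks the rule

# Hypothetical rules for a hypothetical record layout.
RULES = [
    Rule("MISSING_EMAIL", lambda r: not r.get("email")),
    Rule(
        "NEGATIVE_AMOUNT",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] < 0,
    ),
    Rule(
        "KPI_OUT_OF_RANGE",
        lambda r: r.get("kpi") is not None and not 0 <= r["kpi"] <= 1,
    ),
]

def audit(records, rules=RULES):
    """Return {record_id: [flags]} for every record that breaks a rule."""
    report = {}
    for rec in records:
        flags = [rule.flag for rule in rules if rule.is_violated(rec)]
        if flags:
            report[rec["id"]] = flags
    return report

records = [
    {"id": 1, "email": "a@example.com", "amount": 10, "kpi": 0.4},
    {"id": 2, "email": "", "amount": -5, "kpi": 0.9},
    {"id": 3, "email": "c@example.com", "amount": 7, "kpi": 1.7},
]
print(audit(records))
```

Run on a schedule against each data store, a report like this is what the responsible person would feed into the development and support workflows.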

There must be a responsible party for the process (a human, not AI), who will integrate these reports into the development and support workflows. Many data integrity issues cannot just be resolved through an interface — they require the engineering team to develop scripts for mass correction and data cleansing.

Incidentally, this also leads into the realm of anomaly detection (outlier detection): machine learning and LLMs can identify subtle “bad” patterns that traditional rule-based systems might miss.

What do you think about this? Are similar mechanisms implemented in your processes?

Harnessing Chat Data for Semantic Q&A Search | April 30 2026, 04:05

In one evening, I created a simple utility that takes a year and a half of the Natural Language Processing chat, 65,000 messages in all, and converts it into question-answer pairs with semantic search on top. Clicking on a search result (on the left) opens the dialogue in the chat. The messages that answer the question are highlighted, and at the top, the original phrasing of the question is highlighted as well.

How it works: the system assumes that people mainly reply to messages from the relatively recent past. If several replies are made to one message, it is likely useful and caught the interest of others in the chat. The system takes the messages starting from one that many have replied to and ending with the last message in its reply-to chain, and among such pieces it keeps those where the original question has at least 3 reply-tos. In essence, it cuts a piece out of the chat starting with a popular question, so that whatever follows the bottom cut is most likely irrelevant. Such blocks can overlap, for example, if someone asked a question while others were replying to something else.

So, if user A asked what the weather was like, and they received answers like “good,” “bad,” “rain,” and there were five messages without a reply-to, and then someone replied to “rain” with the question “why rain”, and five more people replied to this question, then the first question about the weather makes it into the system – the piece ends with 13 messages.
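The cutting logic can be sketched as follows. This is one reading of the description, not the utility’s actual code: the message format (dicts with `id` and `reply_to`), the function name, and the tiny sample chat are assumptions.

```python
from collections import defaultdict

def extract_blocks(messages, min_replies=3):
    """messages: chronological list of dicts with 'id' and 'reply_to' (or None)."""
    position = {m["id"]: i for i, m in enumerate(messages)}
    children = defaultdict(list)
    for m in messages:
        if m["reply_to"] is not None:
            children[m["reply_to"]].append(m["id"])

    blocks = []
    for m in messages:
        if len(children[m["id"]]) < min_replies:
            continue  # not a popular question
        # Collect the whole reply-to chain: replies, replies to replies, ...
        chain, stack = set(), [m["id"]]
        while stack:
            for child in children[stack.pop()]:
                if child not in chain:
                    chain.add(child)
                    stack.append(child)
        start, end = position[m["id"]], max(position[c] for c in chain)
        blocks.append({
            "question": m["id"],
            "answers": sorted(chain, key=position.get),
            "messages": [x["id"] for x in messages[start:end + 1]],
        })
    return blocks

chat = [
    {"id": 1, "reply_to": None},  # popular question
    {"id": 2, "reply_to": 1},
    {"id": 3, "reply_to": 1},
    {"id": 4, "reply_to": 1},
    {"id": 5, "reply_to": None},  # unrelated, but falls inside the block
    {"id": 6, "reply_to": 4},     # follow-up in the same chain
    {"id": 7, "reply_to": None},  # after the bottom cut
]
blocks = extract_blocks(chat)
print(blocks)
```

Keeping `answers` separate from `messages` is what would make it possible to highlight the actual replies while still showing the surrounding chat.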

Afterwards, these pieces are summarized into question-answer pairs.

It turns out quite cool.

P.S. In the screenshot, the search query has nothing to do with the search result because I foolishly took the screenshot after I changed the query but before I hit send.