AI – Hi, I'm Rauf Aliev.

Innovative DIY Program for Live Transcription and Screen Capture Analysis | June 18 2026, 04:47

I made a really cool thing for myself. I launch a program, and it turns on the microphone and listens. I switch to, say, a browser and comment on what I see on the screen, periodically pressing a hotkey to take a screenshot. Meanwhile, the program builds a time-stamped transcript of my comments and saves the screenshots with time stamps. Then it runs text recognition on the screenshots and extracts the spellings of various words, brands, identifiers, and people’s names, so that the transcript of my speech can be turned into correct text. And all of this runs on local models on my laptop, which means it is absolutely free.

After I finish talking to the computer, I start the processing step, which takes the raw transcript and the text-recognized screenshots as input and outputs a processed transcript that now looks presentable (the Gemini API is used here). One could even go a step further and automatically cut out the fragments of the screenshots that were discussed, and insert them in the text exactly where they were mentioned.
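The step that pairs screenshots with the speech around them could look roughly like this. It is only a sketch, not the actual program: the `Segment` and `Screenshot` types, the 30-second window, and the prompt layout are all assumptions made for illustration, and the speech recognition, text recognition, and the LLM call itself are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    start: float                      # seconds since recording began
    text: str                         # raw speech-to-text output
    hints: List[str] = field(default_factory=list)


@dataclass
class Screenshot:
    taken_at: float                   # seconds since recording began
    terms: List[str]                  # spellings recognized on the screenshot


def attach_hints(segments, screenshots, window=30.0):
    """Attach to each segment the terms of screenshots taken within `window` seconds of it."""
    for seg in segments:
        hints = []
        for shot in screenshots:
            if abs(shot.taken_at - seg.start) <= window:
                for term in shot.terms:
                    # Keep first-seen order and drop duplicates.
                    if term not in hints:
                        hints.append(term)
        seg.hints = hints
    return segments


def build_prompt(segments):
    """Render the transcript with on-screen spellings next to each line (hypothetical format)."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg.start), 60)
        line = f"[{minutes:02d}:{seconds:02d}] {seg.text}"
        if seg.hints:
            line += f" (on-screen terms: {', '.join(seg.hints)})"
        lines.append(line)
    return "\n".join(lines)
```

The resulting text is what a correcting model would receive: each spoken line arrives together with the exact spellings that were on the screen at about the same moment.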

Or here’s another thing I can do: just play a video through the speakers, and the program immediately makes the same kind of transcript for me. Search YouTube for the video “Angular HttpClient Under The Hood. Design Patterns & Source Code Overview” and start at 3:51: I just left it on autopilot for a couple of minutes, then stopped my script.

Transforming Image Proportions with Generative AI: Smart Redesign Solutions | June 16 2026, 10:08

I published an article about how to transform images when their proportions change. Using generative AI, of course, because turning a square into a rectangle means either losing data, extrapolating it, or stretching and compressing the image itself. Here, I describe a method where smart extrapolation is performed. When processing hundreds or thousands of images, this approach is not free of errors, but their number is relatively small, and it turns out to be much more efficient to focus on manually correcting the erroneous ones than to do all the work manually from the start.

This is especially needed during a redesign, when the sizes in the new design turn out to differ slightly from the old ones, for instance for banners, and the banners number in the hundreds or thousands.
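The geometry behind the extrapolation approach can be sketched as follows. This is an illustration under my own assumptions, not the method from the article: the function name is hypothetical, the extension is split evenly between the two sides, and the generative model that fills the added margins is not shown.

```python
def extension_for_ratio(width, height, target_w, target_h):
    """Return (left, top, right, bottom) pixels to add so the canvas reaches
    the target aspect ratio without cropping or stretching the original."""
    # Compare width/height with target_w/target_h using integers only.
    if width * target_h < height * target_w:
        # Image is too narrow for the target: extend horizontally.
        new_width = -(-height * target_w // target_h)   # ceiling division
        extra = new_width - width
        return (extra // 2, 0, extra - extra // 2, 0)
    if width * target_h > height * target_w:
        # Image is too wide for the target: extend vertically.
        new_height = -(-width * target_h // target_w)   # ceiling division
        extra = new_height - height
        return (0, extra // 2, 0, extra - extra // 2)
    return (0, 0, 0, 0)                                  # ratio already matches


# A square 1000x1000 banner going to 16:9 needs 389 new pixels on each side.
print(extension_for_ratio(1000, 1000, 16, 9))
```

Only the returned margins have to be generated; the original pixels stay untouched, which is what makes the erroneous results cheap to spot and correct by hand.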

Automating Banner Crop/Resize Across Breakpoints with Generative AI

AI Revolutionizing Decision-Making in Sports and Business | June 14 2026, 02:06

Today, I pondered how AI is changing age-old, even centuries-old ideas about how people should make decisions in various situations, especially in sports and probably in business. It’s far more interesting than just automation. It’s more about fixing bugs in what people have long considered correct and true.

For example, in the game of “Go,” it was believed for decades that invading the corner (3-3 point) was crude and premature. AI then proved otherwise: early capture of the corner is efficient, and chasing after “beautiful” shapes loses to pragmatic control over the center. Or consider the famous 37th move by AlphaGo in the match against Lee Sedol, which was very strange: people did not play that move because they thought it was “playing into empty space.” It was first taken for an AI mistake, but then recognized as brilliant (there are plenty of analyses on YT). In esports, OpenAI Five demonstrated that aggressive early buyback of fallen heroes in “Dota,” which people considered a waste of gold, works.

Pure mathematics has almost erased the mid-range shot from the NBA: it goes in about 40-42% of the time and yields ~0.8 points per attempt, while a three-point shot with even 35% accuracy brings 1.05 points per attempt, and clubs have restructured their play around that pure gain. Admittedly, this is not AI but mathematics and statistics. The shot at the rim (lay-up or dunk) turned out to be statistically the most effective.
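The arithmetic is simply accuracy multiplied by the value of the shot. A quick check using only the percentages quoted above (the function name is mine):

```python
def expected_points(accuracy, shot_value):
    """Average points scored per attempt."""
    return accuracy * shot_value


mid_range_low = expected_points(0.40, 2)    # mid-range shot at 40%
mid_range_high = expected_points(0.42, 2)   # mid-range shot at 42%
three_pointer = expected_points(0.35, 3)    # three-pointer at 35%

# Three-point accuracy at which a three merely matches a 42% mid-range shot.
break_even = mid_range_high / 3

print(round(mid_range_low, 2), round(mid_range_high, 2),
      round(three_pointer, 2), round(break_even, 2))
```

By this calculation a three-pointer only has to go in 28% of the time to match even the better mid-range figure.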

In soccer, there’s the xG (expected goals) metric; AI debunked shots from 35 meters and from outside the penalty area as ineffective (chance of scoring ~5% and 20% respectively), so teams now patiently work the ball into the penalty area, where the xG of a shot rises to 15-40%. It turns out DeepMind had a project with Liverpool: TacticAI, a system that advises coaches on corner kicks. In 90% of cases, expert assessors preferred TacticAI’s recommendations over the tactical setups used in practice.

So, interestingly, if this continues, will a team or athlete using more powerful AI have an advantage, thanks to more successful methods, over a team that lacks such knowledge? Will AI-derived methods of play be so complex that another team can’t “steal” them through outside observation, just as in the case of Go?

Helicopter Installs Anti-Drone System on Moscow Residential Building | June 06 2026, 16:26

I read the news that a “Pantsir” anti-drone system was installed by helicopter onto the roof of a residential high-rise (“House in Sokolniki”) in Moscow. Yes, it’s a full-fledged Pantsir, specifically its anti-drone modification (SMD-E), but I couldn’t resist making this AI photo.


The Mystery of Tal’s Havana Incident: Chess, Reality, and AI | June 02 2026, 00:50

In one of the chess communities, they posted this photo.

It seems to fit the story: the Olympiad, Havana. Tal really did get hit on the head with a bottle in one of the bars, and he was out of action for several days, but then he returned to the board.

But there are seven obvious differences from reality. The most noticeable detail is too many pens and too many fingers on the right hand. But the most interesting one is something AI would never portray correctly, no matter how hard it tries.

Art Beyond AI: When Technology Fails to See What Humans Perceive | May 24 2026, 22:56

It’s funny, but Gemini, Claude, and ChatGPT couldn’t figure out what I’ve drawn here. It’s the first time a model couldn’t decipher something that a human can see.

Mastering Cross-Posting: From Facebook Frustrations to Dual Blogging Excellence | May 23 2026, 14:28

I have perfected cross-posting from Facebook to my two blog sites [which almost no one visits]: beinginamerica.com and raufaliev.com. When a new post is published on Facebook, a mechanism is triggered that translates the post into English, processes the attached images, generates descriptions for them, creates a title from the text of the post and the image descriptions, generates tags on the same basis, records the post in turso db (a cloud database, free up to certain limits), creates embeddings via openai, records them in qdrant cloud (also a cloud database, but a vector one), and finally uploads the images to wordpress via API and publishes the post in English and Russian via API.

All would be well, but of all the APIs, the silliest one is Facebook’s. Firstly, for pages like mine that have been transitioned to the New Experience, it’s almost impossible to use most of this API. Well, it’s possible, but you have to spend a long time proving to Facebook that you really need it, by showing startup documents, demonstrating the application, etc. Obviously, they are reluctant to deal with something that takes content out of their system. In addition, the token that gives access to the latest posts is relatively short-lived (possibly a few weeks), and it can only be renewed through a browser. So any automation requires regular attention, otherwise it breaks.

If you slip up and don’t export the latest posts through the Facebook Graph API in time, they simply drop off the list of recent ones, and that’s it: no more API access to them. The only way is to request an archive download from Facebook. This download is also rather silly: it requires a lot of transformations and removal of unnecessary stuff. For example, the file containing posts, which I process, for some reason includes links that I sent in comments without accompanying text. And the comments themselves are in a separate file!

To assign tags, I had to solve a separate challenge. Here’s the thing: there are about 10,000 posts over all time. That’s a big chunk, and you can’t build tags from it directly because it doesn’t fit into the context window of the LLM. But you need to. So I did this: a script takes random posts from the 10,000 in such a volume that their total size is just below a specified limit in tokens, and at the end of this block it adds the prompt “generate the most common tags for me, 30 pieces” (I’m simplifying the actual prompt). I ran this 10 times and got 10 sets of 30 tags each, generated for different slices of the database. That made 300 tags, some of which are complete duplicates, while others are synonyms or closely related in meaning. All of this is fed into the LLM, and we get a list of tags and a hierarchy of tags. Now we have a limited set of tags that reflects the 10,000 posts as closely as possible. It turns out that over almost 20 years on Facebook, my breakdown is as follows:
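The sampling part of this procedure can be sketched as follows. It is a simplified illustration, not the actual script: the four-characters-per-token estimate, the function names, and the fixed seed are assumptions, and the LLM calls (both the ten tagging runs and the final merge of the 300 tags) are not shown.

```python
import random

INSTRUCTION = "Generate the most common tags for me, 30 pieces."


def estimate_tokens(text):
    # Rough assumption: about 4 characters per token. A real tokenizer
    # would give a more accurate count.
    return max(1, len(text) // 4)


def sample_within_budget(posts, token_limit, rng):
    """Pick random posts whose combined estimated size stays within token_limit."""
    order = list(posts)
    rng.shuffle(order)
    batch, used = [], 0
    for post in order:
        cost = estimate_tokens(post)
        if used + cost > token_limit:
            continue  # this post does not fit; a shorter one later still might
        batch.append(post)
        used += cost
    return batch


def build_tag_prompts(posts, token_limit, runs=10, seed=0):
    """Build one prompt per run, each over a different random slice of the posts."""
    rng = random.Random(seed)
    prompts = []
    for _ in range(runs):
        batch = sample_within_budget(posts, token_limit, rng)
        prompts.append("\n\n".join(batch) + "\n\n" + INSTRUCTION)
    return prompts
```

Each returned prompt would be sent to the LLM separately; the ten answers are then concatenated and passed to one more call that removes duplicates and builds the hierarchy.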

Tag            Posts
====================
#Russia        3412
#Thoughts      3146
#Tech          3105
#Culture       2765
#Hobbies       2726
#AI            1603
#Science       1367
#Software      1358
#Travel        1298
#Learning      1138
#Society       1050
#Nature         958
#Education      915
#Business       902
#Art            894
#Programming    889
#Humor          840
#History        807
#Gadgets        750
#Moscow         713
#USA            614
#Cinema         567
#Webdev         493
#Music          476
#Sports         473
#Mindset        443
#Auto           400
#Books          386
…

and so on. This list includes both tags from the limited list and tags that the LLM assigned to content simply because it didn’t find anything suitable in the limited one.

Tags from the limited list became categories on the site. The rest of the tags, together with these, became regular wordpress tags.

As for image search, I had two ideas on how to do it. The first was OpenCLIP. It’s pretty straightforward but requires hosting the model somewhere. That’s easy on my machine, but it’s inconvenient to start it each time, plus I planned to move the migrator to a cheap server on Amazon. Computing it with cloud-hosted models is also fine, but you have to pay a bit, and it is yet another dependency. But the main thing is that search works quite well without it. I generate descriptions for images using OpenAI, which is used for translating into English anyway, and then create embeddings using a large model. So far, all search tests are a great success, especially when there’s text on the image, where it’s a big question whether OpenCLIP would have interpreted it successfully.
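Searching images through the embeddings of their descriptions comes down to ranking stored vectors by similarity to the query vector. A toy sketch of that ranking, with made-up three-dimensional vectors standing in for real embeddings; in the setup described above this step is handled by the vector database, so the code below is only an illustration of the idea:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, top_k=3):
    """Return the ids of the top_k images whose description vectors are closest to the query."""
    scored = [(cosine(query_vec, vec), image_id) for image_id, vec in index.items()]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:top_k]]


# Hypothetical index: image id -> embedding of its generated description.
index = {
    "cat.jpg": [1.0, 0.0, 0.0],
    "chart.png": [0.0, 1.0, 0.0],
    "mix.jpg": [0.7, 0.7, 0.0],
}
print(search([1.0, 0.1, 0.0], index, top_k=2))
```

Because the vectors come from text descriptions, any text visible on an image ends up in the description and therefore becomes searchable too.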

In the end:

1) wordpress raufaliev.com – free

2) wordpress beinginamerica.com – free

3) turso db where all posts are stored – free

4) qdrant cloud where embeddings are stored – free

5) openai for translation and image descriptions – not free, but inexpensive (cost $30 for post processing over a year).

I attach two screenshots – how the search by images works, and by texts, as well as the migrator dashboard.

YouTube Fascinates Foxes and Rabbits Alike: A Curious Phenomenon | May 12 2026, 13:26

In this video, nothing happens. It’s funny that YouTube sparks lively interest not only among the local foxes but also the rabbits.

Navigating Simple English in “Project Hail Mary” | May 10 2026, 15:30

I’ve read about a quarter of Project Hail Mary so far. The English is very simple, easy to read, and captivating; the movie so far follows the book closely, but reading is still quite interesting. However, I generally find it hard to read fiction because I keep getting distracted by googling things. I reached the phrase “..I used the bathroom (or “head” I guess, because I was on the ship)…” and it got me thinking: it’s interesting to learn that a ship’s toilet has its own name not only in Russian. And why “head”? Turns out that “galley” in Danish and German is “head”. Interestingly, galleys are also found on airplanes, and historically, galleys were used only by sailors; officers did not use them.

The text is very childish, and understandably so: the main character is a school physics teacher, after all. It’s all motherfluffer, dang it, gosh darn it, fudge, holy moly, for cripes’ sake instead of for Christ’s sake; there’s even bull-puckey instead of bullshit. “To go wee” is how they say “to pee” in the book. I recall that the day before yesterday we went into a mattress store, and the salesperson, while discussing the topic “if one of you goes to the toilet, the other won’t even notice that the first one got up” (because the mattresses are so soft), actively used the verb “to pee”. So what? 🙂

Update: when the physics teacher encounters an alien ship on page 120, the chapter ends with holy fucking shit! That’s what all the rest was leading to;)

Occasionally, there are quite funny expressions that can even be used in life 🙂 For instance, the main character asks, “Who pooped in your Rice Krispies?” which is the idiom “to poop in someone’s cereal” – “who messed up your meal”.

In conclusion, if you’re choosing your first book to read in English – this one is at the top of my list. Even something seemingly simple like “Harry Potter” is more sophisticated, in my opinion. Here, there’s a lot of dialogue, school level but almost slang-free vocabulary, and a pretty interesting plot. Plus, it’s real science fiction, where the author educates the reader about the scientific method, how the world works, etc., all from the viewpoint of the hero, a physics teacher, who shares various facts and thoughts on how physics works, relating it to the plot in his interactions with other characters or thoughts to himself (rather than directly to the reader). It’s middle school level so far, but maybe it’ll get more complex later on.

Category: AI