Introducing the AI-Powered Text-to-Diagram Generator | September 30 2025, 20:57

While working on a book, I realized what kind of product I’m missing. It’s an AI diagram generator based on textual descriptions.

The idea is that the master document for the diagram is text. This textual description can be (and should be) quite detailed, so the generated diagram exactly matches the author’s vision. The diagram itself is not edited. More precisely, it can be edited (moving circles around, say), but ideally, after such changes, the system should update the text so that regenerating from it reproduces exactly what the user adjusted.

The result — the diagram — should correspond as closely as possible to the description. If it does not match the description because, for example, it’s impossible to make a triangle with three obtuse angles, the system should do its best and provide a verbal response about what didn’t work. The user can then modify the task so that the system complies and produces the diagram correctly.

But then the author might have gotten a result they liked by chance, from a flawed text. If regenerated, it might turn out differently, and not necessarily better. Therefore:

You could ask the system to generate a diagram description from the diagram, which, if inputted back into the diagram generator, would result exactly in what the description was generated from. Yes, this description would be more verbose and complex, but it would more reliably describe the result.

So, from this point, you are no longer working with the diagram. You are working with text. If a diagram is needed, you simply compile the text into a diagram and it turns out as needed. But you don’t even work directly with the text. You work with this diagram-description text through an LLM: you ask it to add a block, and the text changes, but in a way that doesn’t make everything else suddenly shift.

The final diagram should be in an object form, from which raster (PNG) or vector (SVG, EPS) images can be created.
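To make the "object form" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Box` and `Diagram` classes, their fields, and the default sizes are hypothetical, not a design for the actual product. It only shows that once a diagram exists as objects, emitting a vector format such as SVG is a mechanical step.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class Box:
    """A labeled rectangle (hypothetical); coordinates are SVG user units."""
    name: str
    x: int
    y: int
    w: int = 120
    h: int = 40

    def center(self) -> tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


@dataclass
class Diagram:
    """Hypothetical object form: boxes plus straight connectors between them."""
    boxes: dict[str, Box] = field(default_factory=dict)
    links: list[tuple[str, str]] = field(default_factory=list)

    def add_box(self, box: Box) -> None:
        self.boxes[box.name] = box

    def connect(self, src: str, dst: str) -> None:
        if src not in self.boxes or dst not in self.boxes:
            raise KeyError(f"unknown box in link {src!r} -> {dst!r}")
        self.links.append((src, dst))

    def to_svg(self, width: int = 400, height: int = 200) -> str:
        parts = [
            f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">'
        ]
        # Draw connectors first so the boxes are painted on top of them.
        for src, dst in self.links:
            (x1, y1), (x2, y2) = self.boxes[src].center(), self.boxes[dst].center()
            parts.append(
                f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" stroke="black"/>'
            )
        for b in self.boxes.values():
            parts.append(
                f'<rect x="{b.x}" y="{b.y}" width="{b.w}" height="{b.h}" '
                f'fill="white" stroke="black"/>'
            )
            cx, cy = b.center()
            parts.append(
                f'<text x="{cx}" y="{cy}" text-anchor="middle" '
                f'dominant-baseline="middle">{escape(b.name)}</text>'
            )
        parts.append("</svg>")
        return "\n".join(parts)
```

A raster export (PNG) would be one more step on top of the same object form, for example by passing the SVG to a renderer of your choice.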

It would also be great if such a system could take existing diagrams or diagram templates so that it could borrow styles and existing conventions on how to display what.

So, these are my fantasies. If anyone has ideas on how to implement this — let’s discuss 🙂

Crafting the Future of Recommender Systems: A Deep Dive into Algorithms and Implementation | September 26 2025, 21:17

I decided a while ago to write a book on recommendation algorithms. With mathematics, code examples, a repository, etc. In English, of course.

Accordingly, I am looking for volunteer reviewers who are knowledgeable in the field. Also those who have experience with print-on-demand on Amazon.

There are already about 200 pages of content, with about three months of work left. The working title is Recommender Algorithms in 2026: A Practitioner’s Guide. Roughly half of it is still in draft form, with the first 80 pages about 80% complete.

I’ve built a mechanism to publish in HTML and PDF simultaneously. The HTML version is fully functional, with navigation. The navigation block reflects the current section, and as you scroll, it shifts to the one in front of the reader. Clicking on a section, of course, teleports you to what you clicked on. It’s all completely automatic.

Revolutionizing Car Safety: Pre-Collision Airbag Deployment and Smart Updates in Modern Vehicles | September 24 2025, 12:54

So far, I have only one car model and brand that can deploy airbags not at the moment of impact but a moment earlier, so that by the time of the impact they are already deployed. We’ll see what the news shows, but tests indicate that this works better than the traditional method. Reality might turn out to be harsher, but we’ll keep an eye on it.

It’s also interesting that the car started to receive new exciting features after purchase. I never had this experience before. What you bought it with, you lived with, and sometimes you could go to the dealership for something new, and it usually involved replacing something physical.

The previous update (not very useful to me, but maybe to someone) was about automatic detection of children and animals in the cabin. And if it turns out they were left inside while the owner left, the car does not turn off the climate control. And of course, it screams into the app that this is not a good thing to do.

Exploring AI Search Agent: Revolutionizing Automated Browsing and Task Completion | August 19 2025, 01:21

In addition to the main product for search testing, I am developing an AI Search Agent in my leisure time. You only need to provide it with two pieces of information: a website to visit and a goal (described in a short paragraph). In other words, this thing is smart enough to function without any setup – just the site and the goal, and then it’s on its own.

How it works: This virtual agent generates search queries on its own, refines them based on the results obtained (for example, simplifies them), and analyzes how well they match the intended purpose. If suitable results are found, the agent can add items to the cart and place an order — if this is configured in the settings.

I’ve already written about this recently; today is just a slightly nicer demo. It will get nicer still, since this one is pulled from the middle of development, but you can already see how the page is analyzed, and there are initial results that can be used.

The agent can be used for several purposes. Firstly, it’s an excellent way to create ground truth: a set of queries with perfect results. This data can then be used for search testing without involving large language models (LLMs), which are often slow and expensive. Secondly, it helps to test search functions before deploying them to users. Thirdly, the agent generates realistic usage data for training recommendation models that require authentic interactions.

The colorful rectangles in the video are the language the agent uses to interact with the AI (an LLM). To understand where to click, the system annotates the page and sends a structured description of it to the AI, often along with a screenshot, so that it can analyze everything and decide on the next action.

Exploring TestMySearch.com’s Virtual Shopper System | August 15 2025, 04:27

As part of the TestMySearch.com project, I am creating a “virtual shopper” system that simulates the behavior of a real user in an online store. It starts with an abstract goal (for example, “something bright and sexy for the gym”), turns it into a specific search query, and performs the search on the site. Depending on the results, it may either continue browsing or, with a certain probability, reformulate the query if the findings do not match the original goal. The system then evaluates the pages for their alignment with the initial idea, opens product cards, randomly changes parameters such as color or size, and makes decisions about adding to the cart and placing an order. It may also leave the site. This makes it possible to generate many realistic sessions overnight for testing search, filters, and recommendations even before live users arrive.

The system is fully automatic. That is, the browser in the video opens by itself, the search field is located automatically (i.e., independently of the site), and the system itself concocts the query text from that same initial goal. Then the facets and search results are displayed, possibly in a form the system could not predict, but it still understands what is what and decides whether to rephrase the query, select a facet, or click on a search result. There is a certain probability that the virtual user will leave the site. When reformulating, the virtual user does not repeat queries that have already led to empty or irrelevant results, so there is “memory” within the session.
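The session logic described above (a leave probability, reformulation, and memory of failed queries) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the TestMySearch.com implementation: `simulate_session`, the fixed `candidate_queries` list (standing in for LLM-generated reformulations), and the `search` callable are all hypothetical.

```python
import random
from typing import Callable


def simulate_session(
    candidate_queries: list[str],
    search: Callable[[str], list[str]],
    rng: random.Random,
    p_leave: float = 0.1,
    max_steps: int = 10,
) -> list[tuple[str, str]]:
    """Return a log of (action, detail) pairs for one simulated session."""
    tried: set[str] = set()  # session memory: queries already issued
    log: list[tuple[str, str]] = []
    for _ in range(max_steps):
        # At every step the virtual user may simply leave the site.
        if rng.random() < p_leave:
            log.append(("leave", ""))
            return log
        # Never repeat a query that has already been tried in this session.
        fresh = [q for q in candidate_queries if q not in tried]
        if not fresh:
            log.append(("give_up", ""))
            return log
        query = rng.choice(fresh)
        tried.add(query)
        log.append(("search", query))
        results = search(query)
        if results:
            log.append(("open_product", rng.choice(results)))
            return log
        # Empty results: go around the loop again and reformulate.
    return log
```

Passing a seeded `random.Random` makes a session reproducible, which helps when a generated session exposes a search bug and needs to be replayed.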

Navigating Code Generation with AI: Essential Skills for Programmers | August 04 2025, 14:28

I am currently using Gemini extensively for code generation, and I see a skill that programmers need in order to be successful in this field. It’s the ability to quickly read and understand someone else’s code, and to explain why an AI generation needs to be redone and how. For the former, you simply need to know the language very well and sight-read the code, because there will be little time to ponder. For the latter, you need to know patterns well and understand where they apply and where they do not. AI will keep applying patterns inappropriately for a long time yet.

Moreover, a person will still need to understand about 90% of the AI-generated code “as a whole,” and also find the time to comprehend each generated line. If you relax and miss something, the system may produce code that works but is very hard to maintain. For instance, there is an unwritten rule that individual files should not contain too much code, and when a file grows, you need to refactor it, breaking one large file into two or three. Sometimes this requires rewriting logic, but this rewriting always has one aim: to simplify maintenance. AI, while rewriting, also “improves” the code at the same time, and this is quite difficult to prohibit.

In addition, the very concept of an LLM implies a limited context window, which fills up with code very quickly. To create the illusion that everything works even with a large volume of code, LLMs are able to do preliminary processing: they extract only the relevant pieces and set aside the irrelevant ones, so that the relevant ones fit into the actual context window. But this process is very unreliable. One time it works, and the next time it turns out that something important was set aside. As a result, the system did not see the whole picture and generated code that includes a function very similar to the one set aside, and now we have two almost identical functions.

Besides, logic is currently distributed between the database and the code. That is, data often controls the code. And the data often simply does not fit into an LLM: there is too much of it. In the end, current LLM architectures cannot cope without programmers. But with LLMs, the requirements for programmers’ qualifications will only increase, not decrease. So yes, juniors should be worried, but leads not so much 🙂

North Korea’s Tech Control: Red Star OS and Surveillance Smartphones | July 13 2025, 00:58

In the latest video about North Korea from Lankov, I heard something interesting: a device owner cannot open someone else’s file, whether on a computer or on a phone, unless it is signed with a special digital signature from the government. Intrigued, I researched the details for myself and for you.

On their phones, they use a modified old Android “KitKat” (2013), and on computers a modified Fedora Linux, Red Star OS 3, with a shell that mimics Apple’s macOS interface (the previous version mimicked Windows XP). It is said that this design choice may have been influenced by the fact that leader Kim Jong Un was seen with an iMac on his desk, and apparently he said to make it look the same.

North Korean smartphones are equipped with hidden surveillance features that automatically take screenshots every five minutes and store them in a secret folder that, by some accounts, is accessible only to the authorities and not to the user. According to other sources, screenshots are taken when applications start, apparently pseudo-randomly. By yet other accounts, users can view the collected screenshots but cannot delete them. The application for this, Trace Viewer, is clearly created to remind users that everything they do on the tablet or phone can be known to the government. There is also censorship: if you type “South Korea” (남조선) in any app, the system automatically replaces it with “puppet state” (괴뢰국가). One hundred percent of the phones are obviously Chinese, modified by China for Korea.

All media content in Red Star OS, including documents, images, audio and video files, is automatically marked with a watermark containing a unique serial number of the hard drive, which allows authorities to track its origin and distribution. That is, you cannot take a photo and send it to someone, because it will either just not open on that phone, or, apparently, in rare cases, if sharing is allowed, in the new place there will be traces of both who is the author of the photo and who is the next owner. But this is underdeveloped, and direct file sharing is still limited. You can only use it yourself. Of course, nothing can be deleted from the phone without a trace. It is not allowed to have more than one device per person (seems to apply separately to a tablet and a phone).

North Korean mobile devices use a strict system of digital signatures (NATISIGN for government-approved content and SELFSIGN for content created on the device), which means that any file without these signatures cannot be opened at all. The system of signatures and signature verification is at the level of the operating system, not applications. This applies to all files that people create, both on phones and on computers. I see a huge number of edge cases here, but there is little information and no one to ask.

The penalties for accessing unauthorized foreign media, such as K-pop or South Korean dramas, are extremely harsh. If an “undesirable file” is found on a CD inserted into a computer with Red Star OS, the system will eject the CD, record the path to the file, display a graphical warning, take screenshots, and then forcefully reboot the system after 1000 seconds.

North Korea manages a national intranet called Kwangmyong, a “walled garden” that is completely isolated from the global internet and gives most citizens access only to government-approved websites and email systems.

When you first launch the browser Naenara (based on Firefox 3.5), the default homepage is the IP address “10.76.1.11.” That is, their internet is essentially an intranet.

Advancing Full-Text Search: Testing and Refining with Multi-User Platforms | July 06 2025, 04:35

I have built up expertise in full-text search testing and packaged it into a system. Essentially, it’s a turnkey multi-user platform that, given roughly 1000 queries and several search engine configurations, can produce reports with graphs, metrics, and conclusions by morning, showing whether configuration A performs better than B and why. It calculates NDCG@k, MAP, precision, recall, and about a dozen other metrics. It uses an LLM, but only at the final stage, after all the math is done.
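For readers who have not met these metrics, here is a minimal reference sketch of four of them in plain Python. It is not the platform’s code. The function names are my own, and NDCG here uses the linear-gain variant (gain equals the relevance grade); the exponential variant, with gain 2^grade - 1, is also common.

```python
import math


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of precision values at each rank where a relevant doc appears.

    MAP is the mean of this value over all queries.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked: list[str], grades: dict[str, float], k: int) -> float:
    """DCG of the actual ranking divided by DCG of the ideal ranking."""
    gains = [grades.get(doc, 0.0) for doc in ranked]
    ideal = sorted(grades.values(), reverse=True)
    best = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / best if best > 0 else 0.0
```

Precision, recall, and average precision take a binary set of relevant documents, while NDCG takes graded judgments, which is why its signature differs.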

So, here’s my question. I’m looking for someone who has faced the same issue in their project, to understand the demand and the ask.

The problem the system solves is defined as follows: there is a working search over products or documents (Solr, Coveo, Elasticsearch, Algolia, it doesn’t matter which), and there are hypotheses on how to improve it, but there is also the fear that improving one aspect might break another. My system helps to see this in numbers and graphs, providing a conclusion with justification, including statistical significance and other metrics.
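As an illustration of what "statistical significance" can mean when comparing two configurations, here is a sketch of a paired sign-flip permutation test over per-query metric values (for example, NDCG@10 for each query under A and under B). This is one common choice, picked here as an assumption; the post does not say which test the platform actually uses, and the function name and defaults are hypothetical.

```python
import random


def paired_permutation_test(
    a: list[float],
    b: list[float],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided p-value for 'mean per-query difference between A and B is zero'.

    a[i] and b[i] must be the metric for the same query under each config.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each difference is arbitrary,
        # so flip each one with probability 0.5 and recompute the mean.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)
```

A small p-value suggests the difference between A and B is unlikely to be noise across this query set. It says nothing about whether the difference is large enough to matter, so it is best read together with the metric values themselves.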

It also acts as a virtual search assessor. For each search result, it can give a rating, assessing how well each document matches the query. This is a very non-trivial task (especially for large documents), involving chunking, embeddings, LLM evaluation of relevant chunks, etc. Non-trivial, but it works.
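A rough sketch of the chunk-then-rank stage of such a pipeline is below. It is heavily simplified and rests on assumptions: word-count vectors with cosine similarity stand in for real embeddings, the chunk sizes are arbitrary, and the final LLM evaluation of the selected chunks is left out entirely. The function names are hypothetical.

```python
import math
from collections import Counter


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_chunks(
    query: str, document: str, n: int = 3, size: int = 50, overlap: int = 10
) -> list[tuple[float, str]]:
    """Return the n chunks most similar to the query, best first."""
    q = Counter(query.lower().split())
    scored = [
        (cosine(q, Counter(chunk.lower().split())), chunk)
        for chunk in chunk_words(document, size, overlap)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

In a real assessor, only the top-ranked chunks would go on to the LLM for grading, which keeps large documents within the model’s context window.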

It can also analyze search queries and break them into groups based on similarity. For instance, such segmentation might show that users sometimes separate the words forming a brand name with a space, and sometimes not. These different variants would be grouped together.
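The brand-name case can be shown with a deliberately simple sketch: group queries by a normalized key that ignores case, spaces, and punctuation. This is only an assumption about one possible approach, far cruder than similarity-based grouping, and the function names are hypothetical. Note that the regular expression keeps only ASCII letters and digits, so it would need adjusting for non-English queries.

```python
import re
from collections import defaultdict


def normalize(query: str) -> str:
    """Lowercase the query and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", query.lower())


def group_queries(queries: list[str]) -> dict[str, list[str]]:
    """Group raw queries that collapse to the same normalized key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for q in queries:
        groups[normalize(q)].append(q)
    return dict(groups)
```

For example, "Play Station 5", "playstation 5", and "PlayStation5" all collapse to the key `playstation5` and land in one group.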

I would like to discuss this with someone who knows more about this topic than I do, someone who has/had such problems and has somehow solved them.

I currently feel like my product is unique in the market. Actually, it’s not even on the market yet. But I really don’t see anything similar out there. Maybe nobody needs it?

I won’t publicly post screenshots yet. The picture is merely for attracting attention.

Please share if there might be relevant people in your network.