Software – Hi, I'm Rauf Aliev.

Mind-Blowing Facts About SQLite: From Naval Beginnings to Mars | June 18 2026, 12:48

Today I learned some mind-blowing facts about SQLite, the most widely used database in the world (over a trillion installations: in every smartphone and browser, in cars, in the A350 aircraft, even on Mars). It was born on the US Navy destroyer USS Oscar Austin. It's developed by JUST THREE people. It's open source, but you can't just walk into this open source project: contributing is by invitation only and requires an affidavit. The company is called Hwaci (“Hipp, Wyrick & Company”), and it is also involved in music (the founder's wife is a musician); check out the website. The office is in a residential house in Charlotte. There are 600+ lines of tests for every line of code, with 100% branch coverage and MC/DC coverage. The tests simulate OS crashes, power outages, I/O errors, and out-of-memory conditions. The main test suite is proprietary and closed. Imagine that: open source with paid private tests. Want access? Join the consortium for $120,000 a year.

And the strangest thing: the spirit of the project is almost monastic. Instead of a Code of Conduct, they have a Code of Ethics taken from chapter 4 of the Rule of Saint Benedict (literally 1,500-year-old “tools for good deeds”). At the top of each source file, instead of a legal notice, there is a blessing: “May you do good and not evil…”.

(They couldn't find a suitable version control system, so they wrote their own: Fossil, built on SQLite, of course. Their parser generator, Lemon, is also homegrown. Just like Linus with Git.)

Innovative DIY Program for Live Transcription and Screen Capture Analysis | June 18 2026, 04:47

I made a really cool thing for myself. I launch a program, and it turns on the microphone and listens. I switch to, say, a browser and comment on what I see on the screen, periodically pressing a hotkey to take a screenshot. Meanwhile, the program makes a time-stamped transcript of my comments and saves the screenshots with timestamps. It then runs text recognition on the screenshots and extracts the spellings of various words, brands, identifiers, and people's names, so that it can turn the transcript of my speech into correctly spelled text. All of this runs on local models on my laptop, which means it is completely free.
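The spelling-correction step can be sketched roughly like this. It is a minimal illustration, not the actual program: I'm assuming the transcript arrives as (seconds, text) segments and the OCR output as (seconds, text) pairs, and I use Python's standard `difflib` fuzzy matching in place of whatever the real model-based correction does. Only identifier-like tokens (inner capitals, digits, underscores, as in `HttpClient`) are harvested from the screenshots; plain capitalized words such as people's names would need a smarter filter so that ordinary words don't get overwritten. The names `distinctive_terms`, `correct_segment`, `correct_transcript` and the 30-second window are hypothetical.

```python
import difflib
import re

WORD = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def distinctive_terms(ocr_text):
    """Identifier-like tokens from OCR text, keyed by lowercase form (e.g. 'httpclient' -> 'HttpClient')."""
    terms = {}
    for tok in WORD.findall(ocr_text):
        if any(c.isupper() for c in tok[1:]) or any(c.isdigit() or c == "_" for c in tok):
            terms[tok.lower()] = tok
    return terms


def correct_segment(text, terms, cutoff=0.8):
    """Replace each transcript word with the closest OCR term, if it is close enough."""
    def fix(match):
        word = match.group(0)
        best = difflib.get_close_matches(word.lower(), list(terms), n=1, cutoff=cutoff)
        return terms[best[0]] if best else word
    return WORD.sub(fix, text)


def correct_transcript(segments, screenshots, window=30.0):
    """segments: [(seconds, text)]; screenshots: [(seconds, ocr_text)].

    Each segment is corrected only with terms from screenshots taken within `window` seconds.
    """
    result = []
    for t, text in segments:
        terms = {}
        for shot_t, ocr in screenshots:
            if abs(shot_t - t) <= window:
                terms.update(distinctive_terms(ocr))
        result.append((t, correct_segment(text, terms) if terms else text))
    return result
```

For example, a misheard "htpclient" spoken 2 seconds before a screenshot of `import { HttpClient } from '@angular/common/http'` comes out as "HttpClient".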

After I finish talking to the computer, I run the post-processing step: it takes the raw transcript and the recognized text of the screenshots as input and outputs a cleaned-up transcript that looks presentable (this step uses the Gemini API). One could even go a step further and automatically crop the discussed fragments out of the screenshots and insert them into the text exactly where they were mentioned.
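Before the cleanup call, the two streams have to be merged into a single input. A minimal sketch, assuming both lists are already sorted by time; the function names and the `<screenshot ...>` marker format are hypothetical, and the Gemini request itself is left out. Keeping a marker at the exact moment each screenshot was taken is also what would make the "insert the fragment where it was mentioned" idea possible later.

```python
import heapq


def stamp(seconds):
    """Format seconds as mm:ss."""
    s = int(seconds)
    return f"{s // 60:02d}:{s % 60:02d}"


def chronological_input(segments, screenshots):
    """Interleave speech and OCR'd screenshots by time.

    segments: [(seconds, text)], screenshots: [(seconds, filename, ocr_text)], both time-sorted.
    At equal timestamps, speech (kind 0) comes before the screenshot (kind 1).
    """
    speech = ((t, 0, f"[{stamp(t)}] {text}") for t, text in segments)
    shots = ((t, 1, f"[{stamp(t)}] <screenshot {name}>\n{ocr}") for t, name, ocr in screenshots)
    return "\n".join(line for _, _, line in heapq.merge(speech, shots))
```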

Or here's another thing I can do: just play a video through the speakers, and the program immediately makes the same kind of transcript. Search YouTube for the video “Angular HttpClient Under The Hood. Design Patterns & Source Code Overview” and start at 3:51: I just let it run on autopilot for a couple of minutes, then stopped my script.

Transforming Image Proportions with Generative AI: Smart Redesign Solutions | June 16 2026, 10:08

I published an article about transforming images to new proportions. Using generative AI, of course, because turning a square into a rectangle means either losing data (cropping), extrapolating it, or stretching and squeezing the image itself. Here, I describe a method that performs smart extrapolation. When processing hundreds or thousands of images, this approach is not error-free, but the errors are relatively few, and it turns out to be far more efficient to fix the failed images manually than to do all the work by hand from the start.

This is needed specifically during a redesign, when it turns out that the new design differs slightly in size from the old one, for instance for banners, and the number of these banners runs into the hundreds or thousands.
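The geometry behind the extrapolation is simple: find the smallest canvas with the target proportions that still contains the whole original, center the original on it, and let the generative model fill the rest. A minimal sketch of that calculation, assuming the image is only extended and never cropped (the article's actual method may combine both); `outpaint_canvas` is a hypothetical name.

```python
import math
from fractions import Fraction


def outpaint_canvas(width, height, target_w, target_h):
    """Smallest canvas with aspect ratio target_w:target_h that contains a width x height image.

    Returns (canvas_w, canvas_h, offset_x, offset_y). The original is pasted at the offset
    (centered); everything outside it is the area the generative model has to fill.
    """
    ratio = Fraction(target_w, target_h)
    if Fraction(width, height) < ratio:   # too narrow: extend left and right
        canvas_w, canvas_h = math.ceil(height * ratio), height
    else:                                 # too wide (or already right): extend top and bottom
        canvas_w, canvas_h = width, math.ceil(width / ratio)
    return canvas_w, canvas_h, (canvas_w - width) // 2, (canvas_h - height) // 2
```

For a 1000×1000 banner going to 16:9, this gives a 1778×1000 canvas with the original at x=389, leaving a 389-pixel strip on each side to be generated.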

Automating Banner Crop/Resize Across Breakpoints with Generative AI

Migrating SAP Commerce Content with Graph Databases: A Neo4j and Memgraph Guide | June 10 2026, 03:12

I published a new article on Hybrismart after a long hiatus. It's about migrating data from an old site to a new one using a graph database (specifically, Neo4j and Memgraph). The case is as follows: there is an old site and a new site, and you need to transfer CMS data (components, pages, layout) from the old one to the new one, applying various transformations along the way: for example, the new site has different styles, a different layout, and some different components. For this task, I used a graph database.
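In a graph database this becomes a traversal query, but the idea fits in a few lines of plain Python: walk everything reachable from a page, rename component types according to a mapping, and splice out node types the new site doesn't have, re-attaching their children to the parent. This is a minimal in-memory sketch, not the article's code; the node structure, `migrate`, and the `LegacyWrapper` and `ResponsiveBannerComponent` types are hypothetical.

```python
def migrate(nodes, node_id, type_map, drop=frozenset()):
    """Rebuild the subtree under node_id for the new site.

    nodes:    {id: {"type": str, "children": [ids]}}  (old-site content)
    type_map: old type -> new type; unmapped types are kept as-is
    drop:     types absent on the new site; their children move up to the parent
    Returns a list of migrated subtrees (empty or several when the node itself is dropped).
    """
    node = nodes[node_id]
    children = [m for child in node["children"] for m in migrate(nodes, child, type_map, drop)]
    if node["type"] in drop:
        return children  # splice the dropped node out, keep its descendants
    new_type = type_map.get(node["type"], node["type"])
    return [{"id": node_id, "type": new_type, "children": children}]


OLD_SITE = {
    "home": {"type": "ContentPage", "children": ["slot1"]},
    "slot1": {"type": "ContentSlot", "children": ["wrap"]},
    "wrap": {"type": "LegacyWrapper", "children": ["banner"]},
    "banner": {"type": "SimpleBannerComponent", "children": []},
}
```

Calling `migrate(OLD_SITE, "home", {"SimpleBannerComponent": "ResponsiveBannerComponent"}, drop={"LegacyWrapper"})` returns the page with the wrapper removed and the banner retyped and attached directly to the slot.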

It's been a while since I wrote about SAP Commerce Cloud on my blog. I worked at SAP for two years and thought it inappropriate to write about their products while formally having access to internal documents. Currently, I'm working on two projects at once: one about migrating SAP Commerce Cloud, and the other largely about graph databases. The article was born at the junction of these two worlds.

https://hybrismart.com/2026/06/10/migrating-sap-commerce-content-with-a-graph-database/

Migrating SAP Commerce Content with a Graph Database

Script Evolution: Creating Multi-Dimensional Word Art | May 27 2026, 21:12

I created a script that generates inscriptions that read as three different words when viewed from the left, from the right, and from the top. It develops the idea from my previous post, where there were only left and right views. One script picks triplets of words from a dictionary that can technically be combined this way. Another creates a 3D model that can be sent to a 3D printer (I might do that today), and the third renders a visualization of this model; see the video.
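Whether a triplet "can technically be done" comes down to carving: extrude each letter through a cube along its viewing axis and keep only the intersection; the triplet works if none of the three silhouettes loses a pixel in the process. A minimal sketch for one letter triplet on small bitmaps, assuming three mutually perpendicular views (front, side, top) rather than the exact angles my script uses; all function names are hypothetical.

```python
from itertools import product


def parse(rows):
    """Turn rows like '#.#' into a 0/1 bitmap (rows listed top to bottom)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def carve(front, side, top):
    """Keep voxel (x, y, z) only where all three views have ink.

    Assumed axes: the front view sees (x, z), the side view sees (y, z),
    the top view sees (x, y); all bitmaps are n x n.
    """
    n = len(front)
    return {(x, y, z) for x, y, z in product(range(n), repeat=3)
            if front[z][x] and side[z][y] and top[y][x]}


def project(voxels, n):
    """Shadows of the carved solid along each axis, in the same layout as the input views."""
    front = [[0] * n for _ in range(n)]
    side = [[0] * n for _ in range(n)]
    top = [[0] * n for _ in range(n)]
    for x, y, z in voxels:
        front[z][x] = side[z][y] = top[y][x] = 1
    return front, side, top


def is_feasible(front, side, top):
    """The triplet works only if carving loses no pixel of any of the three letters."""
    return project(carve(front, side, top), len(front)) == (front, side, top)
```

A word triplet is feasible when every letter position passes this check, and the carved voxel set is also the starting point for the printable model.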

Exploring Algorithmic Image Processing for Large Format Printing | May 24 2026, 22:40

I'm playing with algorithmic image processing. The images only look interesting when printed in a large format, because all these fine lines merge together when scaled down to a phone screen. I'll post a close-up in the comments.

It works like this: the input image is divided into squares of different sizes. Each square is reduced to one number: how dark it is. The darker the square, the more lines are drawn inside it. The lines are not straight; they are Bézier splines. They flow smoothly from one square to the next because the points on the boundaries are shared, so the result is not a grid but a single continuous thread. For color, the image is split into CMYK channels (as in printing). Each channel is processed separately, with its own grid and its own lines. Then the layers are superimposed, and a color picture emerges from three or four black-and-white plates.
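Two of the per-channel steps can be sketched like this: a naive RGB-to-CMYK split (no color profile, which is an assumption; real print workflows use ICC profiles) and the reduction of each square to a line count proportional to its average ink. The Bézier drawing itself is omitted, and `max_lines` and the function names are hypothetical.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0-255) to CMYK (0.0-1.0) conversion without a color profile."""
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)
    if k == 1:  # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    return (1 - r - k) / (1 - k), (1 - g - k) / (1 - k), (1 - b - k) / (1 - k), k


def cell_line_counts(channel, cell, max_lines):
    """channel: 2D list of ink amounts in 0..1 (one CMYK plate).

    Splits it into cell x cell squares (edge squares may be smaller) and returns,
    per square, how many lines to draw: proportional to the square's mean ink.
    """
    h, w = len(channel), len(channel[0])
    grid = []
    for y0 in range(0, h, cell):
        row = []
        for x0 in range(0, w, cell):
            values = [channel[y][x]
                      for y in range(y0, min(y0 + cell, h))
                      for x in range(x0, min(x0 + cell, w))]
            row.append(round(sum(values) / len(values) * max_lines))
        grid.append(row)
    return grid
```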

The image doesn't look blocky because the splines flow smoothly from one square to the next, but there is a problem: dividing the image into 10×10-pixel squares effectively reduces the resolution tenfold. To compensate, several passes are made with different square sizes and shifted grids. The first pass uses large cells, the second uses finer cells shifted 10 pixels to the right, and the third uses even finer cells shifted diagonally.
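The shifted passes are just grids with a different cell size and origin, clipped to the image. A minimal sketch; the cell sizes and offsets in `PASSES` are illustrative values loosely following the description, not the real config, and the function names are hypothetical.

```python
def axis_spans(length, cell, offset):
    """Cell boundaries along one axis for a grid shifted by `offset`, clipped to [0, length)."""
    start = offset % cell - cell if offset % cell else 0
    return [(max(a, 0), min(a + cell, length)) for a in range(start, length, cell)]


def pass_cells(width, height, cell, offset=(0, 0)):
    """All (x0, y0, x1, y1) cells of one pass over a width x height image."""
    return [(x0, y0, x1, y1)
            for y0, y1 in axis_spans(height, cell, offset[1])
            for x0, x1 in axis_spans(width, cell, offset[0])]


PASSES = [
    {"cell": 40, "offset": (0, 0)},   # coarse
    {"cell": 20, "offset": (10, 0)},  # finer, shifted 10 px right
    {"cell": 10, "offset": (5, 5)},   # finest, shifted diagonally
]
```

Because a shifted grid's boundaries fall in the middle of the previous pass's cells, detail that one pass averages away at its cell edges can be picked up by the next.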

The whole process is controlled by a JSON config: separate parameters for each channel and specific settings for each pass within a channel. The output is an SVG, which can be scaled to the size of a wall without loss of quality, and a PNG, in which the CMYK layers are superimposed with transparency.
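One natural way to lay out such a config is three levels of settings, where pass values override channel values, which in turn override global defaults. A sketch under that assumption; every key name here (`defaults`, `channels`, `passes`, `max_lines`, `stroke_width`) is hypothetical, not the actual schema.

```python
import json

CONFIG = """
{
  "defaults": {"max_lines": 8, "stroke_width": 0.3},
  "channels": {
    "C": {"passes": [{"cell": 40, "offset": [0, 0]}]},
    "K": {"stroke_width": 0.5,
          "passes": [{"cell": 40, "offset": [0, 0]},
                     {"cell": 20, "offset": [10, 0], "max_lines": 12}]}
  }
}
"""


def resolved_passes(config):
    """Yield (channel, settings) with pass > channel > global precedence."""
    defaults = config.get("defaults", {})
    for name, channel in config["channels"].items():
        channel_level = {k: v for k, v in channel.items() if k != "passes"}
        for pass_settings in channel["passes"]:
            yield name, {**defaults, **channel_level, **pass_settings}
```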

Mastering Cross-Posting: From Facebook Frustrations to Dual Blogging Excellence | May 23 2026, 14:28

I have perfected cross-posting from Facebook to my two blogs [which almost no one visits]: beinginamerica.com and raufaliev.com. When a new post is published on Facebook, a mechanism is triggered that:

1) translates the post into English;

2) processes the attached images and generates descriptions for them;

3) creates a title based on the text of the post and the image descriptions;

4) generates tags from the same material;

5) records the post in Turso DB, a cloud database that is free up to certain limits;

6) creates embeddings via OpenAI and records them in Qdrant Cloud, which is also a cloud database, but a vector one;

7) finally, uploads the images to WordPress via the API and publishes the post in English and Russian via the API.
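Structurally, this is a fixed sequence of steps over one post. A minimal sketch, assuming each step is a function that returns the fields it adds; the real steps would call OpenAI, Turso, Qdrant, and WordPress, none of which is shown here. Recording finished steps is my own illustrative addition, so that a run that fails halfway (say, on an expired token) can be resumed without paying for translation again. All names are hypothetical.

```python
from typing import Callable

# A step takes the post and returns only the new fields it produces.
Step = Callable[[dict], dict]


def run_pipeline(post: dict, steps: list[tuple[str, Step]]) -> dict:
    """Run steps in order; steps already listed in post["done"] are skipped on a rerun."""
    post = dict(post, done=list(post.get("done", [])))
    for name, step in steps:
        if name in post["done"]:
            continue  # finished on a previous run
        post.update(step(post))
        post["done"].append(name)
    return post


# Stubs standing in for the real API calls.
def translate(post):
    return {"text_en": "EN: " + post["text"]}


def make_title(post):
    return {"title": post["text_en"][:30]}


STEPS = [("translate", translate), ("title", make_title)]
```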

All would be well, but of all the APIs, the silliest one is Facebook's. First, for pages like mine that have been moved to the New Pages Experience, most of this API is almost impossible to use. Well, it's possible, but you have to spend a long time proving to Facebook that you really need it by providing incorporation documents, demonstrating the application, and so on. Obviously, they are reluctant to support something that takes content out of their system. In addition, the token that gives access to the latest posts is relatively short-lived (possibly a few weeks), and it can only be obtained again through a browser. So any automation requires regular attention; otherwise, it breaks.
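Since the token expires on a schedule, the least the automation can do is warn before it breaks. A minimal sketch, assuming the expiry time is recorded when the token is obtained; `token_status` and the 7-day warning window are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def token_status(expires_at, now=None, warn_days=7):
    """Classify a token by its (timezone-aware) expiry time."""
    now = now or datetime.now(timezone.utc)
    left = expires_at - now
    if left <= timedelta(0):
        return "expired"
    if left <= timedelta(days=warn_days):
        return f"renew soon: {left.days} days left"
    return "ok"
```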

If you slip up and don't pull the latest posts through the Facebook Graph API in time, they simply drop off the list of recent posts, and that's it: no more API access to them. The only way then is to request an archive download from Facebook. This download is also rather silly: it requires a lot of transformation and cleanup. For example, the file with posts that I process, for some reason, contains links I posted in comments without any accompanying text. And the comments themselves are in a separate file!
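Dropping those text-less link posts is a one-liner once the archive has been flattened. A sketch assuming each post has already been converted to a dict with a `text` field; the real archive layout is different and needs its own parsing, which isn't shown. `drop_link_only` is a hypothetical name.

```python
import re

URL_ONLY = re.compile(r"^\s*https?://\S+\s*$")


def drop_link_only(posts):
    """Keep posts with real text; drop empty ones and those that are just a bare URL."""
    return [p for p in posts
            if p.get("text", "").strip() and not URL_ONLY.match(p["text"])]
```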

Assigning tags was a separate challenge. Here's the thing: there are about 10,000 posts in total. That's a big chunk, and you can't derive tags from it in one go because it doesn't fit into the LLM's context window. But you need to. So I did this: a script takes random posts from the 10,000, in a volume whose total size is just below a specified token limit, and appends the prompt “generate the 30 most common tags for me” at the end of the block (I'm simplifying the actual prompt). I ran this 10 times and got 10 sets of 30 tags each, generated from different slices of the database. That made 300 tags, some of which were exact duplicates, while others were synonyms or closely related. All of this is fed back into the LLM, and we get a list of tags and a tag hierarchy. Now we have a limited set of tags that reflects the 10,000 posts as closely as possible. It turns out that over almost 20 years on Facebook, my breakdown is as follows:

Tag           Posts
===================
#Russia        3412
#Thoughts      3146
#Tech          3105
#Culture       2765
#Hobbies       2726
#AI            1603
#Science       1367
#Software      1358
#Travel        1298
#Learning      1138
#Society       1050
#Nature         958
#Education      915
#Business       902
#Art            894
#Programming    889
#Humor          840
#History        807
#Gadgets        750
#Moscow         713
#USA            614
#Cinema         567
#Webdev         493
#Music          476
#Sports         473
#Mindset        443
#Auto           400
#Books          386
…

And so on. This list includes both tags from the limited list and tags that the LLM assigned to posts simply because it didn't find anything suitable in the limited one.

Tags from the limited list became categories on the site. All the remaining tags, plus these, became regular WordPress tags.
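The sampling trick for building the tag vocabulary can be sketched as follows: shuffle the posts, greedily fill a batch up to the token budget, and later count how often each tag came back across batches. Assumptions: tokens are estimated with a crude 4-characters-per-token rule (Cyrillic text usually costs more per character, so a real tokenizer is better), the LLM call is omitted, and counting exact duplicates is only the first step before the LLM merges synonyms. Function names are hypothetical.

```python
import random
from collections import Counter


def estimate_tokens(text):
    # Crude heuristic (~4 characters per token); use a real tokenizer in practice.
    return max(1, len(text) // 4)


def sample_batch(posts, budget, rng):
    """Random posts whose total estimated size stays at or below `budget` tokens."""
    batch, used = [], 0
    for post in rng.sample(posts, len(posts)):  # shuffled copy
        cost = estimate_tokens(post)
        if used + cost > budget:
            continue  # too big; a smaller post may still fit
        batch.append(post)
        used += cost
    return batch


def merge_tag_lists(tag_lists):
    """Count how many batches proposed each tag, ignoring case and a leading '#'."""
    return Counter(t.lstrip("#").strip().lower() for tags in tag_lists for t in tags)
```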

As for image search, I had two ideas for how to do it. The first was OpenCLIP. It's pretty straightforward but requires hosting the model somewhere. That's easy on my machine, but it's inconvenient to start it every time, and I planned to move the migrator to a cheap server on Amazon. Running the model in the cloud is also an option, but you have to pay a bit, and it's yet another dependency. But the main thing is that search works quite well without it. I generate descriptions for the images using OpenAI, which I use for translation into English anyway, and then create embeddings with a large embedding model. So far, all search tests have been a great success, especially when there's text in the image, where it's a big question whether OpenCLIP would have interpreted it correctly.
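Under the hood, searching over description embeddings is a nearest-neighbor lookup by cosine similarity. Qdrant does this server-side with an index, but a brute-force version shows the idea. Toy 3-dimensional vectors stand in for real embeddings, and `search` is a hypothetical name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def search(query_vec, index, top_k=3):
    """index: [(post_id, embedding)]; returns the ids of the top_k most similar entries."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [post_id for post_id, _ in scored[:top_k]]
```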

In the end:

1) WordPress raufaliev.com – free

2) WordPress beinginamerica.com – free

3) Turso DB, where all posts are stored – free

4) Qdrant Cloud, where embeddings are stored – free

5) OpenAI for translation and image descriptions – not free, but inexpensive (processing posts for a whole year cost $30).

I'm attaching two screenshots showing how search works, by images and by text, as well as the migrator dashboard.

Exploring Automated Documentation of Large Excel Datasets | May 06 2026, 22:28

I wonder if there is an agent that takes an Excel workbook much larger than the context window and starts documenting what it is about. Something like this: here are several tabs. On tab 5 there is a table with a million rows and five columns. The columns are as follows. We take a random sample from the table: it looks like this column has numbers, and that one has surnames. We assume the first column is numeric throughout, so we write code that checks this assumption and at the same time calculates min/max and the set of unique values. It turns out there are few values, only five. We record that. Now we check the surnames. Yes, these are just strings, and a new sample confirms they are indeed surnames. Here's a formula; we see what it refers to. And so on. And this column has an unclear purpose. We look at the data: these are some numbers from 0 to 1. We measure the mean and the spread. We ask the user; maybe they can comment. They did: it turned out to be a KPI assigned to this user by an external system. We record that. And so on. Documentation emerges. Later, once the documentation exists, one can ask the agent to perform operations on all this data, since the LLM now more or less understands the purpose of the data and how it is connected, and can form hypotheses about outliers and verify them.
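The "write code that checks the assumption" step is the part an agent would delegate to a tool. A minimal sketch of such a column profiler run over a sample; the output fields and the `profile_column` name are hypothetical, and a real agent would also read formulas and the tab structure.

```python
import statistics


def _as_float(value):
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def profile_column(values, max_listed=10):
    """Summarize a column sample: inferred type, missing count, distinct values, range, spread."""
    present = [v for v in values if v not in (None, "")]
    distinct = set(present)
    profile = {"count": len(values), "missing": len(values) - len(present), "distinct": len(distinct)}
    numbers = [_as_float(v) for v in present]
    if present and all(n is not None for n in numbers):
        profile.update(type="numeric", min=min(numbers), max=max(numbers),
                       mean=statistics.fmean(numbers), stdev=statistics.pstdev(numbers))
    else:
        profile["type"] = "text"
    if len(distinct) <= max_listed:  # few values: list them, like the "only five" case
        profile["values"] = sorted(distinct, key=str)
    return profile
```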

The Crucial Role of Data Quality Oversight in Development Projects | May 06 2026, 16:07

Almost every development project has a dedicated functional test automation team, yet, surprisingly, a similar emphasis on data quality is rarely found. Whether data comes from external integrations, from users, or is generated by the system itself, it often goes unchecked simply because no one seems to consider it important, and later people struggle with the consequences, which snowball. The longer such issues persist, the harder they are to resolve, until people simply resign themselves to the “irreparable” state of the database. It is much better to catch these problems as they arise, while the technical debt is still manageable, than to figure out later how to keep them from bringing everything down.

In essence, there needs to be a constant “supervisor” over every type of database the system uses (relational, NoSQL, search indexes, graph databases): a data quality checking layer on top of the processes. Of course, there must be clear rules: exactly what to check and which flags to use to mark specific anomalies.
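The "clear rules plus flags" idea can be sketched as a list of named checks run over records, producing a flag-to-records report for the responsible person. A minimal sketch; the record fields, flag names, and `run_checks` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    flag: str                      # label attached to offending records
    is_ok: Callable[[dict], bool]  # returns True when the record passes


RULES = [
    Rule("MISSING_EMAIL", lambda r: bool(r.get("email"))),
    Rule("NEGATIVE_PRICE", lambda r: r.get("price", 0) >= 0),
]


def run_checks(records, rules):
    """Return {flag: [record ids]} for every rule a record fails."""
    report = {}
    for record in records:
        for rule in rules:
            if not rule.is_ok(record):
                report.setdefault(rule.flag, []).append(record["id"])
    return report
```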

There must be someone responsible for the process (a human, not AI) who integrates these reports into the development and support workflows. Many data integrity issues cannot simply be fixed through a user interface; they require the engineering team to develop scripts for mass correction and data cleansing.

Incidentally, this extends into anomaly detection (outlier detection): machine learning and LLMs can identify subtle “bad” patterns that traditional rule-based systems might miss.
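Even before bringing in ML, a robust statistical baseline catches the crude outliers. A minimal sketch using the modified z-score (median and median absolute deviation, with the commonly used 3.5 cutoff); this is a plain statistical technique, not machine learning, and `mad_outliers` is a hypothetical name.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Indices of values whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # more than half the values are identical; the score is undefined
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]
```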

What do you think about this? Are similar mechanisms implemented in your processes?

Repurposing Components from a Broken Air Purifier | May 03 2026, 15:00

The air purifier broke down, so I bought a used one with a new cartridge for the price of a replacement cartridge plus $40. I completely disassembled the old one, extracted the reusable components, and figured out how it works. Just like in school 🙂

Inside, it comprises:

– an ESP32-WROOM-32D controller. But the power section of the board burned out, so it's trash now.

– an MQ-7 CO sensor (unfortunately soldered to the board, but it can be desoldered). It needs a heating cycle to work correctly: first 5V for 60 seconds to clean the sensor, then 1.5V for 90 seconds for the measurement. Still, it can be reused elsewhere.

– a Plantower PMS9103M, a high-precision laser sensor for measuring airborne particulate matter concentrations (PM1.0, PM2.5, PM10). It can be connected to an Arduino; the specification is available.

– a microwave motion sensor (radar), model RCWL-0516. It can be connected to an Arduino and has a very simple interface. It detects motion within 5-7 meters in all directions (360 degrees).

– a 200W Snowfan YY225H310B motor. It's also fairly simple to connect, though it requires 310V DC plus 15V for speed control. But that's all.

– a Hall sensor (magnet)
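Reading the PMS9103M from code means parsing its serial frames. A minimal sketch, assuming it uses the same 32-byte frame documented for Plantower's PMS5003 family (header 0x42 0x4D, big-endian 16-bit fields, checksum equal to the sum of the first 30 bytes); I haven't checked this against the PMS9103M datasheet, so verify it before relying on it. `parse_pms_frame` is a hypothetical name.

```python
import struct


def parse_pms_frame(frame: bytes) -> dict:
    """Decode one 32-byte frame into atmospheric PM concentrations (ug/m3)."""
    if len(frame) != 32 or frame[:2] != b"\x42\x4d":
        raise ValueError("expected a 32-byte frame starting with 0x42 0x4D")
    # 15 big-endian uint16 values: frame length, 3x PM (CF=1), 3x PM (atmospheric),
    # 6 particle counts, reserved, checksum.
    fields = struct.unpack(">15H", frame[2:])
    if sum(frame[:30]) & 0xFFFF != fields[14]:
        raise ValueError("checksum mismatch")
    return {"pm1_0": fields[4], "pm2_5": fields[5], "pm10": fields[6]}
```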

The motor is the most valuable part: it's priced at $100 on eBay. It should probably be tested first to make sure it hasn't burned out, though.