Harnessing GPU Power Beyond Machine Learning: A Data Processing Experiment | December 13 2025, 01:16

Torturing my supercomputer: an illustration that a GPU is good for more than machine learning and heavy math.

My script takes a thick English dictionary (Webster’s) and repeats it 30 times, producing a list of 12 million words. The algorithm then goes through all 12 million and replaces every vowel with an asterisk using a regex. To add more load, it adds a “word length” column, keeps the words longer than 10 characters, and finds the five most frequent.

In Python, this is:

df['masked'] = df['text'].str.replace(r'[aeiou]', '*', regex=True)

df['len'] = df['masked'].str.len()

res = df[df['len'] > 10]['masked'].value_counts().head(5)

This code is run first on the CPU, then on the GPU.

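The CPU (I have the top-tier Intel Core Ultra 9 285K) completes this task in about 24 seconds, while the Nvidia RTX 5090 does it in 0.51 seconds. That’s a 46x speedup! Even counting the 1.16 seconds it takes to copy the data to the GPU, the end-to-end speedup is still about 14x.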

[Pandas CPU] Top Patterns:
masked
s*r w. sc*tt.     23280
s*r t. br*wn*.    23220
j*r. t*yl*r.      16140
bl*ckst*n*.       10860
b***. & fl.       10830
Name: count, dtype: int64
[Pandas CPU] Computation Time: 23.5596 sec.

Transferring data to GPU...
Transfer complete in 1.16s

--- Running Benchmark: cuDF GPU ---
[cuDF GPU] Top Patterns:
masked
s*r w. sc*tt.     23280
s*r t. br*wn*.    23220
j*r. t*yl*r.      16140
bl*ckst*n*.       10860
b***. & fl.       10830
Name: count, dtype: int64
[cuDF GPU] Computation Time: 0.5108 sec.

TOTAL SPEEDUP: 46.12x
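For anyone who wants to try a comparison like this, here is a minimal sketch of how the timing could be set up. It is not the original script: the helper names `top_masked_patterns` and `timed` and the tiny stand-in word list are made up for illustration. The pandas part runs anywhere; the cuDF part is only attempted if `cudf` can be imported, and it assumes cuDF accepts the same three calls as pandas.

```python
import time

import pandas as pd


def top_masked_patterns(df, min_len=10, top_n=5):
    """Mask vowels, add a length column, and count the most frequent long patterns."""
    df = df.copy()  # leave the caller's frame untouched
    df["masked"] = df["text"].str.replace(r"[aeiou]", "*", regex=True)
    df["len"] = df["masked"].str.len()
    return df[df["len"] > min_len]["masked"].value_counts().head(top_n)


def timed(label, func, *args):
    """Run func(*args), print the wall-clock time, and return the result."""
    start = time.perf_counter()
    result = func(*args)
    print(f"[{label}] Computation Time: {time.perf_counter() - start:.4f} sec.")
    return result


# Hypothetical stand-in for the dictionary: a few entries repeated many times.
words = ["blackstone.", "sir w. scott.", "cat", "recommendation"] * 1000
cpu_df = pd.DataFrame({"text": words})
cpu_result = timed("Pandas CPU", top_masked_patterns, cpu_df)
print(cpu_result)

try:
    import cudf  # needs a CUDA-capable GPU and a RAPIDS installation
except Exception:  # not installed, or no usable GPU
    cudf = None

if cudf is not None:
    gpu_df = cudf.from_pandas(cpu_df)  # host-to-device copy, kept out of the timing
    gpu_result = timed("cuDF GPU", top_masked_patterns, gpu_df)
    print(gpu_result)
```

On a sample this small the GPU has nothing to show off; the gap only becomes visible at millions of rows.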

Misadventures in AWS: Misusing aws-nuke for Configuration Exports | December 12 2025, 16:29

Just for laughs. I asked Gemini how to export an entire AWS configuration for local analysis, and it recommended aws-nuke, the tool for permanently deleting everything, claiming that if you add a dry-run flag you just get the configuration… And someone actually follows such advice 🙂 and then we wonder

Two Weeks on Linux: From Mac to ArchLinux+KDE Bliss | December 12 2025, 16:24

Two weeks on Linux after a Mac, and I’m wildly satisfied. My setup is ArchLinux with KDE Plasma 6.5. Everything here is customizable. For instance, in half an hour (no lie, thirty minutes) I made a program from scratch using Gemini that, when I press ScrollLock, translates the selected text into English or, if it is already in English, corrects its errors. There seems to be an app for every situation in life, at least in my field. Everything flies (admittedly, this is an Intel Core Ultra 9 285K with 64 GB of RAM). I enter a folder that contains 470,000 files, and it opens instantly. I’ve never seen anything like this anywhere else. I launch IntelliJ IDEA, and there is practically no delay between clicking the icon and the editor being ready with the project loaded. All my devices worked right away, unlike on the Mac, where there are simply no drivers for my HP LaserJet 1018 and I have to resort to tricks.

Now, when I occasionally switch to a Mac, it drives me crazy that the hotkeys are different. Of course, they can be reconfigured on the Mac, and I probably will do that: muscle memory builds up, and switching back and forth quickly doesn’t work. I miss iMessage a bit, since I’m used to writing and answering messages from the computer. Apple Music works through a browser.

Overall, the impression is very good so far.

Nostalgia and Innovation: The Story of Starchat.ru | December 09 2025, 23:41

2003. We had a chat, my creation, Starchat.ru, where people constantly hung out and talked to each other. It had a Java applet! Probably nobody even remembers what that is nowadays. The thing was originally written by some programmer I found on the internet, who then disappeared, and I took over the support.

Just for laughs, I made a bot that you could chat with simply by sending it a private message. It was always online, and not everyone realized it was a bot. When the bot received a message, it searched the massive chat logs for messages that shared the most words with the query and had a response. A response is the next message that someone addressed to the author of the original message (for example, “Vasya: oh just go you know where!” is a response to Vasya’s message); in the chat interface, you replied by clicking on a message first. When there were several candidates (and there always were, given the chat traffic), one was chosen at random.
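The lookup described above can be sketched roughly like this. It is a reconstruction from the description, not the original 2003 code: the log format (author, addressee, text), the function names, and the mini-log are all assumptions for illustration.

```python
import random
import re


def tokenize(text):
    """Lowercase a message and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def build_pairs(log):
    """Pair each message with the next message addressed to its author.

    `log` is a list of (author, addressee_or_None, text) tuples in chat order.
    """
    pairs = []
    for i, (author, _, text) in enumerate(log):
        for _, addressee, reply in log[i + 1:]:
            if addressee == author:
                pairs.append((tokenize(text), reply))
                break
    return pairs


def answer(query, pairs, rng=random):
    """Pick a random reply among the logged messages sharing the most words with the query."""
    query_words = tokenize(query)
    scores = [len(query_words & words) for words, _ in pairs]
    best = max(scores, default=0)
    if best == 0:
        return None  # nothing in the log overlaps with the query
    candidates = [reply for (_, reply), score in zip(pairs, scores) if score == best]
    return rng.choice(candidates)


# Hypothetical mini-log; the real one held a huge chat history.
log = [
    ("Vasya", None, "what is your name"),
    ("Masha", "Vasya", "I'm Masha :)"),
    ("Petya", None, "hey what is your name"),
    ("Olya", "Petya", "call me Olya"),
    ("Vasya", None, "where do you live"),
    ("Masha", "Vasya", "in Moscow"),
]
pairs = build_pairs(log)
print(answer("what's your name?", pairs))
```

With this log, a question about the name gets one of two different answers at random, which is exactly why the bot kept introducing itself under different names.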

The result was a bot that answered questions very amusingly. Asked its name, it gave a different name every time, but always appropriately, with emojis and suffixes, and often with swearing. It also gave sensible answers to standard questions like “where do you live” or “how old are you.” Since the history was huge and people had talked about everything under the sun, it was hard to find a question that didn’t get an interesting, correct, or funny answer.

The bot also had an interesting side effect. If you swore at it, it swore back even more offensively. In general, it often reacted inappropriately to attacks and reproaches, simply because in real conversations a polite question gets a polite answer and a rude one, of course, gets a rude answer. The audience had a lot of fun with this bot.

It was especially interesting to read the bot’s logs afterward. People there didn’t understand that it was a robot. They asked it questions, quarreled and made up with it. It was fun)

In-Flight French: Building a Language App on the Fly | December 01 2025, 15:45

By the way, yesterday morning, while waiting at the gate for my flight to Miami, I quickly wrote a French language learning app using Gemini based on an idea I sketched out to a friend while driving to the airport, and then used this app during the flight.

The idea is that, in an unfamiliar foreign-language text, the user first marks the unknown words, then sees their translations but without the original text, and then returns to the text itself but no longer sees the translations. It’s as if the dictionary were “in the next room.” The hypothesis is that this helps memorization more than showing the translation immediately on a click, when no effort is needed.
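As a rough sketch, the three steps could be modeled like this. The class and method names are hypothetical and this is not the app’s actual code; it also assumes that the translation screen lists each marked word next to its translation, which the description leaves open.

```python
from dataclasses import dataclass, field


@dataclass
class ReadingSession:
    text: str
    translations: dict  # prepared in advance, so no network is needed while reading
    marked: list = field(default_factory=list)

    def mark(self, word):
        """Step 1: the reader marks an unknown word while looking at the text."""
        if word in self.translations and word not in self.marked:
            self.marked.append(word)

    def glossary_view(self):
        """Step 2: marked words with their translations, shown without the text."""
        return [(word, self.translations[word]) for word in self.marked]

    def text_view(self):
        """Step 3: the text again, with no translations attached."""
        return self.text


# Hypothetical sample sentence and glossary.
session = ReadingSession(
    text="Le chat dort sur le canapé.",
    translations={"chat": "cat", "dort": "sleeps", "canapé": "sofa"},
)
session.mark("dort")
session.mark("canapé")
print(session.glossary_view())
print(session.text_view())
```

The point of keeping the two views separate is that the reader has to carry the translation in memory on the way back to the text.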

I’m pleased that building the app from scratch to a finished version took only about 35-40 minutes, and that I could then use it during the flight without internet access, since all the translations of the words and phrases had been prepared in advance.

I just deployed it on Render. It’s also nice that putting up a live demo was free and took only another 10 minutes.

https://readandlearn.onrender.com/

Unleashing the Power: RTX 5090 for Advanced AI and Digital Art Creations | December 01 2025, 01:39

Nvidia RTX 5090 32 GB! Happy as an elephant. Installed ArchLinux and CUDA. I plan to dig into boosting transformer-based deep neural networks soon, and I have a bunch of ideas for digital art based on concepts other than diffusion models.

Performance: Just ran a test, model GPT_OSS_20b_UD_Q4_K_XL generates 350 tokens per second with a context of 131072 tokens. That’s roughly an A4 page in a few seconds. Gemma3 27B – 55 tokens per second. Qwen3_30B_A3B_Q6_K – 259 tokens per second.
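A quick back-of-the-envelope check of the “A4 page in a few seconds” estimate. Only the tokens-per-second figures come from the test above; the page size (about 500 words) and the words-per-token ratio (about 0.75 for English) are rule-of-thumb assumptions.

```python
# Throughput figures quoted in the post (tokens per second).
throughput = {
    "GPT_OSS_20b_UD_Q4_K_XL": 350,
    "Gemma3 27B": 55,
    "Qwen3_30B_A3B_Q6_K": 259,
}

WORDS_PER_PAGE = 500    # assumption: a typical single-spaced A4 page
WORDS_PER_TOKEN = 0.75  # assumption: common rule of thumb for English text


def seconds_per_page(tokens_per_second):
    """Estimate how long one page takes to generate at a given token rate."""
    tokens_per_page = WORDS_PER_PAGE / WORDS_PER_TOKEN
    return tokens_per_page / tokens_per_second


for model, rate in throughput.items():
    print(f"{model}: ~{seconds_per_page(rate):.1f} s per page")
```

Under these assumptions, 350 tokens per second works out to roughly two seconds per page.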

Exploring Recommender Algorithms Through Interactive Visualizations and Sandbox Simulations | November 11 2025, 05:23

I’ve launched an open-source companion app for my book Recommender Algorithms! It’s a “sandbox” where you can run various recommendation algorithms with different settings and view algorithm-specific visualizations that help you understand how each one works. For instance, for algorithms like ItemKNN, SLIM, or EASE, a key visualization is a heatmap of the learned item-item similarity matrix, which shows which pairs of items the model considers “similar” (or “influencing” each other). For SLIM, a “Sparsity Plot” additionally shows that the similarity matrix really did turn out sparse. For association rule algorithms (Apriori, FP-Growth, Eclat), the visualization is not a chart but interactive tables of the “Frequent Itemsets” found and the “Association Rules” generated, which can be filtered and sorted.
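To make the heatmap idea concrete, here is a minimal sketch of the kind of item-item matrix such a plot would show. It uses plain cosine similarity between item columns, which is one common choice for ItemKNN; the function name and the toy ratings are assumptions, and this is not the application’s code.

```python
import math


def item_cosine_similarity(ratings):
    """Cosine similarity between item columns of a user x item rating matrix.

    `ratings` is a list of rows (one per user); 0 means "no rating".
    The diagonal is left at zero so an item is never its own neighbor.
    """
    n_items = len(ratings[0])
    columns = [[row[j] for row in ratings] for j in range(n_items)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    sim = [[0.0] * n_items for _ in range(n_items)]
    for a in range(n_items):
        for b in range(n_items):
            if a != b and norms[a] and norms[b]:
                dot = sum(x * y for x, y in zip(columns[a], columns[b]))
                sim[a][b] = dot / (norms[a] * norms[b])
    return sim


# Hypothetical toy data: 4 users x 3 items.
ratings = [
    [5, 4, 0],
    [4, 5, 0],
    [0, 0, 5],
    [1, 0, 4],
]
similarity = item_cosine_similarity(ratings)
for row in similarity:
    print(" ".join(f"{value:.2f}" for value in row))
```

Items 0 and 1 are rated by the same users and come out as near-neighbors, while item 2 barely overlaps with either; a heatmap makes that block structure visible at a glance.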

Additionally, there is a parametric mechanism for creating a toy dataset: the Dataset Wizard. It works like this. There are template datasets that describe items through characteristics: recipes through flavors, for example, or movies through genres. The system generates random users, each with a random subset of those same characteristics, and there are many sliders to make this distribution more contrasted or more complex. Next, a matrix of user ratings of items is created: roughly, if the characteristics of the user and the item match, the rating is higher because “tastes match”; if they differ, the rating is lower. Here too, sliders add noise and sparsity, the latter by randomly removing part of the matrix. The characteristics of items and users are not fed into the recommendation algorithm; they stay hidden, but they are used to visualize the results.
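A generator along these lines could look roughly like the sketch below. The rating formula, the parameter names (`noise`, `sparsity`), and the recipe template are assumptions chosen for illustration; the actual Dataset Wizard may compute ratings differently.

```python
import random


def generate_ratings(item_features, n_users, n_features, noise=0.5, sparsity=0.3, seed=0):
    """Generate a user x item rating matrix from hidden characteristics.

    Ratings grow with the overlap between a user's tastes and an item's
    features; `noise` perturbs them and `sparsity` removes a share of cells.
    """
    rng = random.Random(seed)
    # Each random user likes a random subset of the same characteristics.
    users = [{f for f in range(n_features) if rng.random() < 0.5} for _ in range(n_users)]
    matrix = []
    for tastes in users:
        row = []
        for features in item_features:
            if rng.random() < sparsity:
                row.append(None)  # rating removed to make the matrix sparse
                continue
            match = len(tastes & features) / len(features)  # share of matching features, 0..1
            rating = 1 + 4 * match + rng.gauss(0, noise)
            row.append(min(5, max(1, round(rating))))
        matrix.append(row)
    return users, matrix


# Hypothetical template: three recipes described by flavor ids (0=sweet, 1=sour, 2=spicy).
recipes = [{0}, {1, 2}, {0, 2}]
users, matrix = generate_ratings(recipes, n_users=4, n_features=3)
for tastes, row in zip(users, matrix):
    print(sorted(tastes), row)
```

Only `matrix` would be handed to a recommender; `users` and `recipes` stay behind the scenes for visualization.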

The third component of the application is hyperparameter tuning. Essentially, it’s an auto-configurator for a specific dataset. It uses an iterative approach that is usually much more efficient than an exhaustive search (Grid Search) or a random search (Random Search). In short, the system analyzes the history of past runs (trials) and builds a probabilistic “map” (a surrogate model) of which parameters are likely to yield the best result. It then uses this map to choose the next combination to test. This method is called Sequential Model-Based Optimization (SMBO).
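Here is a toy sketch of the SMBO loop for a single hyperparameter. The surrogate is deliberately simple (a kernel-weighted average of past scores plus a bonus for unexplored regions) and is an assumption made for brevity, not necessarily what the application uses; real tuners typically rely on Gaussian processes or tree-structured Parzen estimators. The objective function is also made up.

```python
import math
import random


def surrogate(x, history, bandwidth=0.15):
    """Predict the score at x as a kernel-weighted average of past trials."""
    weights = [math.exp(-((x - xi) / bandwidth) ** 2) for xi, _ in history]
    total = sum(weights)
    if total == 0:
        return sum(score for _, score in history) / len(history)
    return sum(w * score for w, (_, score) in zip(weights, history)) / total


def acquisition(x, history, kappa=0.5):
    """Surrogate prediction plus a bonus for being far from anything tried so far."""
    distance = min(abs(x - xi) for xi, _ in history)
    return surrogate(x, history) + kappa * distance


def smbo(objective, candidates, n_trials=15, n_random=3, seed=0):
    """Sequential model-based optimization over a list of candidate values."""
    rng = random.Random(seed)
    # Start with a few random trials so the surrogate has something to fit.
    history = [(x, objective(x)) for x in rng.sample(candidates, n_random)]
    for _ in range(n_trials - n_random):
        tried = {x for x, _ in history}
        untried = [x for x in candidates if x not in tried]
        if not untried:
            break
        # Use the "map" built from past trials to choose the next value to test.
        next_x = max(untried, key=lambda x: acquisition(x, history))
        history.append((next_x, objective(next_x)))
    return max(history, key=lambda trial: trial[1]), history


# Hypothetical objective: validation score as a function of one hyperparameter in [0, 1].
def validation_score(reg):
    return -(reg - 0.3) ** 2


grid = [i / 100 for i in range(101)]
(best_x, best_score), trials = smbo(validation_score, grid)
print(f"best value {best_x:.2f} after {len(trials)} trials (grid has {len(grid)} points)")
```

The contrast with Grid Search is in the trial count: the loop evaluates 15 points, while an exhaustive pass over the same grid would evaluate all 101.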

The code is open source and will be further supplemented with new algorithms and new visualizations.

Link to the code in the comments.

Link to the site where the code is deployed and where you can check out the application is also in the comments.

The Evolution of the Albanian Virus: From Joke to Cyberthreat | November 07 2025, 14:21

“Hello. I am an Albanian virus, but due to the low level of technology in my country, I cannot do anything to your computer. Please kindly delete one file on your computer and then forward me to other users.”

Here’s the 2025 version. The line they ask you to paste into the terminal: echo "" | base64 -d | bash

The decoded line contains a curl call to 217.119.139.117 whose output is passed to `nohup bash`. The script loaded from this address is, of course, obfuscated.
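The safe way to look at a one-liner like this is to decode the payload and read it, never to pipe it into bash. A minimal sketch follows; the function name is made up, and the sample payload is a harmless stand-in, since the real one is deliberately not reproduced here.

```python
import base64


def decode_for_inspection(encoded):
    """Decode a base64 payload to text so it can be read, never executed."""
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


# Harmless stand-in payload for demonstration.
sample = base64.b64encode(b"echo hello from a decoded script").decode("ascii")
print(decode_for_inspection(sample))
```

The same applies to whatever the decoded command would download: save it to a file and open it in an editor, preferably on a throwaway machine.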

Naturally, almost no available LLM agrees to deobfuscate the script. But Qwen didn’t mind.

Upon execution, the script gathers data from Chrome, Brave, Edge, Firefox, and other browsers: cookie files, autocomplete history, and system login data. It collects crypto wallets such as Electrum, Coinomi, Exodus, Atomic, Wasabi, Ledger Live, and others; grabs the contents of the macOS Notes app along with attached media files, plus the passwords from the Keychain; and scans the desktop and documents folders for files with certain extensions. The collected data is archived and sent to a remote server at 217.119.139.117.

To ensure persistent access, the script creates hidden launch services (LaunchDaemons) with random names, which makes them difficult to detect. It can also download a modified version of the Ledger Live application and replace the legitimate one with it.

Such is the Albanian virus)

Unveiling “Recommender Algorithms”: A Comprehensive Guide on Recommendation Systems | October 25 2025, 17:36

I finally released a book on #RecSys! It’s called Recommender Algorithms, and it compiles over 50 recommendation algorithms with detailed mathematical derivations, thorough explanations, and code examples.

https://www.testmysearch.com/books/recommender-algorithms.html

It all started early this spring in Germany, when I attended an ACM conference and sketched out the first structure of the book while analyzing the talks from the RecSys track. And now, just six months later, it has come to life.

Why did I write it? Because neither online nor in print is there a single, accessible resource that deeply explores recommendation algorithms of various types and purposes. There are articles focused on small subsets, but collecting and systematizing approaches—from foundational methods to the very latest—seems to have never been done before. I don’t know if I succeeded, but I’d love to hear your feedback.

Please like & share!

P.S. Click READ SAMPLE to see the first 40 pages. The table of contents is there as well.

https://www.testmysearch.com/books/recommender-algorithms.html
