From Idea to Chess AI: Building a Neural Network to Predict Moves | December 15 2025, 04:33

While figuring out neural networks, I decided to come up with a game-related task for myself. What if I found some ready-made games and trained a neural net to predict moves from the board position? No sooner said than done. Of course, generating code is faster with an LLM, but I wrote the detailed specification myself and designed the architecture on my own. In 40 minutes (!) from idea to result, I had a working solution that, at least in the first half of the game, does not mess up too much.

The screenshot shows CuteChess – it works with any chess engine, and in my case the engine is a simple Python script. The script takes the board position and feeds it to the model. The model picks the top 5 moves, and only these 5 are analyzed in depth, several moves ahead, with an evaluation of the resulting positions. That is, the neural network suggests candidate moves based on what it learned from 20,000 games (534,453 positions), and the best of the results is chosen. The search uses the minimax algorithm, if that means anything to anyone (it didn’t to me, so Gemini helped me here).
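The “top 5, then look ahead” step can be sketched roughly like this. It is a minimal illustration, not the actual engine script: the toy game tree, the `POLICY` scores and all function names are made up for the example, and it assumes the top-k filter is applied at every level of the search (the post only says that the top 5 moves are analyzed in depth).

```python
def top_k_minimax(state, depth, maximizing, k,
                  legal_moves, apply_move, evaluate, policy_score):
    """Minimax that only explores the k moves the policy ranks highest.

    Returns (value, best_move); best_move is None at leaves.
    """
    moves = list(legal_moves(state))
    if depth == 0 or not moves:
        return evaluate(state), None

    # The policy (a neural net in the post) proposes candidates; the rest are skipped.
    candidates = sorted(moves, key=lambda m: policy_score(state, m), reverse=True)[:k]

    best_value = float("-inf") if maximizing else float("inf")
    best_move = None
    for move in candidates:
        value, _ = top_k_minimax(apply_move(state, move), depth - 1, not maximizing, k,
                                 legal_moves, apply_move, evaluate, policy_score)
        improved = value > best_value if maximizing else value < best_value
        if improved:
            best_value, best_move = value, move
    return best_value, best_move


# Hypothetical toy game: states are node names, a move is the name of a child node.
TREE = {"root": ["a", "b", "c"], "a": ["a1", "a2"], "b": ["b1", "b2"], "c": ["c1", "c2"]}
LEAF_VALUES = {"a1": 3, "a2": 5, "b1": 6, "b2": 9, "c1": 1, "c2": 2}
POLICY = {"a": 0.5, "c": 0.3, "b": 0.2}  # made-up move priors


def search(k):
    return top_k_minimax(
        "root", depth=2, maximizing=True, k=k,
        legal_moves=lambda s: TREE.get(s, []),
        apply_move=lambda s, m: m,
        evaluate=lambda s: LEAF_VALUES.get(s, 0),
        policy_score=lambda s, m: POLICY.get(m, 0.0),
    )


full_result = search(k=3)    # all three root moves are searched
pruned_result = search(k=2)  # only "a" and "c" are searched
```

With k=3 the search picks move b (value 6). With k=2 the low prior for b hides it and the search settles for a (value 3), which shows how much such an engine depends on the quality of the network’s suggestions.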

How the model is trained. You can download games from the lichess website; there are hundreds of gigabytes of them. I took a file with 800,000 games played in 2014. From these 800,000, I select 20,000: a script looks specifically for games where the result is not a draw (1-0 or 0-1). Next, I calculate the rating difference (Winner_Rating minus Loser_Rating). It’s not the best metric, but it’s better than nothing. The bigger this difference, the more “confident” the win should be (the strong punish the weak). Thus, I get 20,000 such games.
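A rough sketch of that selection step, assuming the games have already been parsed into dictionaries and that “select” means “sort by rating gap and keep the top N” (the post does not spell this out). The field names `result`, `white_elo` and `black_elo` are hypothetical.

```python
def rating_gap(game):
    """Winner's rating minus loser's rating for a decisive game."""
    white, black = game["white_elo"], game["black_elo"]
    return white - black if game["result"] == "1-0" else black - white


def select_confident_wins(games, limit):
    """Drop draws, then keep the `limit` games with the largest rating gap."""
    decisive = [g for g in games if g["result"] in ("1-0", "0-1")]
    return sorted(decisive, key=rating_gap, reverse=True)[:limit]


# Made-up sample data for illustration.
sample_games = [
    {"id": "g1", "result": "1-0", "white_elo": 2000, "black_elo": 1500},
    {"id": "g2", "result": "1/2-1/2", "white_elo": 2200, "black_elo": 1200},
    {"id": "g3", "result": "0-1", "white_elo": 1400, "black_elo": 2100},
    {"id": "g4", "result": "0-1", "white_elo": 1900, "black_elo": 1800},
]
selected_ids = [g["id"] for g in select_confident_wins(sample_games, limit=2)]
```

Note that a game like g4, where the lower-rated player won, gets a negative gap and ends up at the bottom of the ranking.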

“Ignoring the moves of the weak” (to avoid teaching the model bad play) is implemented at the training stage. Essentially, the logic is: “If it’s White’s turn now, and White won this game – we learn. If it’s Black’s turn now, and Black lost – we skip and don’t teach the net this move.”
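The filter can be sketched in a few lines. This is an illustration only: it assumes each game is a list of (position, move) pairs in play order starting from the standard position with White to move, and the function name is made up.

```python
def winner_move_samples(positions_and_moves, result):
    """Keep only the (position, move) pairs played by the side that won."""
    if result not in ("1-0", "0-1"):
        raise ValueError("only decisive games are expected here")
    winner_is_white = result == "1-0"

    samples = []
    for ply, (position, move) in enumerate(positions_and_moves):
        white_to_move = ply % 2 == 0  # plies 0, 2, 4, ... are White's moves
        if white_to_move == winner_is_white:
            samples.append((position, move))
    return samples


# Placeholder positions ("p0", "p1", ...) stand in for real board encodings.
game = [("p0", "e2e4"), ("p1", "e7e5"), ("p2", "g1f3"), ("p3", "b8c6")]
white_won = winner_move_samples(game, "1-0")
black_won = winner_move_samples(game, "0-1")
```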

The neural network is trained in batches of 128 positions at a time. The network receives a board position as input and outputs 4096 numbers – a probability estimate for each possible move.
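One common way to arrive at 4096 outputs is to number every (from-square, to-square) pair, 64 × 64. The post does not say which encoding was used, so the sketch below is an assumption; it also ignores the promotion piece, which this scheme cannot distinguish.

```python
FILES = "abcdefgh"


def square_index(name):
    """'a1' -> 0, 'b1' -> 1, ..., 'h8' -> 63."""
    return (int(name[1]) - 1) * 8 + FILES.index(name[0])


def square_name(index):
    """Inverse of square_index."""
    rank_idx, file_idx = divmod(index, 8)
    return FILES[file_idx] + str(rank_idx + 1)


def move_to_class(uci_move):
    """Map a move like 'e2e4' to a class index in the range 0..4095."""
    return square_index(uci_move[:2]) * 64 + square_index(uci_move[2:4])


def class_to_move(index):
    """Map a class index back to a from-square/to-square string."""
    from_sq, to_sq = divmod(index, 64)
    return square_name(from_sq) + square_name(to_sq)


e2e4_class = move_to_class("e2e4")
```

At prediction time, the network’s 4096 scores would be masked to legal moves and the highest-scoring indices decoded back with `class_to_move`.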

Selecting games takes about 5 minutes. Training the model on my computer takes about 10 minutes for 20,000 games. You could leave it to train on 100K or a million, and it would definitely be better. No need anymore – I figured it out 🙂

You can view the game here:

https://lichess.org/JWeaIrVW

Exploring the Magic of Neural Networks in Letter Prediction and Visualization | December 14 2025, 23:35

I am currently experimenting with training simple neural networks – primarily to automate the existing toolkit, and some things just seem like magic.

There is a database of 32,000 names. There is a neural network filled with random numbers. I start training, with only this list of names as input. The first layer of the neural network is embeddings, and I set the number of dimensions to 2 for easy visualization. And after 200,000 iterations of training, the system clearly separates vowels from consonants, and for some reason, places the letter “q” slightly apart from other consonants. It seems that this is because the letter ‘q’ almost exclusively predicts the letter ‘u’ (Queen, Quincy, Quentin).

It also very reliably separates vowels and consonants in Russian names. In Russian names, the letters b and l are somewhat away from the other consonants, as are the soft and hard signs (well, that’s understandable).

I wonder how it works. If trained on a normal corpus of texts, the difference would be very clear. Why are vowels separated from consonants? Apparently, from the network’s mathematical perspective, ‘a’ and ‘o’ serve the same function: they “trigger” the prediction of the consonant following them, so the alternation of vowels and consonants is to blame. But damn, it’s interesting 🙂

And since the model can predict the next letters, you might try running it on Russian. On a model with 30-dimensional embeddings, it invents names like: Byaketta, Afsena, Erakey, Zasbat, Daraya, Gaiomahad, Rain, Razhul, Gzhatsiy, Reben, Vureb, Durodira, Turuzhul, Regravgava, Razsan, Gabila, Avganzh, Raksi, Khalebkokhorta, Rather. The model – for those who understand – is this: input of 6×33 characters (because we take up to 6 characters of context), encoded into embeddings of 60, goes to a layer of 100 neurons, and from there back to 33 characters. Some nonsense, but at least it’s clear how it all works at all levels.
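To make the “up to 6 characters of context” part concrete, here is a minimal sketch of how training pairs could be cut from a list of names. The padding symbol `.` used as both start and end marker is an assumption (the post does not say how short contexts are filled), and the function name is made up.

```python
def build_examples(names, context_size=6, pad="."):
    """Turn each name into (context, next_char) pairs with a sliding window."""
    examples = []
    for name in names:
        context = [pad] * context_size       # start with an empty (padded) context
        for ch in name + pad:                # the trailing pad marks the end of the name
            examples.append(("".join(context), ch))
            context = context[1:] + [ch]     # slide the window by one character
    return examples


# A shorter context is used here only to keep the example readable.
pairs = build_examples(["ann"], context_size=3)
```

Each context character is then looked up in the embedding table, the embeddings are concatenated, passed through the hidden layer, and the output layer scores every possible next character.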

Modern Reading: More Words, Digital Shifts, and Surprising Data Insights from 2008 | December 14 2025, 22:33

An interesting study caught my eye, dating back to 2009. According to it, the modern human indeed reads significantly more than in the past, although the format of this reading has changed. The study suggests that in 2008, an average American consumed about 100,000 words a day (approximately a quarter of “War and Peace”) – this is an approximate number of words that passed through consciousness per day (via ears or eyes), calculated based on activity chronometry. This is 140% more than in 1980.

Therefore, contrary to the myth about the degradation of reading, at least in 2008, we processed 2.4 times more textual information than our parents’ generation. Moreover, the study only considered information consumed outside of work (at home, in transit, during leisure).

The structure of reading has changed too: in 1960, 26% of words came from paper; by 2008 this share had fallen to 9%. However, digital media (internet, email, social networks) not only compensated for this decline but also tripled the total reading time. The reason is the internet, which is predominantly a textual environment (web surfing, email).

Interestingly, although the Internet accounts for 25% of consumed words, it makes up only 2% of bytes (since video on the internet in 2008 was of low quality). That is, they estimated the information flow from different channels and converted it into bytes 🙂 Radio accounted for 19% of the time but generated only 0.3% of bytes (as audio requires less data). Voice communication (telephone) accounted for only 5% of words and a negligible share of bytes, but it was the only fully interactive channel before the internet era. TV remained the main source of information in 2008 by time (41% of all hours) and by number of words (45%); however, in terms of data volume (bytes), television was only second (35%), behind computer games.

Now about games, quite interesting. The main finding from the report: Games generated (or did in 2008) 55% of all “bytes” consumed by households. Meanwhile, they only accounted for 8% of user time. This is quite a controversial topic in their report.

Those 100,500 words are an estimate of the actual words that a person either read or heard. This is not a metaphorical “equivalent,” but an attempt to calculate the verbal information precisely. They took the consumption time of each medium and multiplied it by the average word rate for that channel. Reading (books, newspapers, internet texts): 240 words per minute. Email and web surfing: 240 words per minute. Television (dialogues in shows/movies): 153 words per minute. Radio: 80 words per minute (less because of many pauses and music). Music: 41 words per minute (song lyrics).
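The “time × word rate” calculation is easy to reproduce. In the sketch below the rates are the ones quoted above, while the channel keys and the minutes in the example are made up for illustration and are not taken from the study.

```python
# Words per minute by channel, as quoted above; the key names are my own.
WORDS_PER_MINUTE = {
    "reading": 240,
    "email_and_web": 240,
    "tv": 153,
    "radio": 80,
    "music": 41,
}


def estimate_words(minutes_by_channel):
    """Sum minutes * words-per-minute over all channels."""
    return sum(minutes * WORDS_PER_MINUTE[channel]
               for channel, minutes in minutes_by_channel.items())


# Hypothetical day: one hour of TV and half an hour of radio.
example_words = estimate_words({"tv": 60, "radio": 30})
```

For this made-up day the estimate is 60 × 153 + 30 × 80 = 11,580 words.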

Link in the comments

The Evolution of Personalized Video Advertising | December 14 2025, 17:08

I kept seeing ads for an AI language tutor that I ignored, and the system forgot about me for a while before coming back with a noticeably older tutor.

But really, how soon will video advertising become personalized for us? Where in the same ad, New Yorkers will see their city, black people will see black people, in the morning the main character will be drinking coffee, and a car with the logo of their alma mater will flicker in the background?

Harnessing GPU Power Beyond Machine Learning: A Data Processing Experiment | December 13 2025, 01:16

Torturing my supercomputer. Illustration that the GPU is not just for machine learning and some complex math.

My script takes a thick English dictionary (Webster) and multiplies it by 30, creating a list of 12 million words. Then, the algorithm looks through all 12 million words and replaces all the vowels with asterisks using regex. To add more load, a “word length” column is added, and then we take words longer than 10 letters and find the most frequent (top 5).

So, in Python this is

df['masked'] = df['text'].str.replace(r'[aeiou]', '*', regex=True)

df['len'] = df['masked'].str.len()

res = df[df['len'] > 10]['masked'].value_counts().head(5)

and this code is executed first through the main processor, then through a GPU.

The main processor (I have the top-tier Intel i9 285k) completes this task in 24 seconds, while the Nvidia RTX 5090 does it in 0.51 seconds. That’s a 46-fold difference!

[Pandas CPU] Top Patterns:

masked

s*r w. sc*tt. 23280

s*r t. br*wn*. 23220

j*r. t*yl*r. 16140

bl*ckst*n*. 10860

b***. & fl. 10830

Name: count, dtype: int64

[Pandas CPU] Computation Time: 23.5596 sec.

Transferring data to GPU…

Transfer complete in 1.16s

— Running Benchmark: cuDF GPU —

[cuDF GPU] Top Patterns:

masked

s*r w. sc*tt. 23280

s*r t. br*wn*. 23220

j*r. t*yl*r. 16140

bl*ckst*n*. 10860

b***. & fl. 10830

Name: count, dtype: int64

[cuDF GPU] Computation Time: 0.5108 sec.

TOTAL SPEEDUP: 46.12x
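For reference, here is what those three pandas lines compute, written in plain Python with only the standard library. This is not the benchmark script itself, just a small sketch for checking the logic on a handful of words; the function name and the sample list are made up.

```python
import re
from collections import Counter

VOWELS = re.compile(r"[aeiou]")  # same lowercase-only pattern as in the pandas code


def top_masked_patterns(words, min_length=10, top_n=5):
    """Mask vowels, keep masked words longer than min_length, count the most frequent."""
    masked = (VOWELS.sub("*", word) for word in words)
    counts = Counter(m for m in masked if len(m) > min_length)
    return counts.most_common(top_n)


sample_words = ["blackstone."] * 3 + ["sir w. scott."] * 2 + ["cat"]
top_patterns = top_masked_patterns(sample_words)
```

The same three steps (regex replace, length filter, frequency count) are what pandas runs on the CPU and cuDF runs on the GPU in the benchmark above.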

Misadventures in AWS: Misusing aws-nuke for Configuration Exports | December 12 2025, 16:29

Just for laughs. I asked Gemini how to export the entire AWS configuration for local analysis, and it recommended aws-nuke, a tool for permanently deleting everything: supposedly, if you add a dry-run flag, you’ll get the configuration… And someone actually follows such advice 🙂 And then we wonder…

Stages of Understanding Scientific Papers | December 10 2025, 19:38

As I periodically read scientific papers on my topic, I will try to articulate the levels of understanding the truth.

Level 0: “Read Later Folder” Downloaded the PDF, the title sounds genius, the abstract seems like the solution to all my problems. The file is forever buried in the ~/Downloads/Papers/ToRead folder.

Level 1: “Sumerian Cuneiform” Don’t understand anything at all. Random symbols, the Greek alphabet is over. “Orthogonal extrapolation of cognitive entropy within a quasi-stationary discourse inevitably induces a bifurcation of transcendental synergism.” Such materials really lower self-esteem. Most often from this level, you either fall back to zero, or gradually move to the second level.

Level 2: “Illusion of Competence” The Abstract is clear, the Introduction reads like a good detective story. But as soon as the main section starts, the text turns into a pumpkin. I can’t paraphrase it in my own words, only in general phrases: “Well, they trained a neural net… kind of.”

Level 3: “Formulas where needed and where not” The Abstract is clear, the first half of the article is also okay (architecture, pictures). But then comes formula (4), where “magic” happens. I take the authors’ word for it that equation (3) leads to (4) because, of course, I won’t check it. Beyond that – sheer horror and belief in a miracle.

Level 4: “Goldfish Effect” While reading – everything is crystal clear. The logic is solid, conclusions are obvious, the authors are smart. I close the tab, someone asks me, “What was the article about?” – and I freeze. My mind goes blank. If you take away the paper, I can’t reproduce even the idea because there essentially isn’t an idea, there is a process.

Level 5: “Armchair Expert” Everything’s clear, I can retell the essence over a beer. I know that Input transforms into Output, but the “black box” inside is still black. Give me a computer, I wouldn’t be able to reproduce even the skeleton because, it turns out, the article lacks half of the important stuff.

Level 6: “Critic-Practitioner” Everything is clear, I can recount, understand how to reproduce (even without their code). I see where they cut corners. I definitely know that the “state-of-the-art” result is achieved only thanks to a lucky seed or dataset and this strange trick in preprocessing, mentioned in the footnote on page 12.

Level 7: “Deconstructor” Hooray, I’ve understood everything and implemented it myself. It works worse than in the article, but I know why. However, I understand this work better than the second author (who just made charts). I see that all this complex mathematics over 5 pages boils down to two paragraphs in the middle.

Level 8: “Nirvana” The article is trivial. The idea is secondary, it was all in the ’90s with Schmidhuber, just named differently. Formulas are overcomplicated for importance. I can write the same in 10 lines of code and it will work faster. Reject.

If anything — I’m stuck somewhere between 2 and 4.

Living Without Autopilot: A Surprising Reunion with My Tesla’s Upgraded Skills | December 09 2025, 19:30

Lived several months without autopilot in the car, now I turned it on, and during this time the car has learned not only to drive to a location across the city and through backroads, but also to find parking at the destination and park itself. But when I told it to come home, specifically pointing it to where it gets fed (charger), it stopped in front of the neighbor’s house. Makes you think;) but overall, very cool, Tesla

In-Flight French: Building a Language App on the Fly | December 01 2025, 15:45

By the way, yesterday morning, while waiting at the gate for my flight to Miami, I quickly wrote a French language learning app using Gemini based on an idea I sketched out to a friend while driving to the airport, and then used this app during the flight.

The idea is that in an unfamiliar foreign language text, the user first marks unknown words and then sees their translations – but without the original text, and then returns to the text itself – but no longer seeing the translations. It’s as if the “dictionary was in the next room.” The hypothesis is that this method helps better memorize than when the translation is shown immediately upon clicking on a word, and when no effort is needed.

I am pleased that creating the app from scratch to the finished version took only about 35-40 minutes, and then I used it for some time during the flight, without the internet, since the translations of all words and phrases had already been made in advance.

I just deployed it on Render. It’s also nice that demonstrating the code in action was free and took another 10 minutes.

https://readandlearn.onrender.com/

Unleashing the Power: RTX 5090 for Advanced AI and Digital Art Creations | December 01 2025, 01:39

Nvidia RTX 5090 32 GB! Happy as an elephant. Installed Arch Linux and CUDA. Planning to soon get smart about boosting transformer deep neural networks and have a bunch of ideas for digital art based on concepts other than diffusion models.

Performance: Just ran a test, model GPT_OSS_20b_UD_Q4_K_XL generates 350 tokens per second with a context of 131072 tokens. That’s roughly an A4 page in a few seconds. Gemma3 27B – 55 tokens per second. Qwen3_30B_A3B_Q6_K – 259 tokens per second.