Bridging Brain Functions and Language Models through Predictive Processing | February 09 2025, 21:39


I’ve been thinking that understanding how large language models (LLMs, like ChatGPT) work explains how our (at least my) brain probably works, and vice versa: observing how the brain works can lead to a better understanding of how to train LLMs.

You know, LLMs are based on a simple idea: choosing the most appropriate next word after the N known ones, which together form a “context”. For this, LLMs are trained on a gigantic corpus of texts that demonstrates which words typically follow which others in various contexts.
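
To make that concrete, here is a deliberately tiny sketch of the core idea (my own toy illustration, nothing like a real transformer): count which word tends to follow the previous N words in a small corpus, then pick the most frequent continuation. The corpus and the context length are made up.

```python
from collections import Counter, defaultdict

# Tiny toy corpus; a real LLM is trained on billions of words.
corpus = "the cat sat on the mat . the cat sat on the sofa .".split()

N = 2  # context length: predict the next word from the previous N words
counts = defaultdict(Counter)
for i in range(len(corpus) - N):
    context = tuple(corpus[i:i + N])
    counts[context][corpus[i + N]] += 1

def predict_next(*context_words):
    """Return the continuation most often seen after this context."""
    options = counts[tuple(context_words)]
    return options.most_common(1)[0][0] if options else None

print(predict_next("cat", "sat"))  # -> "on"
```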

So, when you study any language, like English, this stage is inevitable. You need to encounter a stream of words in any form—written or spoken—so that your brain can discover and assimilate patterns simply through observation or listening (and better yet, both—multimodality).

In LLMs, the basic units are not words but tokens: whole words and, quite often, parts of words. After processing that vast corpus of texts, it turned out to be straightforward to simply find the most common sequences, which in some cases are full words and in others fragments of words. Similarly, when you start to speak a foreign language, especially one with a rich system of endings, you begin to pronounce the start of a word while your brain is still busy “computing” the ending.
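
You can see this sub-word splitting for yourself, assuming the Hugging Face transformers package is installed (the example words are mine; the exact splits depend on the tokenizer):

```python
# Common words stay whole; rarer or inflected words get split into pieces,
# much like pronouncing the stem of a word while the ending is still being
# "computed". Exact splits depend on the tokenizer's training corpus.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

for word in ["cat", "cats", "unbelievable", "tokenization"]:
    print(word, "->", tokenizer.tokenize(word))
```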

When we read a text or listen to speech, we don’t actually analyze words letter by letter, because very often important pieces simply disappear due to fast or unclear speech, or typos. But the brain doesn’t need to sift through all the words that look or sound like the one in question; it only needs to check whether what was heard or seen matches one of a very limited set of words that could logically follow the previous ones.

Whole phrases are a separate story. In our brain, they form a single “token”: they are not broken down into separate words unless you deliberately think about it. And such tokens don’t appear in the stream by accident either—the brain expects them, and as soon as it hears or sees signs that a phrase has begun, the circle of options narrows down to literally one or two possible phrases with that beginning, and that’s it—one of them is what was said or written.

But the most interesting thing is that recent research has shown the human brain really does work very similarly to LLMs. In the study “The neural architecture of language: Integrative modeling converges on predictive processing”, MIT scientists showed that models that better predict the next word also more accurately model brain activity during language processing. So the mechanism used in modern neural networks is not just inspired by cognitive processes, but actually reflects them.

During the experiment, fMRI and electrocorticography (ECoG) data recorded during language perception were analyzed. The researchers found that the best predictive model at the time (GPT-2 XL) could explain almost 100% of the explainable variance in neural responses. This suggests that language comprehension in humans really is built on predictive processing, not on sequential analysis of words and grammatical structures. Moreover, the task of predicting the next word turned out to be key: models trained on other language tasks (for example, grammatical parsing) were worse at predicting brain activity.
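
For the curious, the general encoding-model recipe behind results like this (my rough sketch, not the authors’ actual code) is to take the model’s hidden states for the same stimuli the subjects heard, fit a regularized linear regression to the recorded neural responses, and see how much variance it explains on held-out data. The arrays below are random placeholders standing in for real features and recordings.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

n_words, hidden_dim, n_voxels = 2000, 768, 500
llm_features = np.random.randn(n_words, hidden_dim)   # model activations per word
brain_responses = np.random.randn(n_words, n_voxels)  # fMRI/ECoG responses per word

X_train, X_test, y_train, y_test = train_test_split(
    llm_features, brain_responses, test_size=0.2, random_state=0
)

# Regularized linear map from model features to neural responses,
# scored by explained variance (R^2) on held-out words.
encoder = Ridge(alpha=1.0).fit(X_train, y_train)
print("held-out R^2:", encoder.score(X_test, y_test))  # ~0 here, since the data is random
```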

If this is true, then the key to fluent reading and speaking in a foreign language is precisely training predictive processing. The more the brain encounters a stream of natural language (both written and spoken), the better it can form expectations about the next word or phrase. This also explains why native speakers don’t notice grammatical errors or can’t always explain the rules—their brain isn’t analyzing individual elements, but predicting entire speech patterns.

So, if you want to speak freely, you don’t just need to learn the rules, but literally immerse your brain in the flow of language—listen, read, speak, so that the neural network in your head gets trained to predict words and structures just as GPT does.

Meanwhile, there is also the theory of predictive coding, which asserts that, unlike language models that predict only the nearest words, the human brain forms predictions at different levels and on different time scales. This has been tested by other researchers (google “Evidence of a predictive coding hierarchy in the human brain listening to speech”).

Briefly: the brain doesn’t just predict the next word; it is as if several prediction processes of different “resolutions” run at once. The temporal cortex (lower level) predicts short-term, local elements (sounds, words). The frontal and parietal cortex (higher level) predicts long-term, global language structures. Semantic predictions (the meaning of words and phrases) cover longer time spans (≈8 words ahead), while syntactic predictions (grammatical structure) have a shorter horizon (≈5 words ahead).

If you try to transfer this concept to the architecture of language models (LLMs), you could improve their performance with a hierarchical predictive system. Currently, models like GPT operate with a fixed context window: they analyze a limited number of previous words and predict the next one, without going beyond those boundaries. In the brain, however, predictions work at different levels: locally, at the level of words and sentences, and globally, at the level of entire semantic blocks.

One of the possible ways to improve LLMs is to add a mechanism that simultaneously works with different time horizons.
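
Here is one naive way to picture such a mechanism (my own toy formulation, not an established method): keep the usual next-token loss and add auxiliary heads that try to predict a summary of the next ~5 and ~8 tokens, echoing the syntactic and semantic horizons above.

```python
import torch
import torch.nn.functional as F

def multi_horizon_loss(hidden, logits, targets, token_embeddings, horizon_heads):
    """
    hidden:           (batch, seq, dim)   transformer outputs
    logits:           (batch, seq, vocab) next-token predictions
    targets:          (batch, seq)        next-token ids
    token_embeddings: (batch, seq, dim)   embeddings of the target tokens
    horizon_heads:    dict {horizon: linear layer mapping dim -> dim}
    """
    # Standard next-token objective.
    loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    # Auxiliary objectives: predict the mean embedding of the next `horizon` tokens.
    for horizon, head in horizon_heads.items():
        future = torch.stack(
            [token_embeddings[:, t + 1 : t + 1 + horizon].mean(dim=1)
             for t in range(hidden.size(1) - horizon - 1)],
            dim=1,
        )
        pred = head(hidden[:, : future.size(1)])
        loss = loss + F.mse_loss(pred, future)
    return loss

# Dummy usage with random tensors, horizons mirroring the ~5-word and ~8-word spans.
B, T, D, V = 2, 64, 768, 50257
loss = multi_horizon_loss(
    hidden=torch.randn(B, T, D),
    logits=torch.randn(B, T, V),
    targets=torch.randint(V, (B, T)),
    token_embeddings=torch.randn(B, T, D),
    horizon_heads={5: torch.nn.Linear(D, D), 8: torch.nn.Linear(D, D)},
)
print(loss.item())
```

Whether targets like these would actually help a model, rather than just adding noise, is exactly the kind of thing I would want to ask the researchers about.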

I wonder: could you set up an LLM so that some layers specialize in short-range dependencies (e.g., adjacent words) and others in longer structures (e.g., the semantic content of a paragraph)? I googled it, and there is something similar under the heading of “hierarchical transformers”, where layers interact with each other at different levels of abstraction, but that work is mostly aimed at processing very long documents.
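
Just to illustrate what I mean (a bare-bones PyTorch sketch, not any specific published hierarchical transformer): the early layers attend only within a narrow local window, and the later layers see the full causal context.

```python
import torch
import torch.nn as nn

def causal_window_mask(seq_len, window):
    """True = blocked: causal mask that also limits attention to `window` past tokens."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j > i) | (i - j >= window)

seq_len, dim = 128, 64
x = torch.randn(1, seq_len, dim)

layers = nn.ModuleList(
    nn.MultiheadAttention(dim, num_heads=4, batch_first=True) for _ in range(4)
)
windows = [8, 8, seq_len, seq_len]  # short-range layers first, long-range layers after

for layer, window in zip(layers, windows):
    mask = causal_window_mask(seq_len, window)
    x, _ = layer(x, x, x, attn_mask=mask, need_weights=False)

print(x.shape)  # torch.Size([1, 128, 64])
```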

As I understand it, the problem is that for this you need to train foundation models from scratch, and it probably doesn’t work well on unlabelled or poorly labelled content.

Another option is multitask learning, so that the model not only predicts the next word but also tries to guess what the nearest sentence or even the whole paragraph will be about. Again, a Google search shows that this can be implemented, for example, by dividing the attention heads in the transformer, so that some parts of the model handle short-range dependencies while others capture longer-range semantic connections. But as soon as I dive into this topic, my brain explodes. It’s all really complex.
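
The head-splitting part, at least, is easy to sketch (again a toy illustration with a batch of one, not a recipe): within a single attention layer, half of the heads get a narrow local window and the other half see the full causal context.

```python
import torch
import torch.nn as nn

seq_len, dim, num_heads = 128, 64, 8
x = torch.randn(1, seq_len, dim)

i = torch.arange(seq_len).unsqueeze(1)
j = torch.arange(seq_len).unsqueeze(0)
causal = j > i                   # True = blocked (no peeking at future tokens)
local = causal | (i - j >= 4)    # causal, plus only the 4 most recent tokens

# Per-head masks, shape (batch * num_heads, L, S): half local heads, half global heads.
head_masks = torch.stack([local] * (num_heads // 2) + [causal] * (num_heads // 2))

attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
out, _ = attn(x, x, x, attn_mask=head_masks, need_weights=False)
print(out.shape)  # torch.Size([1, 128, 64])
```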

But perhaps, if such a multilevel prediction system could be integrated into LLMs, they would understand context better and generate more meaningful and coherent texts, getting closer to how the human brain works.

I’ll be at a conference on the subject in March; will need to talk with the scientists then.

Navigating Life with ChatGPT: My AI Assistant Addiction | February 05 2025, 21:04

So, I’ve developed a bit of a ChatGPT addiction. It has overtaken Google and Facebook and is slowly creeping into all areas of life.

(To be precise, I don’t use only ChatGPT, because for certain needs we have to use an analog developed by our engineers and running on our internal corporate network, so everything below is not only about ChatGPT but about AI assistants in general. For personal needs, though, it’s ChatGPT only.)

(1) Over the last six months, I’ve probably created a couple hundred Python scripts for data processing. I didn’t write any of the scripts myself (although I could; ask me again in a year or two, I might no longer be able to). To write a script for processing data, I just clearly state what I need, then closely examine the result, and if I like it, I run it. If it doesn’t work, and something needs tweaking, I tweak it myself. If it’s completely off, I ask for it to be redone. Most often, I end up with what I need. Example: read a CSV, create embeddings for all lines, cluster them, then write the results in separate files with the cluster number in the name. Or implement some complex data grouping.
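
For that CSV example, the result is roughly the sketch below (not the exact script; it assumes pandas, sentence-transformers, and scikit-learn are installed, and the “text” column, model name, and file names are my placeholders):

```python
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Read the input, embed every line, cluster the embeddings,
# and write each cluster to its own CSV file.
df = pd.read_csv("input.csv")                      # expects a "text" column
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(df["text"].tolist())

n_clusters = 10
df["cluster"] = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(embeddings)

for cluster_id, group in df.groupby("cluster"):
    group.to_csv(f"cluster_{cluster_id}.csv", index=False)
```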

I must mention bash commands separately. For example, I can never recall how to sort the lines of a file by length from the command line and get the longest ones. Or I’m too lazy to remember the detailed syntax of awk or jq to process something from files through a pipe; it’s easier to ask ChatGPT.

(2) Lately, I frequently translate between Russian and English using LLMs. Rather than writing something in English myself, it’s easier to write it in Russian, get the translation, and then drop it into an email. It’s simply faster. It’s not even about proficiency in English – of course, I could write it all myself. It’s about how much time is spent on phrasing. The argument “it’s twice as fast and clearer” beats everything else. The downside is that my English isn’t improving because of this.

(3) Generally, I run nearly 100% of the English texts I write through various LLMs, depending on the type of text. I ask them to correct the grammar, then copy-paste the result wherever I need it—into an email or a Jira ticket. It seems I’ll soon develop an anxiety about having sent something unreviewed, because they always find something to correct, even if it’s just a minor thing like a missing article or a comma.

(4) When I’m too lazy to read large chunks of English text, I frequently throw them into ChatGPT and ask for a summary—sometimes in Russian. Can’t do this for work because the texts are often from clients, but if it’s really necessary, I also have access to a local LLM.

(5) I’m increasingly validating various design decisions (not visual design, but software design) through ChatGPT/LLM. I ask for criticism or additions. Often, the results make me think about what needs to be improved or what assumptions need to be added.

(6) I also use it for summarizing YouTube videos. Just download the subtitles in TXT format with a YouTube subtitle downloader, throw them into an LLM, and then you can request a summary or ask questions based on it. It really helps decide whether to watch the video or not.

What are your usage patterns?

Describing 20 Countries in Two Words Each | January 29 2025, 21:22


I asked ChatGPT to pick 20 countries and describe them in two words.

Contemplatively-tranquil country

Indomitably-defensive country

Profoundly-decaying country

Technologically-scientific country

Academically-philosophical country

Passionately-creative country

Eco-progressive country

Legendarily-touristic country

Aristocratically-financial country

Scorchingly-royal country

Soccer-sportive country

Iron-isolated country

Sacredly-religious country

Brightly-explosive country

Disciplined-collectivist country

Pastorally-peaceful country

Fiery-geyser country

Fjord-fairytale country

Alcoholically-reckless country

Rainily-emerald country

Write how many countries you did NOT guess right 🙂

Modern Take on Theodora: Opera, Martyrs, and Pole Dancing | January 28 2025, 01:55

I finished “Theodora”. It’s a three-hour opera in a production by the Royal Opera House. About Christian saints and martyrs Theodora and Didymus, who lived in the 4th century in what’s now modern Syria. On stage – prostitutes, pole dances, a bomb, essentially, the full package.

And yes, originally it’s not an opera, but an oratorio, meaning originally on stage there is a chorus that sings for three hours, and nothing else happens. In the production, however, the oratorio is decked out like an opera, plus a bit more.

The plot, briefly. Valens, the Roman envoy, forces everyone to worship the Roman gods and threatens to execute those who refuse. Theodora, a Christian, does not comply. Her lover, Didymus, who has secretly converted to Christianity, tries to save her by disguising himself in her dress. In the end, Theodora surrenders to the enemies to save Didymus, and both die as martyrs for their faith. Afterwards, they were canonized by Christians in gratitude.

The oratorio is in English. That’s unusual in itself. Well... in English. “Vouchsafe, dread Sir, a gracious ear. Lowly the matron bow’d, and bore away the prize…”. English from three hundred years ago. I understood “Carmen” in French with subtitles better. But no matter: there are translations you can hold in your hand and glance at with one eye, plus everything happens veeery slowly there.

So, what do we have here? A classic plot on a religious theme. In Katie Mitchell’s production, they decided to break all the norms at once, turning the oratorio into an opera and also setting it in modern times. It turned out pretty cool, actually.

Katie Mitchell sets the action in what an Alicante publication called a “Putin-like” embassy in Antioch, where some rooms function as a brothel. This is the first theatre piece to involve an intimacy coordinator for the sex and violence scenes (Ita O’Brien).

Valens, the Roman envoy in Antioch, wears a red sweater. He hasn’t heard of the #MeToo movement, so the brothel keeps “comfort women” for him and his bodyguards. In red lingerie, they dance on poles in the red room (a kind of striptease; Holly Weston and Kelly Vee).

Next, we are introduced to Septimius, Valens’ head of security. His task is to ensure that all citizens publicly worship Roman gods as a sign of loyalty. Otherwise – death.

Here comes Didymus, one of the bodyguards. Didymus used to believe in Roman gods but secretly converted to Christianity. He’s in love with the Christian Theodora, the head of the household staff at the embassy.

Theodora plans an assassination attempt on Valens with a homemade explosive. They actually assemble it on stage with duct tape and some stuff.

Septimius uncovers the conspiracy and defuses the bomb. Theodora’s punishment – she becomes a “comfort woman”. For this, they dress her up as Marilyn Monroe. Oh, actually, it seems more like Louise Brooks, but never mind, they look alike.

Then the drama continues with an escape: Didymus saves Theodora, then the other way around. Ultimately, as in all operas, things don’t end well, although in Mitchell’s production, specifically, good prevails over evil.

The role of Didymus is played by Jakub Józef Orliński. He has a beautiful scene where he changes into heels and a shimmering dress, in which he continues to perform until the end of the opera.

Jakub has a rather unusual voice. He is a countertenor. It’s the highest male voice. After castrati fell out of favor – quite rare. Google it, his voice is very beautiful. I’ll leave a few links in the comments.

One of the scenes towards the end reminds me of the café scene from “Pulp Fiction”.

The first performance of “Theodora” was in London, at the Theatre Royal in Covent Garden in 1750, and this production, 272 years later, comes from there too. Quite symbolic. True, back then it flopped – almost no audience. But now it’s a classic.

The Unpredictable Rise of a Small Jewish Sect Over the Roman Empire | January 19 2025, 14:54

“How many Romans or Jews in the time of Tiberius could have predicted that a small Jewish sect would ultimately conquer the Roman empire, and emperors would forsake the old Roman gods to worship a crucified Jewish rabbi?”

Indeed, a good question. As far as I am aware, at present, there is no religion that continues the traditions of ancient Roman or ancient Greek polytheism in their original form. Curiously, why is that?

I think that religions without centralized power simply stand no chance. On the other hand, what about Hinduism and Taoism? I’m not well-versed in this subject, but it’s interesting. I had never contemplated how it turned out that a dominant religion across a vast territory was completely obliterated.

Exploring the Boundaries of AI in Dreyfus’s Pioneering Work | January 10 2025, 01:40

Currently skimming through a book: Dreyfus (1972), “What Computers Can’t Do: The Limits of Artificial Intelligence”. In it, across 300 pages, the author argues convincingly, with numerous references to scientific papers, that programming a chess game, for example, is impossible, and that intuitive, situational human tasks, such as understanding natural language, are even more fundamentally unprogrammable.

The conclusion of the book is that instead of striving for complete autonomy, AI researchers should focus on enhancing human intelligence and exploring the fundamental differences between human and machine minds. They should probably read this book first.

And 53 years later, I am using AI to translate and extract key ideas from this book.

Hubert Dreyfus passed away 7 years ago. Overall, he probably began to suspect long ago that things were not quite as he had written in the book, because in 1992 he published a follow-up, “What Computers Still Can’t Do”.

But the funniest thing is that the 1972 book was published in Russian in 2010 and can still be purchased; it is widely sold on Ozon for 976 rubles. Labeled as NEW!

Navigating Nabokov’s Narratives: A Journey Through “Lolita” and Beyond | January 09 2025, 00:51

I finished reading Nabokov’s “Lolita.” Started it in the original English, sporadically switched to the Russian translation, and fully switched to it in the second part.

In brief: it’s Lynch’s “Mulholland Drive” in prose due to the convoluted plot and “Leon” for its straightforwardness.

Indeed, the novel’s title features what is essentially a secondary character. The novel is not really about Dolores Haze. Essentially, it’s Humbert’s confession, as Humbert himself titled this book within a book.

I must admit, it feels like I missed half of the subtext, surely so obvious to more sophisticated readers.

Did Quilty exist? Was there an Annabel Lee? And overall, can Humbert be trusted? Is there anyone good in the novel at all?

“Lolita,” like “The Defense,” which I read before it, is largely about form, not plot. It’s about “how,” not “what.” Why does Nabokov remind me of Lynch here? Because both seem to overestimate their audience – reader and viewer, respectively. They trust that the intricacies and minutiae will not only be noticed, but that the audience cannot fail to see how beautifully they come together into a pattern and, like a prism, transform a generally simple plot.

I was “re-reading” “The Defense” while listening to the audiobook on a drive from New Orleans. 12 hours. For instance, I noticed a reference to the very ending of the book (which you simply don’t know at first reading) at the very beginning, and then essentially a foreshadowing of what the plot would end up like — a book in a book, which is part of the plot (trying to avoid spoilers here). As the author himself wrote: “A book should not be read — it can only be re-read. A good reader, a choice reader, an active and creative reader, is a re-reader.”

Well, now “Pnin” is next in line. Wish me luck — its complexity promises an even bigger challenge. And after that, I might dare to take on “The Gift” — I foresee drowning there altogether.

Victor Nenko: A Russian Artist’s Journey in New Orleans | January 04 2025, 17:36

Strolling along Royal Street in New Orleans, we passed a variety of galleries. At one point, Nadia pointed to a painting and said, “Can you paint me something like that?” I liked it too, so we stepped inside, and—what a surprise!—the artist spoke Russian. Meet Victor Nenko.

Victor’s works are strikingly expressive, quick, and vivid, mostly done in acrylic. “I gave up on oil paints—they’re harmful, breathing in all those chemicals! Acrylic is a different story,” he said. Originally from Siberia, Victor moved to the U.S. nearly 30 years ago. He started out painting portraits of passersby on the street, and now he owns a gallery in the French Quarter of New Orleans.

“I have a degree in architecture, but for years, people kept telling me, ‘Why stick with architecture when you’re clearly drawn to painting? Just paint!’ But back in those days in Russia, it was almost impossible to make a living from art. So I moved to the U.S.”

We felt his style perfectly suited New Orleans, especially the French Quarter. While we were in his studio, several people bought prints. “Prints—that’s what pays the bills. Paintings sell less often,” he remarked. On Royal Street, it’s hard to find two galleries alike, just as it’s rare to see two identical houses in the French Quarter.

There’s little information about Victor Nenko (Puzanenko) online beyond his artwork and official social media. But perhaps that’s how it should be—an artist’s work speaks for itself.

We left with warm and pleasant impressions.

Posts like this are grouped under the hashtag #artrauflikes, and all 137 of them can be found on the “Art Rauf Likes” section of beinginamerica.com—unlike Facebook, which tends to overlook (or neglect) nearly half of them.

Unexpected Discoveries from a Pair of Tights | January 02 2025, 22:17

The last thing you expect to find in a package of women’s tights is a salad recipe. With mozzarella and avocado, no less.

So, I delved into a new topic for me and uncovered quite a lot.

For instance, it turns out that Lycra is not the name of the fabric, as I had always thought, but a brand name for the fabric known as spandex, which I would have guessed was the brand name. And by the way, spandex is in fact an anagram of the word expands. In Europe, spandex is also known as elastane. There is a brand of elastane called Elaspan belonging to The Lycra Company. All in all, it’s complicated.

By the way, this spandex was invented by Joseph Shivers, just two hours away from me, in Waynesboro, VA.

In the USA, tights are extremely unpopular. Moreover, if they are worn, they tend to be black and nearly opaque. It’s almost impossible to find nude and sheer ones (at least around here). You can buy anything on Amazon, but you’ll never find them in stores. Apparently no one is interested in salad recipes. However, leggings have conquered the market. Especially lululemon. Meanwhile, women’s high-heeled shoes and tight mini-dresses are also extremely unpopular and are only worn about three times a year. For example, by schoolgirls for Prom and Homecoming—but even then, without tights. Both the shoes and dresses are often of very poor quality, but they suffice for two or three times a year, after which new ones are simply bought.

That same day in New Orleans, at the antiques gallery M.S. Rau, I saw a thing whose name (darner) and appearance seemed very puzzling to me. I went to google it, and searches for “darner” mostly turn up dragonflies. Turns out, a stocking darner is a tool for darning stockings. This particular glass darner, which looks like a ball on a handle, was being sold at M.S. Rau for $4,400. Google mostly shows metal darners that look like a hoop with brackets. In Soviet times, they were called “mushrooms”.

There’s also a linguistic aspect. In English, tights are called both tights and pantyhose. Generally, pantyhose are considered the thin ones (8-40 denier), and tights the thick ones (40+ denier). In British English, the word “tights” covers the entire spectrum, unlike in American English.

Interestingly, back in 1972, Australian lifeguards came up with the idea of wearing tights to protect against potentially fatal stings from box jellyfish (sea wasps). Funny, but it’s supposed to help.

All in all, at 47, I’m discovering new horizons, and I hope you found this interesting too.