Bridging Brain Functions and Language Models through Predictive Processing | February 09 2025, 21:39

I’ve been thinking that understanding how large language models (LLMs, like ChatGPT) function explains how our (at least my) brain probably works, and vice versa: observing how the brain functions can lead to a better understanding of how to train LLMs.

You know, LLMs are based on a simple logic: choosing the appropriate next word given the N preceding ones, which form the “context”. For this, LLMs are trained on a gigantic corpus of texts that shows which words typically follow others in various contexts.
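A minimal sketch of that next-word logic, assuming the simplest possible case of N = 1 (a plain bigram counter, not a neural network). The tiny corpus and the function names are made up for illustration:

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which in a whitespace-split corpus."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, more than any other word
```

A real LLM replaces the counting table with a trained network and looks at far more than one preceding word, but the question it answers is the same.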

So, when you study any language, like English, this stage is inevitable. You need to encounter a stream of words in any form—written or spoken—so that your brain can discover and assimilate patterns simply through observation or listening (and better yet, both—multimodality).

In LLMs, the basic units are not words but tokens: whole words and, often, parts of words. Once this vast corpus of texts had been processed, it turned out to be enough to simply find the most common character sequences, which of course were sometimes whole words and sometimes parts of words. It’s similar when you start to speak a foreign language, especially one with a system of endings: you begin to pronounce the beginning of a word, and at that moment your brain is busy “calculating” the ending.
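One common way to find such frequent sequences is byte-pair-encoding-style merging. The post doesn't name a method, so treat this as an assumption; the three words and their counts are invented. The sketch repeatedly glues together the most frequent adjacent pair of symbols:

```python
from collections import Counter

def most_common_pair(words):
    """Find the adjacent symbol pair with the highest total frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

# Hypothetical word counts; each word starts as a tuple of single characters.
vocab = {tuple("walked"): 5, tuple("talked"): 4, tuple("walking"): 3}
for _ in range(3):
    vocab = merge_pair(vocab, most_common_pair(vocab))
print(vocab)
```

After a few merges the shared stem of these words becomes a single symbol while the endings stay separate, which is roughly the "beginning of the word first, ending computed later" picture above.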

When we read or listen, we don’t actually analyze words letter by letter, because important pieces very often just disappear due to fast or unclear speech or typos. But the brain doesn’t need to sift through all the words that look or sound like the given one; it only needs to check whether what is heard or seen matches a very limited set of words that could logically follow the previous one.

It’s a separate story with whole phrases. In our brain, they form a single “token”. That is, they are not broken down into separate words, unless you specifically think about it. And such tokens also appear in the stream not accidentally—the brain expects them, and as soon as it hears or sees signs that the phrase has appeared, the circle of options narrows down to literally 1-2 possible phrases with such a beginning, and that’s it—one of them is what was said or written.

But the most interesting thing is that recent research has shown that the human brain really works very similarly to LLMs. In the study “The neural architecture of language: Integrative modeling converges on predictive processing”, MIT scientists showed that models that better predict the next word also more accurately model brain activity during language processing. Thus, the mechanism used in modern neural networks is not just inspired by cognitive processes, but actually reflects them.

During the experiment, fMRI and electrocorticography (ECoG) data were analyzed during language perception. The researchers found that the best predictive model at the time (GPT-2 XL) could explain almost 100% of the explainable variation in neural responses. This means that the process of understanding language in humans is really built on predictive processing, not on sequential analysis of words and grammatical structures. Moreover, the task of predicting the next word turned out to be key—models trained on other language tasks (for example, grammatical parsing) were worse at predicting brain activity.

If this is true, then the key to fluent reading and speaking in a foreign language is precisely training predictive processing. The more the brain encounters a stream of natural language (both written and spoken), the better it can form expectations about the next word or phrase. This also explains why native speakers don’t notice grammatical errors or can’t always explain the rules—their brain isn’t analyzing individual elements, but predicting entire speech patterns.

So, if you want to speak freely, you don’t just need to learn the rules, but literally immerse your brain in the flow of language—listen, read, speak, so that the neural network in your head gets trained to predict words and structures just as GPT does.

Meanwhile, there’s the theory of predictive coding, which asserts that, unlike language models that predict only the nearest words, the human brain forms predictions at different levels and time scales. This was tested by other researchers (google “Evidence of a predictive coding hierarchy in the human brain listening to speech”).

Briefly, the brain doesn’t just predict the next word; it is as if several processes of different “resolutions” run at once. The temporal cortex (lower level) predicts short-term and local elements (sounds, words). The frontal and parietal cortex (higher level) predicts long-term and global language structures. Semantic predictions (meaning of words and phrases) cover longer time intervals (≈8 words ahead). Syntactic predictions (grammatical structure) have a shorter time horizon (≈5 words ahead).

If you try to transfer this concept to the architecture of language models, you might improve their performance through a hierarchical predictive system. Currently, models like GPT operate with a fixed context window: they analyze a limited number of previous words and predict the next one, never going beyond these boundaries. In the brain, however, predictions work at different levels: locally, at the level of words and sentences, and globally, at the level of entire semantic blocks.

One of the possible ways to improve LLMs is to add a mechanism that simultaneously works with different time horizons.

I wonder whether you can set up an LLM so that some layers specialize in short language dependencies (e.g., adjacent words) and others in longer structures (e.g., the semantic content of a paragraph). I googled it, and there’s something similar under the topic of “hierarchical transformers”, where layers interact with each other at different levels of abstraction, but that’s still more for processing super-long documents.

As I understand it, the problem is that for this you need to train foundation models from scratch, and this probably does not work well on unlabelled or poorly labelled content.

Another option is to use multitask learning, so that the model not only predicts the next word, but also tries to guess what the next sentence or even the whole paragraph will be about. Again, a Google search shows that this can be implemented, for example, by dividing up the attention heads in the transformer, so that some parts of the model analyze short language dependencies and others predict longer-term semantic connections. But as soon as I dive into this topic, my brain explodes. It’s all really complex.
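To make the multitask idea a bit more concrete, here is a toy sketch of a combined training objective: the usual next-word loss plus a second term that scores how well the model anticipates the words in an upcoming window, ignoring their order. Everything here is hypothetical (the function names, the `weight`, the 8-word `horizon` borrowed from the semantic figure mentioned earlier, and the uniform "model output"); it is not how any particular LLM is trained.

```python
import math

def next_word_loss(probs, target):
    """Cross-entropy for the single next word (short horizon)."""
    return -math.log(probs[target])

def topic_loss(probs, upcoming):
    """Average negative log-probability of the words in the upcoming
    window, order ignored (long horizon)."""
    return sum(-math.log(probs[w]) for w in upcoming) / len(upcoming)

def multi_horizon_loss(next_probs, far_probs, tokens, pos, horizon=8, weight=0.5):
    """Combine both objectives for the prediction made at position `pos`."""
    upcoming = tokens[pos + 1 : pos + 1 + horizon]
    short = next_word_loss(next_probs, tokens[pos + 1])
    return short + weight * topic_loss(far_probs, upcoming)

tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = sorted(set(tokens))
# Stand-in for a model's output: the same probability for every known word.
uniform = {w: 1 / len(vocab) for w in vocab}
loss = multi_horizon_loss(uniform, uniform, tokens, pos=0)
print(loss)
```

In a real model the two probability tables would come from two output heads sharing the same network, so that learning to guess "what comes later" shapes the same internal representation used for the next word.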

But perhaps, if it’s possible to integrate such a multilevel prediction system into LLMs, they could better understand the context and generate more meaningful and consistent texts, getting closer to how the human brain works.

I’ll be at a conference on the subject in March; will need to talk with the scientists then.

Nuclear Legacy: Carbon-14 and the Science of Dating Life | February 09 2025, 14:35

It turns out that nuclear tests between 1955 and 1963 left their mark in every living organism on Earth. Scientists can use this fact to determine the age of cells in any creature that was alive at the time, and how often those cells are renewed, which would have been significantly more challenging without the nuclear tests. There is even a specific term for it: “C-14 bomb-pulse dating”.

This is how radiocarbon analysis works. From 1955 to 1963, atomic bomb tests doubled the amount of carbon-14 in the atmosphere. Atmospheric carbon-14, which is normally produced only by cosmic radiation, reacts with oxygen, forming carbon dioxide (¹⁴CO₂). Plants absorb this ¹⁴CO₂ during photosynthesis. Animals eat these plants, and we eat both the plants and the animals, so carbon-14 becomes incorporated into our tissues in roughly the same concentration as in the atmosphere.

Most tissues in living organisms gradually renew over weeks or months, so the carbon-14 content in them corresponds to the current atmospheric level. However, tissues that either do not renew or renew very slowly will contain a carbon-14 level close to that of the atmosphere at the time they were formed. Thus, by measuring the carbon-14 content in the tissues of people who lived during and after the peak of the “bomb pulse”, the rate of replacement of certain tissues or their components can be precisely estimated.

This means that nuclear tests, inadvertently, have provided scientists with a way to understand when tissues are formed, how long they last, and how rapidly they are replaced.

It turns out that practically every tree that has lived since 1954 contains a “spike” – a kind of souvenir from the atomic bombs. Wherever botanists look, they find this marker. There are studies in Thailand, studies in Mexico, studies in Brazil—wherever you measure the carbon-14 level, it’s there. All trees carry this “marker”—trees of northern latitudes, tropical trees, rainforest trees—it’s a worldwide phenomenon.

But there’s a catch. Every eleven years, the excess carbon-14 left in the atmosphere by the tests halves. Once the carbon-14 level returns to its original value, this method will become useless. Scientific American explains that “scientists have the opportunity to use this unique dating method only for a few decades until the carbon-14 level returns to normal.” This means that if they want to use this method, they need to hurry. Unless there are new nuclear explosions, but no one wants that.
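The halving rule can be turned into a rough back-of-the-envelope estimate. This sketch assumes an idealized exponential decline starting from a doubled level (100% excess) in 1963 and an exact 11-year halving time; the real atmospheric curve is messier, and the function name and defaults are mine:

```python
def excess_c14(year, peak_year=1963, peak_excess=1.0, halving_years=11.0):
    """Fraction of carbon-14 above the pre-bomb baseline in a given year,
    assuming the excess simply halves every `halving_years` years."""
    return peak_excess * 0.5 ** ((year - peak_year) / halving_years)

for year in (1963, 1974, 1985, 2000, 2025):
    print(year, f"{excess_c14(year):.1%}")
```

Under these assumptions the excess is down to roughly 2% of the baseline by 2025, which is why the usable window for the method is closing.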

Besides, this method enables the determination of a person’s age through their teeth and hair. Once a tooth is formed, the amount of carbon-14 in its enamel remains unchanged, making it an ideal tool for dating. Because certain teeth form at specific ages, measuring the 14C content in different teeth can help researchers estimate a range of birth years. The same holds true for hair, which grows about 1 cm per month, and conclusions can also be drawn from the carbon content in different parts of the hair.

About one-third of an entire tooth, or 100 milligrams, is needed for dating the carbon in teeth. To prepare the sample, it is ground and dissolved in acid, which releases CO2. With hair, instead of dissolving it in acid, it is burned. As hair has a high carbon content, only 3-4 milligrams of hair is needed. CO2 from the tooth or hair sample is then reduced to graphite, a crystalline form of carbon, and placed in an ion source at CAMS (the Center for Accelerator Mass Spectrometry), where neutral graphite atoms are ionized by giving them a negative charge. The accelerator can then use this negative charge to speed up the sample, enabling detection, counting, and comparison of carbon isotope ratios. On graphs, pMC (percent modern carbon) expresses the sample’s carbon-14 level relative to a modern reference standard.

In the 1960s, when the concentration of C-14 was sharply changing, the method allowed the determination of tissue age to an accuracy of ±1 year. However, after 2000, as the C-14 levels evened out, the accuracy dropped to ±2–4 years.

Luck Over Talent: Decoding the True Drivers of Success | February 08 2025, 00:51

A lengthy post on how to achieve success! For free! No registration or SMS required! I just stumbled upon a scientific study proving that the role of chance in success is greater than that of talent. And this resonated with my belief that successful people are successful because they are lucky, not because they are extraordinarily talented, smart, or unusual. If anything, it’s the other way around: they come across that way because they’ve been lucky. Note, not because they are “lucky ducks,” but because they’ve been lucky. These are different things.

Let me argue this. There’s a study, “Talent vs Luck: the role of randomness in success and failure,” by Alessandro Pluchino, Alessio Emanuele Biondo, and Andrea Rapisarda. Yes, the funny part is that the authors received the Ig Nobel Prize for this work (“a symbolic award for scientific discoveries that ‘first make people laugh, and then make them think’”). They used agent-based modeling to analyze the contributions of talent and luck to success.

As a starting point, they took two supposedly objective facts. Talent and intelligence are distributed among the population according to the normal (Gaussian) distribution: most people have an average level of these qualities, and extreme values are rare. Wealth, often considered an indicator of success, follows the Pareto distribution (a power law): a small number of people own a significant portion of the resources, and the majority owns only a small share.

Further, the authors developed a simple model in which 1,000 agents with varying levels of talent are exposed to random events over a hypothetical 40 years. The events can be either favorable (luck) or unfavorable (misfortune). Each such event affects the “capital” of an agent, which serves as a measure of their success.

Result: Though a certain level of talent is necessary to achieve success, it is often not the most talented individuals who become the most successful, but those with an average level of talent who experience more fortunate events. There is a strong correlation between the number of fortunate events and the level of success: the most successful agents are also the luckiest.
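The kind of model described above can be sketched in a few lines. This is a simplified stand-in, not the authors' code: only the 1,000 agents and the 40 years come from the description; the talent distribution parameters, the starting capital of 10, the half-year steps, the event probabilities, and the doubling/halving rule are all assumptions chosen for illustration.

```python
import random

def simulate(n_agents=1000, years=40, p_lucky=0.05, p_unlucky=0.05, seed=1):
    """Expose normally-talented agents to random lucky and unlucky events."""
    rng = random.Random(seed)
    agents = [
        {"talent": min(1.0, max(0.0, rng.gauss(0.6, 0.1))),  # clipped to [0, 1]
         "capital": 10.0, "lucky": 0, "unlucky": 0}
        for _ in range(n_agents)
    ]
    for _ in range(years * 2):  # two steps per simulated year
        for agent in agents:
            roll = rng.random()
            if roll < p_lucky:
                # A lucky event pays off only with probability equal to talent.
                if rng.random() < agent["talent"]:
                    agent["capital"] *= 2
                    agent["lucky"] += 1
            elif roll < p_lucky + p_unlucky:
                # An unlucky event always halves the capital.
                agent["capital"] /= 2
                agent["unlucky"] += 1
    return agents

agents = simulate()
richest = max(agents, key=lambda a: a["capital"])
most_talented = max(agents, key=lambda a: a["talent"])
print("richest:", richest)
print("most talented:", most_talented)
```

Comparing the two printed agents for a few different seeds is a quick way to see whether the richest agent is also the most talented one, or simply the one with the most lucky events and the fewest unlucky ones.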

My observation of how the world works completely agrees with these conclusions. You just need to do things so that you’re more fortunate. That’s it. Don’t try to be the smartest—it doesn’t help as much as the following things do:

1) Being in environments where important events occur. Silicon Valley for startup founders. New York for financiers. Hollywood for actors. If an environment increases the chance of meeting “key” people, it makes sense to place oneself in that environment.

2) Creating more points of contact with the world and maintaining them. Running a blog, writing articles, giving interviews. Attending conferences, participating in communities. Calling and writing to acquaintances and semi-acquaintances, especially when such calls and letters are potentially important to them. Expanding the number of contacts—even if 99% are useless, 1% can change your life.

3) Increasing the number of attempts. The more projects, the higher the chance that one of them will “hit.” The best example – venture funds: they invest in dozens of startups, knowing that success will come from only one. Artists, writers, musicians create hundreds of works, knowing that only one will become a hit.

Unfortunately, for this point, you need to love your work. So choose a task where attempts are enjoyable.

Organizational psychologist Tomas Chamorro-Premuzic in his book “Why Do So Many Incompetent Men Become Leaders?” asserts that luck accounts for about 55% of success, including such factors as the place of birth and family wealth. This is true, but since you are sitting on Facebook on an iPhone with a cup of coffee and not herding cows in a loincloth in Africa, you already have pretty good initial conditions.

This leads to an interesting question: is it necessary to study at a university to achieve success in life? Look at the points above: being in the right environment, creating more points of contact, increasing the number of attempts. Two of these three work better with face-to-face learning, while the third works poorly, because university consumes 4-5 years of life and counts as a single attempt. But the other two criteria are very important: during their studies, the average student interacts with hundreds of peers, who can contribute significantly to that student’s chances of success.

But sitting at home with books for five years does not meet any of the criteria. Online education lies somewhere in between; it varies, but it’s closer to the option of “sitting with textbooks.”

The authors of the study confirmed the concept of “The Matthew Effect.” This is from the Bible: “For unto every one that hath shall be given, and he shall have abundance: but from him that hath not shall be taken away even that which he hath.” (Matthew 25:29). They explain why success accumulates even if it is initially random:

People who are fortunate in the early stages receive more resources, opportunities, and attention. This, in turn, increases their chances for new fortunate events. As a result, those who were initially in a better position continue to build on their success, while the rest lag behind.

This explains why wealthy people often receive profitable investments, popular artists become even more popular, and less known ones remain in the shadows, and companies that “hit the stream” attract more customers and resources than their less fortunate competitors.

That’s why success also requires following the principle of “Fake it till you make it.” Successful people often exaggerate their skills or achievements, and then catch up to the proclaimed level. Society easily forgives and quickly forgets such things, but when they work (and they often do), the person no longer really needs them. There’s also a self-fulfilling prophecy—the idea that if a person states something as a fact (even if it’s an exaggeration), they and those around them start behaving as if it’s true, and eventually, it becomes reality.

There’s also the principle of “It doesn’t hurt to ask.” If asking someone a question (“can you raise my salary starting in March, or put me in charge of that project?”) increases the likelihood of success, then it’s worth asking. You never know unless you ask.

And one more thing. Act now, apologize later. Actions speak louder than words. As you know, being at the right time in the right place not only involves the right place (this is the first point from my list), but also the right time. Therefore, just do it. People who don’t dream but act never end up homeless on the street because they rushed.

And finally. Time is a finite resource. There was a good idea about the sheet with squares—google “90 years of life in weeks.” You can color the lived weeks and look at the remaining ones.

So, in summary.

Success is determined by luck, not talent. Talent helps, but is often formed under the influence of success. Knowledge is useful, but experience is more valuable. Time is a finite resource. Planning doesn’t work, three things do:

1) being in an environment where important events occur,

2) creating more points of contact with the world and maintaining them,

3) increasing the number of attempts where luck might work.

Three principles:

1) Fake it till you make it

2) It doesn’t hurt to ask

3) Actions speak louder than words

The Paradox of Software Complexity and AI’s Role in Legacy Systems | February 07 2025, 14:30

It is fascinating to observe how, as it grows more complex over time, software turns into “a thing in itself”: even the developers do not fully understand how it works, or more precisely, why it sometimes suddenly malfunctions. They prefer to interfere with it as little as possible, which leads them to understand it even less, and it solidifies into what it is for years. This process is known as software rot or legacy paralysis.

However, bosses and the market demand development, so instead of fundamentally changing and improving something, developers add “bells and whistles” which grow alongside, rather than changing the core product. It’s well understood that diving into the core product might set you on a path leading to disappointments, deadline failures, layoffs, etc.

Interestingly, the advent of AI cuts both ways here. On the one hand, the problem will only intensify, because the team will understand even less about how things work. On the other hand, complexity can be managed better, because AI can analyze complex matters more easily than a single biological brain.

For instance, AI could be used to create tests for existing code, to perform anomaly detection and bug hunting, and to create documentation that explains the code structure from simple to complex. It might also partly automate refactoring and detect performance bottlenecks.

I believe such AI solutions for working with legacy will soon be a major market.

Edible Gold: A Luxurious Yet Ineffective Delicacy | February 03 2025, 21:58

Recently, I was surprised to discover that gold leaf is edible, and when you see golden flakes on a quality cake, it’s actually real gold, not just some props. Here’s a kebab from Arkadiy Novikov and Jihan Deniz costing 23,550 rubles.

Another revelation was that such gold is quite affordable. A single palm-sized sheet of 99.8% pure gold sells for just 4 bucks. It’s sold in very thin sheets, about 100-500 nanometers thick (depending on the manufacturer). 100 nanometers is 0.0001 millimeters. For comparison, writing paper is about 130,000 nanometers thick and a human hair about 60,000. If calculated, a sheet is approximately 600 atoms thick. Edible gold also comes in powder and flakes.
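The "atoms thick" estimate is easy to reproduce. The sketch below assumes a gold atom is about 0.288 nanometers across (the spacing between neighbouring atoms in metallic gold); that figure is not from the post, and the function name is just for illustration:

```python
GOLD_ATOM_DIAMETER_NM = 0.288  # assumed: nearest-neighbour spacing in metallic gold

def atoms_thick(thickness_nm):
    """Approximate number of atomic layers in a gold sheet of this thickness."""
    return thickness_nm / GOLD_ATOM_DIAMETER_NM

for nm in (100, 170, 500):
    print(f"{nm} nm is about {atoms_thick(nm):.0f} atoms")
```

With this assumption, the 100-500 nanometer range works out to roughly 350 to 1,700 atoms, and a figure of about 600 atoms corresponds to a sheet around 170 nanometers thick.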

Turns out, this gold has its own E-number, E175 (while E174 is edible silver). Gold is not absorbed by the body at all; it passes through unchanged, so logically, it offers no benefits. However, sellers of edible gold claim its benefits are sky-high and it cures almost all ailments. Studies conducted in 1975 and 2016 showed, however, that there are indeed no health benefits.

The only benefit here is to show off your wealth and brag to your friends that you are, literally, pooping gold (remember, it’s not digested). Whether you should sift through your toilet matters looking for gold is up to you…

Interestingly, even in ancient times, gold sheets were somehow made 500 times thinner than a human hair.

The production of gold leaf started around the end of the third millennium BC when craftsmen learned how to purify the metal and hammer it into thin sheets. Traditionally, during the Middle Ages, gold leaf was prepared by rolling or hammering gold ducats (trade coins used in Medieval Europe) into approximately the thickness of foil. As the metal became thinner, it became more challenging to prevent the foil from sticking to nearby moist or greasy surfaces. To prevent this, “gold beaters would lay a small square of thin metal in the middle of a paper or parchment square and other metal squares on top of it in sequence, until a decent stack was formed; then they skillfully hammered it until the small squares of metal spread to the edges of the parchment.” Then these squares were cut into smaller squares, and the process was repeated. For the final stage of beating, when the gold reached its thinnest point, a special type of parchment called “goldbeaters’ skin” (made from the inner lining of calf’s intestine) was placed between the layers of foil. According to Cennino Cennini, about 145 sheets could be made from one ducat, and a Venetian ducat weighed about 54 troy grains. However, Cennini preferred his gold leaf to be thicker and recommended producing only 100 sheets from one ducat.

Exploring the Science and History of Superglue Through Personal Experience | February 03 2025, 21:11

Two and a half years ago, I printed this phone holder on an SLA printer, a holder of my own design. And then my cat broke it with its paw. I started to glue it together with superglue, and realized that this plastic does not bond very well (but it still bonded after sanding). I began to investigate why, and found a lot of interesting information about superglue.

How does superglue work? Inside the tube, it remains liquid and consists of ethyl cyanoacrylate monomers. When the glue is applied to a surface, it fills the pores and cracks, which must be present for the glue to work; hence the importance of roughly sanding the surface. The polymerization reaction begins on contact with water (including moisture in the air): the molecules begin to connect with each other, forming long polymer chains and turning from a liquid into a solid. Therefore, you should not wash off the glue with water, as it will set even faster. Acetone can be used, unless the glue is in the eyes. And it does get into the eyes, often because it is packaged in a container that resembles eye drops.

Thanks to rapid polymerization, the glue sets in 10–30 seconds.

The glue also bonds hydrophobic surfaces poorly, such as polyethylene, polypropylene, and Teflon. They lack free electrons for the glue to interact with and do not absorb moisture, which is necessary to initiate the reaction.

Impacts and shearing—superglue works excellently under tension, but is very brittle under impacts and shearing. This is its weak spot.

– Cyanoacrylate was discovered accidentally at the photo company Eastman Kodak by Harry Coover, who was trying to create a transparent plastic for gun sights.

– Unlike most plastics, which deteriorate after being recycled, superglue can be heated to 210°C and decomposed back into monomers. These monomers can then be reassembled to create a new, durable material. This allows for the recycling of plastic without loss of quality.

– The properties of the glue caught the interest of the US Army, particularly during the height of the Vietnam War. Transportation of the wounded took just minutes, but many soldiers died from uncontrollable bleeding. Therefore, in 1966, the US Army sent a special surgical brigade to South Vietnam, armed with aerosol sprays of cyanoacrylate. Although this method was used in a limited number of cases, out of 30 documented cases of using the glue to stop bleeding, it was successful in 26. A safer surgical glue was invented in 1998.

And the green sphere at the bottom is also an interesting object; I printed it too. It is a spherical section of a gyroid. A gyroid is a continuous (without self-intersections) infinitely repeating structure in three dimensions without any reflection symmetries. It is, incidentally, the only such structure known to science. Overall, it’s a way to create elements with minimum weight and maximum strength. Essentially, its shape is described by sinusoids along the three coordinates.
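The "sinusoids along three coordinates" can be written down directly. A widely used approximation of the gyroid is the set of points where sin(x)cos(y) + sin(y)cos(z) + sin(z)cos(x) = 0. The sketch below uses that formula to decide whether a point belongs to a thin gyroid wall cut by a sphere; the wall thickness and function names are illustrative assumptions, not the settings of any slicer or of the printed model.

```python
import math

def gyroid(x, y, z):
    """Trigonometric approximation of the gyroid; the surface is where this is 0."""
    return (math.sin(x) * math.cos(y)
            + math.sin(y) * math.cos(z)
            + math.sin(z) * math.cos(x))

def in_gyroid_sphere(x, y, z, radius, wall=0.3):
    """True if the point is inside the sphere and close enough to the surface
    to count as printed material (`wall` is a threshold on the function value)."""
    inside_sphere = x * x + y * y + z * z <= radius * radius
    return inside_sphere and abs(gyroid(x, y, z)) <= wall

# Count "solid" points on a coarse grid to see how little material is used.
radius, step = 6.0, 0.5
n = int(radius / step)
grid = [(i * step, j * step, k * step)
        for i in range(-n, n + 1) for j in range(-n, n + 1) for k in range(-n, n + 1)]
inside = [p for p in grid if p[0] ** 2 + p[1] ** 2 + p[2] ** 2 <= radius ** 2]
solid = [p for p in inside if in_gyroid_sphere(*p, radius)]
print(f"{len(solid)} of {len(inside)} grid points inside the sphere are material")
```

Sampling the same test on a fine grid and meshing the result is one way such a spherical gyroid section could be generated for printing.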

Describing 20 Countries in Two Words Each | January 29 2025, 21:22

I asked ChatGPT to pick 20 countries and describe them in two words.

Contemplatively-tranquil country

Indomitably-defensive country

Profoundly-decaying country

Technologically-scientific country

Academically-philosophical country

Passionately-creative country

Eco-progressive country

Legendarily-touristic country

Aristocratically-financial country

Scorchingly-royal country

Soccer-sportive country

Iron-isolated country

Sacredly-religious country

Brightly-explosive country

Disciplined-collectivist country

Pastorally-peaceful country

Fiery-geyser country

Fjord-fairytale country

Alcoholically-reckless country

Rainily-emerald country

Write how many countries you did NOT guess right 🙂

Circular Glass Cracks: An Unusual Phenomenon | January 29 2025, 05:25

Look at how interestingly the glass has cracked. The crack goes in a circle. Usually, on glass, cracks tend to spread towards the nearest edge because that’s where the stress can be minimal. However, in this case, it seems that the crack stopped before reaching the edge of the glass, which is quite unusual.

Unfortunately, this is already the third glass with a similar crack; in the other two, the crack indeed completed the circle. The dishwasher, which torments the glass with its temperature, is to blame for everything. The second photo shows how it was originally.

Modern Take on Theodora: Opera, Martyrs, and Pole Dancing | January 28 2025, 01:55

I finished “Theodora”. It’s a three-hour opera in a production by the Royal Opera House, about the Christian saints and martyrs Theodora and Didymus, who lived in the 4th century in what’s now Syria. On stage: prostitutes, pole dances, a bomb; essentially, the full package.

And yes, originally it’s not an opera, but an oratorio, meaning originally on stage there is a chorus that sings for three hours, and nothing else happens. In the production, however, the oratorio is decked out like an opera, plus a bit more.

In short. The plot. Briefly. Valens, the Roman envoy, forces everyone to worship Roman gods, and threatens to execute those who refuse. Theodora, a Christian, does not comply. Her lover, Didymus, secretly converted to Christianity, tries to save her by disguising himself in her dress. In the end, Theodora surrenders to the enemies to save Didymus, and both die as martyrs for their faith. Afterwards, they were canonized by Christians in gratitude.

The oratorio is in English. That’s unusual in itself. Well.. in English. “Vouchsafe, dread Sir, a gracious ear. Lowly the matron bow’d, and bore away the prize…”. English from three hundred years ago. I understood “Carmen” in French with subtitles better. But no matter, there are translations you can hold in your hand and glance at with one eye, plus everything happens veeery slowly there.

So, what we have here. A classic plot on a religious theme. In Katie Mitchell’s production, they decided to break all norms at once, making the oratorio into an opera and also setting it in modern times. It turned out pretty cool, actually.

Katie Mitchell sets the action in what an Alicante publication called a “Putin-like” embassy in Antioch, where the rooms function as a brothel. This is the first theatre piece to involve an intimacy coordinator for sex and violence scenes (Ita O’Brien).

Valens, the Roman envoy in Antioch, wears a red sweater. He hasn’t heard of the #MeToo movement, hence the brothel accommodates “comfort women” for him and his bodyguards. Dressed in red lingerie, they dance on poles in the red room (kind of a striptease; Holly Weston and Kelly Vee).

Next, we are introduced to Septimius, Valens’ head of security. His task is to ensure that all citizens publicly worship Roman gods as a sign of loyalty. Otherwise – death.

Here comes Didymus, one of the bodyguards. Didymus used to believe in Roman gods but secretly converted to Christianity. He’s in love with the Christian Theodora, the head of the household staff at the embassy.

Theodora plans an assassination attempt on Valens with a homemade explosive. They actually assemble it on stage with duct tape and some stuff.

Septimius uncovers the conspiracy and defuses the bomb. Theodora’s punishment – she becomes a “comfort woman”. For this, they dress her up as Marilyn Monroe. Oh, actually, it seems more like Louise Brooks, but never mind, they look alike.

Then the drama continues with an escape, Didymus saves Theodora, then the other way around. But ultimately, as in all operas, things end up not very well, but specifically in Mitchell’s production, good prevails over evil.

The role of Didymus is played by Jakub Józef Orliński. He has a beautiful scene where he changes into heels and a shimmering dress, in which he continues to perform until the end of the opera.

Jakub has a rather unusual voice. He is a countertenor. It’s the highest male voice. After castrati fell out of favor – quite rare. Google it, his voice is very beautiful. I’ll leave a few links in the comments.

One of the scenes towards the end reminds me of the café scene from “Pulp Fiction”.

The first performance of “Theodora” was in London, at the Royal Theatre in Covent Garden in 1750, and this production 272 years later comes from there too. Quite symbolic. True, back then it flopped – almost no audience. But now, it’s a classic.