Prototype to Production: The Tale of the Worst E-Bike | February 11 2025, 23:02

A really cool video about what happens when you let a prototype into “prod.”

Here’s the original video about “the worst e-bike in history”: https://www.youtube.com/watch?v=AB7pBrudFbg

Essentially, the developers tackled a problem that didn't exist. They decided to create the first bicycle with a futuristic hubless wheel. Unfortunately, they didn't think to alter the laws of physics first, which is a pity, because that would have really helped them. Beyond that, they simply assumed the result would be "good enough."

In the video attached to the post, the guys disassemble this bike and show the engineering solutions inside. Essentially, it’s reverse engineering.

I fully understand that this is exactly how IT startups are done. But the bike example shows how poorly this approach translates to hardware.

Right now, such a bike is on sale about half an hour’s drive away for 120 bucks on Facebook Marketplace. Probably in the hope that some museum might buy it.

The video should be especially interesting to cyclists and engineers.

https://www.youtube.com/watch?v=MgPUpccQ_mw

Global Leaders in the Sneaker Market | February 11 2025, 22:05

Today we went shopping for sneakers, and I decided to investigate which countries are currently the world leaders in sneakers.

Overall, no surprises—the US is in the absolute lead. Germany and Japan are notable. The rest are catching up.

American brands—at least 9 of them: Nike (+Converse), New Balance, Brooks, Saucony (+Merrell), Reebok, Skechers, Vans, Hoka. Purely sport-wise, probably 7 from the list.

Japanese—Asics, Mizuno.

German—Adidas, Puma (by the way, founded by the two Dassler brothers, one company each, yet fierce competitors ever since). Swiss—On. Korean—Fila.

Of course, production is all in China, Vietnam, Indonesia.

Personally, I’ve been buying almost exclusively Asics for a long time. They are very comfortable, although the design is so-so, a mere pass.

By the way, want an interesting fact you probably didn't know? The thin layer of felt on the sole of Converse sneakers was added (at least it was still there as of about 10 years ago) not for functional reasons but for economic ones. Footwear with a fabric sole was subject to lower customs duties on import than footwear with a rubber sole, because it was classified as slippers. That cut the duty from 37.5% to 3%.

What about other countries – are there any brands that are very visible and popular in your markets but have yet to make it to the US?

A Walk Through the Pentagon: A Glimpse Inside America’s Defense Headquarters | February 11 2025, 21:23

Today, I walked through the corridors of the Pentagon.

The Pentagon is the headquarters of the United States Department of Defense, located in Arlington. It is the second-largest office building in the world, built in the shape of a pentagon.

There will be no photos, because they ask you to leave your phone and even your Apple Watch at the entrance. But honestly, there isn't much to capture. It's not that the place is utterly dreary, but 90% of the corridors (and there are 28 kilometers of them) look almost the same as 90% of the corridors in any American university: everything is clean, bright, tidy, and that's it. The only difference is that at a university you find bulletin boards with interesting things on the walls, while in the Pentagon there are no boards in the corridors; everything is hidden. Everything else is the same. Endless doors of heightened dreariness with numbers and code locks, and some corridors adorned with patriotic installations. I'm sure there's a lot of interesting stuff behind many of those doors, but to enter many of them you need to leave your phone out in the corridor (and I remind you, I left mine at the entrance).

About 26,000 people work in the building. About a third of them are civilians, the rest are military. Although the Pentagon is located in Arlington, Virginia, it has a Washington address — 1400 Defense Pentagon, Washington, DC 20301-1400. It’s said that the Pentagon has six Washington ZIP codes, and that the US Secretary of Defense, the Joint Chiefs of Staff, and each of the four branches of the armed forces have their own ZIP code (like 20301, 20318, 20310, 20330, 20350, and 20380).

The building was completed in 1943, and because of the segregation laws of the time it was built with separate restrooms for Black and white employees. Of course, that's no longer the case.

Since 26,000 people work in the building (essentially the population of a small town) and parking is quite limited (large, but still insufficient), there's a metro station serving the Pentagon that's hardly needed for anything else. Inside the perimeter there's everything you need to last until the end of the workday: Subway, McDonald's, Dunkin' Donuts, Panda Express, Starbucks, Sbarro, KFC, Pizza Hut, and Taco Bell, plus pharmacies and even a Best Buy.

From an architectural perspective, it's a very interesting project. With that many people and that much floor space, you can get from any point to any other in no more than 10 minutes. No elevators, just wide corridors and stairs, which would also make an emergency evacuation much easier. Although, of course, there was the tragic experience of 2001, when a plane hijacked by terrorists crashed into the building. About 125 people in the building died then, along with everyone on board the plane.

On one side of the Pentagon is Crystal City, a typical urban district with shopping centers and multi-story residential complexes of varying degrees of luxury, and on the other side is Arlington National Cemetery, where about 400,000 people are buried.

Exploring Sous Vide: Adding to My Kitchen Gadget Collection | February 11 2025, 02:55

Well, now I’ve finally gotten around to sous vide. As a result, the kitchen’s electrical gadgetry involved in cooking now includes the Power Quick Pot electric pressure cooker, the Ninja air fryer, the Crock-Pot slow cooker, and now the Anova sous vide. Made my first steaks, they turned out awesome, but next time instead of 150 F (65C) I’ll set it to 140F (60 C).

Alphabet Recall: A Simple Technique for Remembering Forgotten Words and Numbers | February 11 2025, 02:23

I have a life hack for recalling a forgotten word that works quite reliably in my case. Maybe it will work for you too.

It involves going through the letters of the alphabet and trying to trigger the word by asking myself, "Does it start with A? B? C?" At the letter the word actually starts with, the whole thing comes back.

For instance, today I needed to recall a band from the 90s. I remembered nothing. No song titles, nothing I could quickly find by Googling. But I had a certain “picture” in my head. Probably, if I had struggled a bit more, I would have come up with search queries that would lead me where I needed, but I pulled out this technique and started going through the letters.

And as I was going through A, B, C, … at the letter K it came back: "Karmen"!

Sometimes, rarely, a "second pass" is necessary. Of course, it doesn't always work, but then again, with no system at all it's unclear how you'd recall anything. This system at least gives you a starting point, and it works quite often.

As for short numbers, to recall them more easily later, I mentally draw a zigzag line across the keypad of an old push-button phone. The result is a visual squiggle that serves as an extra mnemonic on top of the digits. True, unlike the first technique, I rarely use this one, because in everyday life there's rarely a need to memorize and then recall numbers.
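
If it helps to picture it, here's a tiny sketch of the idea: map each digit to its position on a classic 3x4 phone keypad, and a short number becomes a sequence of points you can trace as a squiggle.

```python
# Map each digit to its (row, column) position on a classic push-button keypad,
# so a number turns into a little path you can visualize as a zigzag.
KEYPAD = {
    "1": (0, 0), "2": (0, 1), "3": (0, 2),
    "4": (1, 0), "5": (1, 1), "6": (1, 2),
    "7": (2, 0), "8": (2, 1), "9": (2, 2),
                 "0": (3, 1),
}

def squiggle(number):
    return [KEYPAD[d] for d in number]

print(squiggle("4137"))   # [(1, 0), (0, 0), (0, 2), (2, 0)]: the shape to visualize
```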

Surprising Facts About Nature and Science | February 10 2025, 22:11

Live and learn, as they say.

Strawberries and wild strawberries are not berries but, essentially, nuts. More precisely, the true fruits are not the fleshy part but the little "seeds" on its surface, while the pulp is the swollen receptacle. Potatoes are bi-locular berries. A pear is the same kind of fruit as an apple (a pome). Cherries, plums, apricots, and peaches are all drupes; drupes are divided into one-seeded (e.g., cherry, plum, peach, coconut) and many-seeded (e.g., raspberry, blackberry, cloudberry). Bananas are berries. Pineapple is a grass. Watermelon is a berry (a pepo, like a pumpkin). Almonds are not nuts but the seeds of a drupe. Apple seeds and the pits of cherries, apricots, peaches, and plums contain amygdalin, which breaks down into cyanide, just as in bitter almonds. Chocolate contains theobromine: a couple of bars can be lethal, or close to it, for a dog, and half a bar will definitely knock it down. Vanilla is made from a Mexican orchid vine, while vanillin, the artificial vanilla substitute, is a byproduct of the pulp and paper industry.

There is no such animal as a panther: in popular usage, "panthers" are black jaguars or leopards. Black panthers have spots too; they're just less visible. Polar bears have black skin and transparent fur, and they look white for the same reason clouds do. Woodpeckers have tongues up to four times the length of their beaks, wrapped around their skulls, and they can stretch far out. The tongue of the European green woodpecker goes down into the throat, across the back of the neck, around the back of the skull under the skin, across the crown between the eyes, and usually ends right under the eye socket. In some woodpeckers, the tongue exits the skull between the eyes and enters the beak through one of the nostrils.

Anteaters have their tongues attached to their sternums, between the clavicles. Elephants are said to be the only animals with four fully developed knee joints. Koalas have fingerprints that are almost indistinguishable from human ones. Sharks have no bones, and their closest relatives are rays. Crocodiles can go without eating for a whole year (though they get sad about it). Zebras are black with white stripes, not the other way around (the white appears on black skin). About 1% of people have cervical ribs. Squids, cuttlefish, and octopuses can edit their RNA "on the fly".

As it turns out, René Descartes gave his name to the coordinate system both in Russian (where it's simply called "Descartes' system") and in the rest of the world: Descartes, i.e., Des Cartes, is exactly where "Cartesian" comes from.

Bridging Brain Functions and Language Models through Predictive Processing | February 09 2025, 21:39

I’ve been thinking that understanding how large language models (LLM; like ChatGPT) function explains how our (at least my) brain probably works, and vice versa—observing how the brain functions can lead to a better understanding of how to train LLMs.

You know, LLMs are based on simple logic: choosing an appropriate next word given the N preceding ones, which form the "context". For this, LLMs are trained on a gigantic corpus of texts that demonstrates which words typically follow which others in various contexts.
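
To make that concrete, here is a toy sketch of the "pick the likeliest next word given the last N words" idea. It's a simple counting model, nothing like a real transformer, but the logic it illustrates is the same: a context of known words and a distribution over what tends to come next.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows each two-word context
# in a tiny corpus, then pick the most frequent continuation.
corpus = "the cat sat on the mat the cat lay on the sofa the dog sat on the mat".split()

next_word = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    next_word[(a, b)][c] += 1            # context of N=2 words -> observed continuation

def predict(a, b):
    candidates = next_word[(a, b)]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict("sat", "on"))   # -> "the"
print(predict("on", "the"))   # -> "mat" (seen twice, vs. "sofa" once)
```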

So, when you study any language, like English, this stage is inevitable. You need to encounter a stream of words in any form—written or spoken—so that your brain can discover and assimilate patterns simply through observation or listening (and better yet, both—multimodality).

In LLMs, the basic units are not words but tokens: whole words and, often, parts of words. The tokenizer is built by scanning that vast corpus for the most common character sequences, which naturally turn out to be sometimes full words and sometimes word fragments. Similarly, when you start speaking a foreign language, especially one with a system of endings, you begin pronouncing the start of a word while your brain is still busy "computing" the ending.
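
You can see tokenization for yourself with the tiktoken library (assuming it's installed, e.g. pip install tiktoken), which exposes the GPT-2 byte-pair-encoding vocabulary; common words tend to stay whole, while rarer or foreign words get split into fragments.

```python
import tiktoken

# Show how a BPE tokenizer splits words: common ones stay whole,
# rare or compound ones become several sub-word pieces.
enc = tiktoken.get_encoding("gpt2")
for word in ["cat", "cats", "unbelievable", "Geschwindigkeitsbegrenzung"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(word, "->", pieces)
```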

When we read or listen, we don't actually analyze words letter by letter, because important pieces often simply disappear due to fast or unclear speech, or typos. But the brain doesn't need to sift through every word that looks or sounds like the one in question; it only needs to check whether what it heard or saw matches a very limited set of words that could logically follow the previous ones.

It’s a separate story with whole phrases. In our brain, they form a single “token”. That is, they are not broken down into separate words, unless you specifically think about it. And such tokens also appear in the stream not accidentally—the brain expects them, and as soon as it hears or sees signs that the phrase has appeared, the circle of options narrows down to literally 1-2 possible phrases with such a beginning, and that’s it—one of them is what was said or written.

But the most interesting thing is that recent research has shown the human brain really does work very similarly to LLMs. In the study "The neural architecture of language: Integrative modeling converges on predictive processing", MIT scientists showed that models that better predict the next word also more accurately model brain activity during language processing. Thus, the mechanism used in modern neural networks is not just inspired by cognitive processes but actually reflects them.

During the experiment, fMRI and electrocorticography (ECoG) data were analyzed during language perception. The researchers found that the best predictive model at the time (GPT-2 XL) could explain almost 100% of the explainable variation in neural responses. This means that the process of understanding language in humans is really built on predictive processing, not on sequential analysis of words and grammatical structures. Moreover, the task of predicting the next word turned out to be key—models trained on other language tasks (for example, grammatical parsing) were worse at predicting brain activity.

If this is true, then the key to fluent reading and speaking in a foreign language is precisely training predictive processing. The more the brain encounters a stream of natural language (both written and spoken), the better it can form expectations about the next word or phrase. This also explains why native speakers don’t notice grammatical errors or can’t always explain the rules—their brain isn’t analyzing individual elements, but predicting entire speech patterns.

So, if you want to speak freely, you don’t just need to learn the rules, but literally immerse your brain in the flow of language—listen, read, speak, so that the neural network in your head gets trained to predict words and structures just as GPT does.

Meanwhile, there’s the theory of predictive coding, asserting that unlike language models predicting only the nearest words, the human brain forms predictions at different levels and time scales. This was tested by other researchers (google Evidence of a predictive coding hierarchy in the human brain listening to speech).

Briefly: the brain doesn't just predict the next word; it's as if several processes of different "resolutions" run in parallel. The temporal cortex (lower level) predicts short-term, local elements (sounds, words). The frontal and parietal cortex (higher level) predicts long-term, global language structures. Semantic predictions (the meaning of words and phrases) cover longer time intervals (≈8 words ahead). Syntactic predictions (grammatical structure) have a shorter time horizon (≈5 words ahead).

If you try to transfer this concept to the architecture of language models (LLM), you can improve their performance through a hierarchical predictive system. Currently, models like GPT operate with a fixed contextual window—they analyze a limited number of previous words and predict the next, not exceeding these boundaries. However, in the brain, predictions work at different levels: locally—at the level of words and sentences, and globally—at the level of entire semantic blocks.

One of the possible ways to improve LLMs is to add a mechanism that simultaneously works with different time horizons.

Interestingly, could you set up an LLM so that some layers specialize in short language dependencies (e.g., adjacent words) and others in longer structures (e.g., the semantic content of a paragraph)? A quick search turns up something similar under "hierarchical transformers", where layers interact at different levels of abstraction, but that work is mostly about processing very long documents.
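
To make the idea a bit more mechanical, here is a minimal sketch, under my own assumptions rather than any published architecture, of what "some layers look only at nearby words, others at the full context" could mean: attention masks with different window sizes.

```python
import torch

# A toy sketch: give some attention layers a narrow local window
# (adjacent-word dependencies) and others the full causal context
# (paragraph-scale dependencies).
def causal_mask(seq_len, window=None):
    """Boolean mask where True means 'this position may be attended to'."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions (column vector)
    j = torch.arange(seq_len).unsqueeze(0)   # key positions (row vector)
    mask = j <= i                            # causal: only look at the past
    if window is not None:
        mask &= (i - j) < window             # local: only the last `window` tokens
    return mask

local_mask = causal_mask(8, window=3)        # for "short-dependency" layers
global_mask = causal_mask(8)                 # for "long-dependency" layers
print(local_mask.int())
print(global_mask.int())

# In a real model these would become additive biases on the attention scores
# (0 where allowed, -inf where not). Note that PyTorch's MultiheadAttention
# uses the opposite boolean convention (True = blocked), so you'd pass ~mask.
```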

As I understand it, the problem is that for this you would need to train foundation models from scratch, and it probably doesn't work well on unlabelled or poorly labelled content.

Another option is multitask learning, so that the model not only predicts the next word but also tries to guess what the next sentence, or even the whole paragraph, will be about. Again, a quick search suggests this can be implemented, for example, by splitting up the transformer's attention heads so that some parts of the model handle short language dependencies while others predict longer-range semantic connections. But as soon as I dive into this topic, my brain explodes. It's all really complex.
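
Still, the general shape of the multitask idea is easy to sketch. Below is a hedged toy example of a combined loss: the usual next-token objective plus an auxiliary head that guesses a coarse label for the upcoming paragraph. The module names (backbone, token_head, topic_head) are placeholders I made up, not any library's API.

```python
import torch
import torch.nn.functional as F

# A sketch of the multitask idea, not anyone's published recipe: combine the
# usual next-token loss with an auxiliary loss that predicts a coarse "topic"
# of the upcoming paragraph.
def multitask_loss(backbone, token_head, topic_head,
                   tokens, next_tokens, paragraph_topic, aux_weight=0.3):
    hidden = backbone(tokens)                      # (batch, seq, dim)

    # Task 1: standard next-token prediction at every position.
    token_logits = token_head(hidden)              # (batch, seq, vocab)
    lm_loss = F.cross_entropy(token_logits.flatten(0, 1), next_tokens.flatten())

    # Task 2: predict a coarse label for the upcoming paragraph from the last
    # hidden state, a crude stand-in for longer-horizon prediction.
    topic_logits = topic_head(hidden[:, -1, :])    # (batch, n_topics)
    topic_loss = F.cross_entropy(topic_logits, paragraph_topic)

    # The auxiliary task nudges the representation toward longer-range structure.
    return lm_loss + aux_weight * topic_loss
```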

But perhaps, if it’s possible to integrate such a multilevel prediction system into LLMs, they could better understand the context and generate more meaningful and consistent texts, getting closer to how the human brain works.

I’ll be at a conference on the subject in March; will need to talk with the scientists then.

Nuclear Legacy: Carbon-14 and the Science of Dating Life | February 09 2025, 14:35

It turns out that the nuclear tests conducted between 1955 and 1963 left their mark in every living organism on Earth, and scientists can use this fact to determine the age of cells in any creature that was alive at the time, and how often those cells are renewed, which would have been significantly harder without the tests. There is even a specific term for it: "C-14 bomb-pulse dating".

Here is how the radiocarbon analysis works. From 1955 to 1963, atmospheric nuclear tests roughly doubled the amount of carbon-14 in the atmosphere. Atmospheric carbon-14, which is normally produced only by cosmic radiation, reacts with oxygen to form carbon dioxide (¹⁴CO₂). Plants absorb this ¹⁴CO₂ during photosynthesis, animals eat the plants, and we eat both, so carbon-14 ends up incorporated into our tissues at roughly the same concentration as in the atmosphere.

Most tissues in living organisms gradually renew over weeks or months, so the carbon-14 content in them corresponds to the current atmospheric level. However, tissues that either do not renew or renew very slowly will contain a carbon-14 level close to that of the atmosphere at the time they were formed. Thus, by measuring the carbon-14 content in the tissues of people who lived during and after the peak of the “bomb pulse”, the rate of replacement of certain tissues or their components can be precisely estimated.

This means that nuclear tests, inadvertently, have provided scientists with a way to understand when tissues are formed, how long they last, and how rapidly they are replaced.

It turns out that practically every tree that has lived since 1954 contains a “spike” – a kind of souvenir from the atomic bombs. Wherever botanists look, they find this marker. There are studies in Thailand, studies in Mexico, studies in Brazil—wherever you measure the carbon-14 level, it’s there. All trees carry this “marker”—trees of northern latitudes, tropical trees, rainforest trees—it’s a worldwide phenomenon.

But there’s a catch. Every eleven years, the amount of carbon-14 in the atmosphere halves. Once the carbon-14 level returns to its original value, this method will become useless. Scientific American explains that “scientists have the opportunity to use this unique dating method only for a few decades until the carbon-14 level returns to normal.” This means that if they want to use this method, they need to hurry. Unless there are new nuclear explosions—but no one wants that.

In addition, this method makes it possible to determine a person's age from their teeth and hair. Once a tooth is formed, the amount of carbon-14 in its enamel no longer changes, making it an ideal tool for dating. Because particular teeth form at specific ages, measuring the ¹⁴C content in different teeth lets researchers estimate a range of birth years. The same holds for hair, which grows about 1 cm per month, so conclusions can also be drawn from the carbon content in different parts of a hair.

About one-third of an entire tooth, roughly 100 milligrams, is needed for dating the carbon in teeth. To prepare the sample, it is ground and dissolved in acid, which releases CO₂. Hair, instead of being dissolved in acid, is burned; since hair has a high carbon content, only 3-4 milligrams are needed. The CO₂ from the tooth or hair sample is then reduced to graphite (a crystalline form of carbon) and placed in the ion source at CAMS (the Center for Accelerator Mass Spectrometry), where neutral graphite atoms are ionized by giving them a negative charge. The accelerator then uses this negative charge to accelerate the sample, enabling the carbon isotope ratios to be detected, counted, and compared. On the resulting graphs, pMC (percent modern carbon) expresses that concentration ratio.

In the 1960s, when the concentration of C-14 was sharply changing, the method allowed the determination of tissue age to an accuracy of ±1 year. However, after 2000, as the C-14 levels evened out, the accuracy dropped to ±2–4 years.

Unpacking Hidden Data Collection in Mobile Apps | February 08 2025, 16:20

I recently stumbled upon an intriguing study on the timsh.org website, where the author dissected how applications collect and transmit your data. For the experiment, he took an old iPhone, installed a random application on it (Stack by KetchApp), intercepted its traffic, and observed what was transmitted from the app to the outside world. A lot of data was transmitted, even after answering "no" to the "Allow tracking?" prompt.

Specifically: the IP address (which allows your location to be determined via reverse DNS), approximate geolocation (even with geolocation services disabled), device model, battery charge level, screen brightness level, amount of free memory, and other parameters.

The data does not go to the company that created the application, but rather to various third parties. That is, these third parties collect data from most of the applications on your phone, and the data flows occur every time the application operates.

The author writes about two major groups of players – SSP and DSP.

SSPs (Supply-Side Platforms) are the ones that collect data from within the application: Unity Ads, IronSource, Adjust. Then there are DSPs (Demand-Side Platforms), which run the advertising auctions: Moloco Ads, Criteo.

Advertisers gain access to the data through DSPs. Data brokers aggregate and resell the data; examples include Redmob and AGR Marketing Solutions. The latter sells databases that contain PII such as name, address, and phone number, and even advertising identifiers (IDFA/MAID).

What data is sent? For instance, that Stack app from KetchApp sent Unity Ads the geolocation (latitude and longitude), the IP address (including server IPs, for example Amazon AWS), unique device identifiers, namely the IDFV (the per-developer "identifier for vendor") and the IDFA (the advertising identifier), and additional parameters such as the phone model, battery level, free memory, screen brightness, headphone connection, and even the exact system boot time.
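
For a sense of what such a payload might look like, here's a rough mock-up. The field names are placeholders of my own, not the actual Unity Ads wire format, which I haven't seen.

```python
# Hypothetical illustration of the kinds of fields described above;
# the names and structure are invented, only the categories come from the post.
example_payload = {
    "idfa": "00000000-0000-0000-0000-000000000000",   # zeroed out when tracking is denied
    "idfv": "ABCD1234-EF56-7890-ABCD-1234567890AB",   # per-developer device identifier
    "ip": "203.0.113.42",                             # documentation-range IP as a stand-in
    "geo": {"lat": 38.87, "lon": -77.05},             # approximate location
    "device": {
        "model": "iPhone10,4",
        "battery_level": 0.83,
        "free_memory_mb": 412,
        "screen_brightness": 0.55,
        "headphones_connected": False,
        "boot_time": "2025-02-08T14:03:11Z",          # the "exact system boot time"
    },
}
print(len(example_payload["device"]), "device-level signals in a single request")
```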

On the DSP side there is an RTB (real-time bidding) system for selling this information. Data travels from the app via an SSP (such as Unity Ads) to a DSP (such as Moloco Ads), where real-time auctions decide which ad to show you. At each stage, the data is passed on to dozens, if not hundreds, of companies.

Yes, by answering "I do not want to share data," you only disable the sending of the IDFA (the advertising identifier); other data, such as the IP address, User-Agent, geolocation, and all those phone-model and free-memory parameters, is still transmitted. Combined, they serve as a fingerprint almost as good as the advertising identifier. If they want to, applications can still identify you by many parameters: IP address, device model, OS version, fonts, screen resolution, battery level, time zone, and so on, since they receive this information from hundreds of other places. A separate question is that "end applications" don't need this (and it isn't free), but those who show you ads do need it, and they have it. And, of course, various intelligence services can easily access it if necessary.
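
Here is a minimal sketch of why those "harmless" parameters add up to an identifier: hash the combination and you get something that behaves like a quasi-stable fingerprint even with the IDFA withheld. Real trackers use more stable signals and probabilistic matching, but the principle is the same.

```python
import hashlib

# Combine a handful of device/network parameters into a single quasi-stable
# identifier. The values here are made up; only the idea matters.
signals = {
    "ip": "203.0.113.42",
    "model": "iPhone10,4",
    "os_version": "16.7.2",
    "screen": "750x1334",
    "timezone": "America/New_York",
    "fonts_hash": "3f2a9c",          # placeholder for an installed-fonts digest
}

canonical = "|".join(f"{k}={v}" for k, v in sorted(signals.items()))
fingerprint = hashlib.sha256(canonical.encode()).hexdigest()
print(fingerprint[:16])              # same device + network context -> same value, across apps
```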

If you use several apps from one developer, the IDFV identifier allows linking data from all the apps.

Perhaps it’s not a secret at all, but almost every app sends data to Facebook (Meta) without asking for the user’s consent. That is, if you have Facebook on your phone, then bingo, any data from any other apps begin to be tagged with your profile, even if you have forbidden sharing information in those apps.

Companies also exchange user data with each other. For instance, Facebook exchanges information with Amazon, Google, and TikTok, and mobile SDKs (such as Appsflyer and Adjust) cross-link users between different services, because such exchanges immediately increase the value and quality of the information for all participants.

Meanwhile, it turns out that Unity, which is nominally in the business of 3D game engines, earns primarily from this collected data: in 2023, revenue from this direction ("Mobile Game Ad Network") amounted to $2 billion. In 2022, Unity merged with IronSource, another giant of mobile advertising, which analyzes user behavior, optimizes monetization, and sells data to advertisers. Now, through LevelPlay, Unity can manage not just ad placement but also data aggregation and its resale to other companies.

A significant portion of mobile games are created on Unity, especially free-to-play games. This allows Unity to have access to data from millions of devices globally, even without explicit user consent. Developers often do not realize how deeply Unity tracks data in their games.

Conclusion: disabling ads or prohibiting tracking at the OS level is just a minor obstacle. Data about you is still being collected, analyzed, and transmitted to hundreds of companies.

See the link below