Language – Hi, I'm Rauf Aliev.

Navigating Simple English in “Project Hail Mary” | May 10 2026, 15:30

I’ve read about a quarter of Project Hail Mary so far. The English is very simple, easy to read, and captivating. The movie follows the book closely so far, but the book is still quite interesting to read. However, I generally find it hard to read fiction because I keep getting distracted and googling things. I reached the phrase “..I used the bathroom (or “head” I guess, because I was on the ship)…” and it got me thinking: it’s interesting to learn that Russian is not the only language with a special word for the toilet on a ship. And why “head”? Turns out that “galyun” (гальюн), the Russian word for it, means “head” (the bow of a ship) in Danish and German. Interestingly, the word is also used on airplanes, and historically these toilets were used only by sailors; officers did not use them.

The text is very childish, and understandably so: the main character is a school physics teacher, after all. It’s all motherfluffer, dang it, gosh darn it, fudge, holy moly, for cripes’ sake instead of for Christ’s sake, and there’s even bull-puckey instead of bullshit. “To go wee” is how they say “to pee” in the book. I recall that the day before yesterday we went into a mattress store, and the salesperson, explaining that “if one of you goes to the toilet, the other won’t even notice that the first one got up” (because the mattresses are so soft), freely used the verb “to pee”. So what? 🙂

Update: when the physics teacher encounters an alien ship on page 120, the chapter ends with holy fucking shit! That’s what all the rest was leading to;)

Occasionally, there are quite funny expressions that you can even use in real life 🙂 For instance, the main character asks, “Who pooped in your Rice Krispies?”, a variant of the idiom “to poop in someone’s cereal”: literally “who messed up your meal?”, meaning “who put you in such a bad mood?”.

In conclusion, if you’re choosing your first book to read in English – this one is at the top of my list. Even something seemingly simple like “Harry Potter” is more sophisticated, in my opinion. Here, there’s a lot of dialogue, school level but almost slang-free vocabulary, and a pretty interesting plot. Plus, it’s real science fiction, where the author educates the reader about the scientific method, how the world works, etc., all from the viewpoint of the hero, a physics teacher, who shares various facts and thoughts on how physics works, relating it to the plot in his interactions with other characters or thoughts to himself (rather than directly to the reader). It’s middle school level so far, but maybe it’ll get more complex later on.

Exploring Word Clusters in Religious Texts from Gutenberg’s Library | May 02 2026, 03:28

Here’s something interesting. Take 8000 books from the Gutenberg library and, for each one, build a graph of word connections to see how “friendly” its words are: if word A often appears with B, and B with C, then how often does A appear with C? There’s a metric for this, the average clustering coefficient. Now sort the books by this coefficient in descending order, and about 70 percent of the top of the list will be religious books: Bibles, the Book of Mormon, the Quran. Some of them are duplicates in a sense, because a Bible in a different format is still the Bible. But its different parts clearly end up grouped together, which means they really do have these word triangles in common.
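Here is a minimal sketch of how such a number could be computed for one text. The post does not say how the word graph was built, so the co-occurrence rule (two words are linked if they appear within `window` positions of each other) and the function names are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations
import re


def cooccurrence_graph(text, window=2):
    """Link two words if they occur within `window` positions of each other.

    The window rule is an assumption; the original graph may be built differently.
    """
    words = re.findall(r"[a-z']+", text.lower())
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def average_clustering(graph):
    """Mean over all words of: linked neighbour pairs / possible neighbour pairs."""
    if not graph:
        return 0.0
    total = 0.0
    for node, neighbours in graph.items():
        k = len(neighbours)
        if k < 2:
            continue  # fewer than two neighbours: no triangle possible, counts as 0
        linked = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += linked / (k * (k - 1) / 2)
    return total / len(graph)


# Three words that all co-occur with each other form one closed triangle.
triangle_score = average_clustering(cooccurrence_graph("a b c a", window=2))
# A plain chain a-b-c has no closed triangle.
chain_score = average_clustering(cooccurrence_graph("a b c", window=1))
```

Ranking the library would then be a matter of computing this score per book and sorting in descending order.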

But what unites all the books at the top is that they were written many years ago or, as in the case of The Night Land, written relatively recently in the style of many years ago.

By the way, one book that stands out among these is An Introductorie for to Lerne to Read, To Pronounce, and to Speke French Trewly. It is a French language textbook written in English in Tudor times (around the 1530s), the era of “Soverayn lorde kyng Henry the Eight”. It was written by Gilles Du Guez, a French teacher at the English court. This particular textbook was compiled for Princess Mary (the future Queen Mary I, known as “Bloody Mary”), the daughter of Henry VIII. Check out a page from the textbook. Very cool English 🙂 “…ye must pronounce it letyng your lippes jointe close, so that there be but a lyttell hole in the middes.”

So, I delved into this textbook. It mentions a fruit called “openarses.” As you can guess, that is “open arses” in modern English. In Tudor England, a medlar was called an openarse. If you Google what a medlar looks like, you’ll have no questions about why 😉

In the anatomical section (MEMBRES LONGYNG TO MANNES BODY), the author mentions next to the eyes and ears “the nether beerde” (literally— “the lower beard”).

Misadventures in Keyboard Layouts: Searching for Gremlin, Finding Surprises | April 28 2026, 20:33

This is me typing the word gremlin without switching the keyboard layout. I wanted to read about the query language for graph databases; I need it for work. Google surprises, it really does.

Navigating Nabokov: A Companion Glossary for “Lolita” | April 08 2026, 11:24

I have finally finished the book The Reader’s Glossary – essentially a 5200-word dictionary for “Lolita” by Nabokov, but organized not alphabetically, like regular dictionaries, but in order of the occurrence of complex words, divided by chapters and indicating the context of the word or phrase. The website – readersglossary dot com (see the first comment). It is expected to be used, among other things, as a companion book while reading the original. Yes, it’s twice as thick 🙂

The dictionary turned out quite thick – 600-700 pages. It is available in four languages – Russian, English, French, and German. Moreover, the translations (RU, FR, DE) or clarifications (in ENG) are not abstract but contextual, taking into account how Nabokov himself translated the fragment from English (“Lolita” was first written in English, then translated into Russian).

On my website, there are huge fragments of these dictionaries RU, FR, DE, EN available for review (each about 1/3 of the total volume).

There is also a full-fledged interactive dictionary on the site, where you can enter a word and see its translation or explanation. The dictionary mainly contains complex words, but we know that complexity has its own definition for everyone, so all words are divided into three categories and highlighted with different frames. Probably for a well-read Anglophone, the first category (dotted) is completely useless (about 50% of the dictionary), for the less-read, maybe 20% are useless. But I decided not to cut it further, because the book is not only for Anglophones but also for those for whom English is a second language, and there those dotted frames are very handy.

Overall, I did this “for myself and friends,” just for fun, not as a commercial project. So I’m realistic about it: the audience is super niche, and if someone finds it useful even once a week, that’s already nice.

Although it was something like a hobby, the book took a lot of time. To achieve what I did, I developed a dozen applications/scripts, a couple of which have their own interactive UI, in which I spent many hours over two months of work. And of course, I learned a lot in the process, which is actually the main fun of it.

So, come to the website – readersglossary dot com. Link in the comments

P.S. In Russian – only as a PDF for now. Amazon doesn’t allow selling books in Russian, only in a small number of European languages in addition to English. The French and German versions of the dictionary will be released on Amazon about a week from now.

Navigating the Lexical Complexity of Nabokov’s “Lolita” | April 02 2026, 15:56

I’ve finished the first version of a dictionary-style book on Nabokov’s “Lolita”. The chart shows how vocabulary complexity is distributed across the pages of the book. The lower chart averages over 25 sentences and shows the number of complex words on the vertical axis, with colors indicating their complexity/rarity (purple for the most complex, red for less complex, yellow for even less so). I have already removed two levels, but overall, for a foreigner, all five levels are challenging. In the book, level 3 is marked with a dashed line, level 4 with a simple frame, and level 5 with a double frame. Currently, there are 5794 words, of which 541 are fifth level, 1070 are fourth, 1883 are third, 1393 are second, and 54 are first (the simplest ones). Since the first version came out at 1148 pages, the dictionary will need to be trimmed significantly by removing whatever can be dispensed with. This mainly applies to the first and second levels, and to some of the third and fourth. Word rarity is calculated in three ways: with an LLM, and with two word-frequency lists for the English language corpus (300K words).
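The data behind a chart like this can be sketched roughly as follows. This is only an illustration: it assumes each sentence has already been reduced to the list of difficulty levels (1 to 5) of its complex words, and it uses non-overlapping blocks of 25 sentences, whereas the real chart may use a sliding average. The function name and input layout are hypothetical.

```python
def complex_words_per_window(sentence_levels, window=25):
    """Count complex words per difficulty level in consecutive blocks of sentences.

    sentence_levels[i] is the list of levels (1-5) of the complex words
    found in sentence i (hypothetical input format).
    """
    rows = []
    for start in range(0, len(sentence_levels), window):
        counts = {level: 0 for level in range(1, 6)}
        for levels in sentence_levels[start : start + window]:
            for level in levels:
                counts[level] += 1
        rows.append(counts)
    return rows


# Four toy sentences, grouped two at a time instead of 25.
sample = [[5, 3], [], [4], [2, 2, 5]]
rows = complex_words_per_window(sample, window=2)
```

Each row is one point on the horizontal axis, and the per-level counts are what the colors would stack.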

Not all words are complex. For instance, in the sentence “With the ebb of lust, an ashen sense of awfulness, abetted by the realistic drabness of a gray neuralgic day, crept over me and hummed within my temples.” someone well-acquainted with English might not know the words ebb, abet, and drabness, while everything else is familiar. But lower the bar for the reader, and the dictionary might not be very useful in such cases.

Or consider the sentence:

Homo pollex of science, with all its many sub-species and forms; the modest soldier, spic and span, quietly waiting, quietly conscious of khaki’s viatric appeal; the schoolboy wishing to go two blocks; the killer wishing to go two thousand miles; the mysterious, nervous, elderly gent, with brand-new suitcase and clipped mustache; a trio of optimistic Mexicans; the college student displaying the grime of vacational outdoor work as proudly as the name of the famous college arching across the front of his sweatshirt; the desperate lady whose battery has just died on her; the clean-cut, glossy-haired, shifty-eyed, white-faced young beasts in loud shirts and coats, vigorously, almost priapically thrusting out tense thumbs to tempt lone women or sadsack salesmen with fancy cravings.

My browser even highlights four words here.

I have definitions of the words in English, German, French, and Russian. I’ve run into a problem: different words from the text count as complex in different languages, yet I keep a single word list for all of them. So I’ll have to mark, for example, French words in the English text separately so that they are left out of the French version, since the reader there already knows what, say, quel mot means.

Overall, this weekend I’ll be manually removing about half, and then I can make the cover and list it on Amazon.

Exploring the Multifaceted Uses of “Oblong” in English and Russian | March 17 2026, 13:50

Sometimes English has very unusual words that are very difficult to translate into Russian. Take the word oblong. As an adjective, it translates as “elongated, oblong,” but in the book, both uses are nouns. The adjective often describes a face, that is, something close to an oval, but oblong is a broader concept that covers any figure with an elongated shape. For example: “My mom bought an oblong tablecloth for her new table.”

It is also used as a noun, and quite frequently (though less so than as an adjective). As a noun, oblong means “a rectangular object or flat figure with unequal adjacent sides.” Rulers count as oblongs. Laptops, tablets, and flat-screen TVs are oblongs of different sizes. A rectangle can be described as oblong; however, not all elongated figures are rectangles. That same face, for example. Additionally, in mathematics, an oblong number is what in Russian is called a rectangular number: the product of two consecutive integers, for example 12 = 3 × 4. In general, it’s utterly baffling.
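The mathematical sense, at least, is easy to pin down. A small sketch (the function name is just for illustration): a number is oblong, also known as pronic, if it equals k × (k + 1) for some whole number k.

```python
from math import isqrt


def is_oblong(n):
    """True if n = k * (k + 1) for some non-negative integer k."""
    if n < 0:
        return False
    k = isqrt(n)  # for n = k * (k + 1), the integer square root is exactly k
    return k * (k + 1) == n


# The first few oblong numbers: 1*2, 2*3, 3*4, ...
first_oblongs = [k * (k + 1) for k in range(1, 7)]
```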

The word has been alive since the 15th century, by the way. So, in my book, it appears twice, and both times as nouns. In the first case, Nabokov translated it as “corner,” and in the second – “a small oblong of smooth silver” as “a little piece.”

Exploring Multilingual Vocabulary in Nabokov’s Works with Apple Books | March 15 2026, 23:20

Man, it’s really convenient. Just sitting here reading.

The usage pattern is as follows. I hold the phone in my hands. There, in Apple Books, I have this and that book. When you see an unfamiliar word, it will most likely be in the chapter’s word list. The definition takes Nabokov’s own translation into account. Then you look a couple of words ahead, put the phone down, and continue reading. You run into those words while they are still in your short-term memory, and hooray, you understand them. During a break, you load the next couple of words into your brain. You do have to hold the phone and flip through it; each page contains 4-5 definitions.

Now, every word has definitions in English (interpretation), French, and German. Consequently, I can publish four books.

Overall, my level of English matches what my app predicts about which words will be challenging. But someday I’ll need the same for French, and it will require an assessment of the difficulty level for each word because even some basic words will be unclear to me. I’m not sure that a book with basic words will be handy. With rare ones – definitely handy.

Crafting Nabokov’s Dictionary: A Multilingual Lexical Journey | March 15 2026, 18:30

I’m reading Nabokov and decided to take a break to create a convenient app “Nabokov’s Dictionary” and am considering selling it on Amazon as a book. Essentially, it looks like this (see screenshot) – definitions of complex words in English, Russian, German, and French, in the same order they appear in the original book.

Would you buy such a book?

To make the definitions accurate, I also wrote an aligner: a program that matches sentences and paragraphs in English with their translations into Russian (Nabokov’s own). When a word’s definition is created, it draws not only on the LLM’s knowledge but also on the author’s Russian translation. How the algorithm works deserves a separate discussion (I invented it myself because nothing I found online worked the way I needed). It first finds the long sentences and matches the longest of them with their counterparts using the cosine similarity of embedding vectors produced by the multilingual e5 model. These sentences become anchors. Then, on the assumption that errors are almost impossible for long sentences, it finds the longest sentence between two anchors, and everything repeats recursively. There are many cases where a Russian sentence has no equivalent in English or vice versa, where a sentence is split in two, or where two are merged into one. The algorithm handles these as best it can. The resulting alignment is quite good, to the point that alignment errors are hard to find (though some are probably still there). Either way, the alignment is only needed as context for translating words, so the occasional error is not a big deal.
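The anchor idea can be sketched in a few lines. This is not the actual aligner, only a simplified illustration of the recursion described above. The real one uses multilingual e5 embeddings; to keep the sketch runnable without downloading a model, `toy_embed` (letter counts) stands in for the embedding function, and the names `align`, `toy_embed`, and the `min_sim` threshold are all assumptions. Splits and merges of sentences are not handled here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def toy_embed(sentence):
    """Stand-in for a real multilingual embedding model: plain letter counts."""
    return [sentence.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]


def align(src, tgt, embed, min_sim=0.5):
    """Return sorted (src_index, tgt_index) pairs found by recursive anchoring."""
    src_vecs = [embed(s) for s in src]
    tgt_vecs = [embed(t) for t in tgt]
    pairs = []

    def recurse(s_lo, s_hi, t_lo, t_hi):
        if s_lo >= s_hi or t_lo >= t_hi:
            return
        # Try the longest source sentence in this gap first, then shorter ones.
        for i in sorted(range(s_lo, s_hi), key=lambda k: -len(src[k])):
            j = max(range(t_lo, t_hi), key=lambda k: cosine(src_vecs[i], tgt_vecs[k]))
            if cosine(src_vecs[i], tgt_vecs[j]) >= min_sim:
                pairs.append((i, j))  # this pair becomes an anchor
                recurse(s_lo, i, t_lo, j)  # align everything before the anchor
                recurse(i + 1, s_hi, j + 1, t_hi)  # and everything after it
                return
        # No sentence in this gap has a good enough counterpart: leave it unaligned.

    recurse(0, len(src), 0, len(tgt))
    return sorted(pairs)


src = [
    "A long opening sentence about the river Danube.",
    "Short one.",
    "Another fairly long sentence about deserts and hills.",
]
tgt = [
    "A long opening sentence about the river Danube!",
    "An inserted line with no counterpart at all, zzz qqq xxx.",
    "Short one!",
    "Another fairly long sentence about deserts and hills!",
]
anchors = align(src, tgt, toy_embed)
```

Because each anchor splits the text into a "before" and an "after" region, matches can never cross each other, which is what keeps an extra or missing sentence from derailing the rest of the alignment.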

The Curious Etymology of the Turkey: Naming Perceptions Across Languages | March 09 2026, 21:36

I wondered why a turkey is called a turkey here and what it’s called in Turkey. In Turkey, it’s called hindi, a name that points to India! I decided to see what it’s called in India. Haha, in Hindi, it’s called Turkish (टर्की). Let’s look at other languages. In Portuguese it’s peru, so for them it’s Peruvian. In Spanish it’s pavo, which refers to a peacock 🙂 (“pavone” is Italian for peacock). In French it’s dinde, because the bird came from the West Indies (America); the word comes from poule d’Inde, “hen from India/West Indies”. In Greek it’s “Γαλοπούλα”, “French bird”.

Exploring Redundancy in Toponymy: From European Rivers to the Hill of Hills | March 08 2026, 02:54

I’m reading Nabokov, and there’s this: “…with the dash of the Danube in his veins…”. Turns out, Danube is Дунай. But that’s trivial; the interesting part is something else. Don, Danube, Dniester, Dnieper, Donets, Dvina, and Disna all mean more or less the same thing: river. Apparently, ancient people were not always rich in imagination when it came to toponymy. If you live by the water, you simply call it “River”. Over time, others came, heard this word, took it for a proper name, and altered it slightly to fit their accent. This is how “River” (Danu) turned into a dozen different names across the map of Europe.

The river Volga essentially is also just “river”. Okay, slightly different, “Volga” comes from the Proto-Slavic *Vòlga, which literally means “moisture” or “water”.

Also, it turned out that the Sahara desert is named so because Sahara (الصحراء) is desert. And the Gobi desert is called Gobi because Gobi in Mongolian is desert.

While googling, I stumbled upon another fun thing. There’s a place in England, Torpenhow Hill. The name is composed of four different linguistic layers: Tor — in Old English “hill”, Pen — in Cumbric “hill”, How — in Old Norse “hill”, Hill — in modern English “hill”. Result: “Hill-hill-hill-hill”. Likely, each new people arriving in this area didn’t understand that Tor, Pen, and How were already names for the hill, and added their variant of the word “hill”.