Challenges of Training a Shiba Inu with Unpredictable Eating Habits | November 11 2024, 16:22

This explains why training our Shiba Inu is such a challenge: food generally doesn’t motivate him. Say it has been at least 12 hours since he last ate; we had breakfast long ago, and lunchtime is approaching. You bring him warm boiled meat, which he generally likes, but if it isn’t his usual mealtime, he doesn’t understand why he is being given meat he didn’t ask for. His response is something like: what’s this for, just put it in the bowl, I’ll eat it eventually. It has been this way all 3.5 years. Moreover, he almost never eats unless someone is at home. If nobody’s around, he’d rather sleep. So leaving food for him and going away almost guarantees you’ll come back to find it untouched. Overall, he enjoys tasty food, and when it really is time for lunch or dinner, he eats whatever you give him with great pleasure.

In general, when he hesitates over whether to eat the meat in the bowl and, after thinking it over, decides to walk away, the trick is to pull out a piece of meat and offer it from your hand. If he eats it (and if he’s already by the bowl, he’s more likely to eat from your hand), he will probably change his mind, and within a minute the bowl will be empty.

Or take cheese, for instance. When we pour some wine and put together a cheese platter to make watching a series or a movie more fun, Yuka comes over to stare at the cheese, drooling copiously, ready to eat a kilogram of it on the spot. But that requires the wine to be poured and the projector to be on. If you bring out cheese at some random time, or anywhere outdoors, he reacts to it the same way he would to a stone.

Monkeys Released from Research Center During New Presidency | November 07 2024, 13:59

During Trump’s presidency, it was COVID. This time – we’re releasing 40 monkeys from a research center. 🙈🙉🙉🙊🙉🙈🙉🙊🙊🙉🙈🙉🙊🙊🙈🙉🙊🙉🙈🙉🙊🙉🙉🙊🙈🙈🙊🙉🙈🙉🙊🙊🙉🙈🙉🙊🙉🙈

https://www.nbcnews.com/news/us-news/monkeys-escape-alpha-genesis-research-facility-south-carolina-rcna179077


Enhancing an EPUB Converter for Complex Texts | October 30 2024, 22:46

I have enhanced my EPUB converter for reading complex English literary texts. In the previous version, I sent chapters to ChatGPT and asked it to translate the difficult words (in brackets). I was asked in the comments how the difficult words are determined. Having read the first quarter of the book this way, I realized that ChatGPT does not treat every difficult word as difficult: it leaves some obviously complex ones untranslated.

Ultimately, I made a new version. Visually, it differs in that translations now appear above words. This arrangement does not break the sentences into pieces like when the translation was in brackets. But that’s not all.

I have changed the method for identifying “difficult words requiring translation.” It now operates with a list of 300,000 words based on their frequency of use in the English language. The first 3.5% of this frequency-sorted list (determined empirically) are now considered simple and do not require translation. The rest do. Technically, I also have a difficulty group for each word rated 1-30, but unfortunately, I cannot highlight them in colors in Books.
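The frequency cutoff described above can be sketched roughly like this. It is only a minimal illustration, not the converter’s actual code: the function names, the treatment of unknown words as difficult, and the equal-width split into 30 groups are all assumptions of mine.

```python
def build_rank_index(words_by_frequency):
    """Map each word to its 0-based rank; the list is assumed sorted most frequent first."""
    return {word: rank for rank, word in enumerate(words_by_frequency)}


def needs_translation(word, rank_index, total_words, simple_fraction=0.035):
    """Return True when the word lies outside the 'simple' head of the frequency list."""
    cutoff = round(total_words * simple_fraction)  # 3.5% of 300,000 words -> 10,500
    rank = rank_index.get(word.lower())
    # Assumption: a word missing from the list is rare, so it counts as difficult.
    return rank is None or rank >= cutoff


def difficulty_group(word, rank_index, total_words, groups=30):
    """Bucket a word into 1..groups by rank (equal-width buckets are an assumption)."""
    rank = rank_index.get(word.lower())
    if rank is None:
        return groups
    return min(groups, rank * groups // total_words + 1)


# Toy frequency list standing in for the real 300,000-word one.
toy_words = [f"w{i}" for i in range(200)]
index = build_rank_index(toy_words)
print(needs_translation("w6", index, len(toy_words)))   # inside the top 3.5%
print(needs_translation("w7", index, len(toy_words)))   # just outside it
```

Raising `simple_fraction` makes the “simple” group larger, so fewer words get tagged for translation.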

Then, the word needs to be translated into Russian somehow. To avoid using an LLM for this, I found Müller’s dictionary with 55,954 words. The word that needs translation is reduced to its base form and looked up in the dictionary. If it is found, the first definition is taken. Unfortunately, the first one is not always the right one, but it works most of the time. If Müller’s dictionary does not have the word, the system falls back to an LLM. Here, I have two implementations: one using a local LLAMA3 and one using OpenAI. The local one is obviously slower and translates worse, but it is free. There is a separate check on what LLAMA3 has returned, which makes it redo the translation if the result is inappropriate (e.g., too long or containing special characters).

In addition, for LLM-based translations, the system is provided with more context — the sentence that contains the word to be translated. This makes the translation closer to the text. There are still minor flaws, but they are generally livable.
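Here is a rough sketch of that lookup order: dictionary first, then an LLM with the sentence as context, plus a check that rejects bad answers and asks again. Everything named here is hypothetical: `lemmatize`, `llm_translate`, the length limit and the allowed-character rule are stand-ins of mine, not the real script.

```python
import re

MAX_LEN = 40  # hypothetical limit for an acceptable translation
ALLOWED = re.compile(r"^[\w\s,;\-]+$")  # letters, digits, spaces and a little punctuation


def is_acceptable(translation):
    """Reject empty, overly long, or special-character-laden LLM output."""
    text = translation.strip()
    return 0 < len(text) <= MAX_LEN and bool(ALLOWED.match(text))


def translate_word(word, sentence, lemmatize, dictionary, llm_translate, max_attempts=3):
    """Try the dictionary first; fall back to the LLM and retry on bad output."""
    lemma = lemmatize(word)
    definitions = dictionary.get(lemma)
    if definitions:
        # The first definition is taken, even though it is not always the right one.
        return definitions[0], "dictionary"
    for _ in range(max_attempts):
        candidate = llm_translate(lemma, sentence)
        if is_acceptable(candidate):
            return candidate.strip(), "llm"
    return None, "failed"


# Stubs standing in for a real lemmatizer, dictionary and model call.
dictionary = {"cat": ["кошка", "кот"]}
lemmatize = lambda w: w.lower().rstrip("s")
replies = iter(["<b>очень длинный ответ</b>", "мерцание"])
llm_translate = lambda lemma, sentence: next(replies)

print(translate_word("Cats", "Cats sleep.", lemmatize, dictionary, llm_translate))
print(translate_word("glimmer", "A glimmer of light.", lemmatize, dictionary, llm_translate))
```

Passing the model call in as a function means the same code works with either a local model or a hosted one.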

However, even with all this, translation via an LLM is of low quality. Ideally, additional dictionaries should be connected, so that if a word is not found in Müller’s, other dictionaries are tried, and only if it is still not found do we use the LLM. I’ve already acquired one more dictionary and will be experimenting.

If the system tags too many obvious words, I can adjust a coefficient: the frequency group whose words are not translated becomes larger, and those obvious words stop being translated. Of course, there are always “rare” words that do not need translation because their meaning is obvious. But it’s not easy to teach the script to recognize such cases; since they are rare, it’s easier to leave them as they are.

Next, the translation is displayed above the word. For Books, this also involved some complex maneuvers, but it eventually worked on both iPad and laptop. Unfortunately, for the phone it needs to be done slightly differently, so the phone version of the book and the iPad/computer version will be different. That doesn’t really bother me, though; it makes little difference.
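One common way to put a gloss above a word in EPUB content is HTML ruby markup. Whether my converter’s output for Books looks like this is not spelled out above, so treat this as an assumption and a minimal sketch; the function names are made up for the illustration.

```python
from html import escape


def annotate(word, translation):
    """Wrap a word in ruby markup so the translation renders above it."""
    return f"<ruby>{escape(word)}<rt>{escape(translation)}</rt></ruby>"


def annotate_sentence(tokens, glosses):
    """Annotate only the tokens that have a gloss; leave the rest untouched."""
    parts = []
    for token in tokens:
        gloss = glosses.get(token.lower())
        parts.append(annotate(token, gloss) if gloss else escape(token))
    return " ".join(parts)


print(annotate_sentence(["a", "salad", "of", "genes"], {"salad": "салат"}))
```

Because the gloss sits in its own `<rt>` element, the sentence itself stays in one piece, unlike with bracketed translations.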

Navigating the Complexity of Global Numerical Nomenclatures | October 29 2024, 22:55

Russian TV channels have demanded two undecillion (2*10^36) rubles from Google. But what amused me was something else — technically, Google, or rather Googol, stands for 10^100. So, they’ve got plenty left in reserve.

But it was also interesting to learn that there are two different naming systems for large numbers. They diverge starting from the billion: in one system it is 10^9 (which the other calls a milliard), while in the other system a billion is 10^12, 1000 times more, which the first system calls a trillion. A trillion in the second system is a quintillion in the first, and so on, so that the undecillion of the first system is a sextillion in the second. It’s quite a mess, really.
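The mapping between the two systems follows a simple rule: in the first (short) system the n-th “-illion” is 10^(3n+3), and in the second (long) one it is 10^(6n), with the matching “-illiard” 1000 times larger. A small sketch to check the claims above; the function names and the limited prefix table are mine, purely for illustration.

```python
PREFIXES = ["m", "b", "tr", "quadr", "quint", "sext",
            "sept", "oct", "non", "dec", "undec", "duodec"]


def short_scale_name(exponent):
    """Short scale: the n-th '-illion' equals 10**(3*n + 3)."""
    n = exponent // 3 - 1
    if exponent % 3 or not 1 <= n <= len(PREFIXES):
        raise ValueError(f"no short-scale name in this table for 10^{exponent}")
    return PREFIXES[n - 1] + "illion"


def long_scale_name(exponent):
    """Long scale: the n-th '-illion' equals 10**(6*n); '-illiard' is 10**(6*n + 3)."""
    n, rest = divmod(exponent, 6)
    if rest not in (0, 3) or not 1 <= n <= len(PREFIXES):
        raise ValueError(f"no long-scale name in this table for 10^{exponent}")
    return PREFIXES[n - 1] + ("illiard" if rest == 3 else "illion")


for exp in (9, 12, 18, 36):
    print(f"10^{exp}: short = {short_scale_name(exp)}, long = {long_scale_name(exp)}")
```

For 10^36 this gives undecillion in the short system and sextillion in the long one, matching the two names floating around for the Google claim.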

The complexity is further increased by a third variant, called “the first, but not quite” — with the amendment that 10^9 is still considered a milliard, not a billion.

Different countries historically use different scales. The first, which is called the short scale, has primarily been adopted in English-speaking countries. In their scale (and thus ours), 10^9 is a billion. In the Arab world, it’s generally a milliard (مليار), though in some countries, such as Saudi Arabia, it’s a billion (بليون).

Russia is also among those using the short scale. Hence, they demand an undecillion from Google, not a sextillion.

The second scale, which is long, is used by the Danes, French, Germans, Portuguese, and Spanish. For them, 10^9 is called a milliard with adjustments for pronunciation and grammatical representation in the language.

And then there’s a slew of exceptions, including countries that don’t fit into either of these two “camps.”

But what’s even more interesting is that until 1974, Britain called a billion a milliard, a trillion was known as a billion, and a quadrillion as a billiard. In 1974, they officially switched to the short system.

Canada faces the toughest situation. There’s already confusion with units, and the big number systems add to the mix. Officially, it adopts the short system, like the US, but due to bilingualism (English and French) and significant cultural influence from France, you might occasionally encounter the long system. South Africa is in a similar situation.

Curiously, the only article about this in French (and it says sextillion!) is from RT. No one else in the world seems to care about this stuff. 🙂

Exploring the Evolution of Typewriters and Their Impact | October 29 2024, 01:17

I just found out that IBM used to manufacture mechanical typewriters, which a) had a Backspace key b) featured a moving print head.

The 1984 model is called IBM Correcting Selectric III. It has an intriguing way of deleting a letter – it strikes the paper with a special adhesive tape that removes the ink without a trace.

Interestingly, in 1976, the USSR developed a keylogger for American typewriters and somehow installed them in the typewriters at the US Embassy. It is reported that many secrets were uncovered this way.

I was also curious about how they managed this in Japan and China. Their typewriters don’t have a thousand keys. Believe it or not, some have a single key, plus a thousand squares over which you aim a “sight” to pick the character. There are different kinds, including ones that look like ordinary typewriters, but some models work exactly like this (I’m attaching a few photos). There is even a model with a cylinder that holds 2400 Japanese characters, and you need to rotate and shift the cylinder for each character. I’ll leave a video in the comments. A very elegant engineering solution.

Moreover, in 1947 in China, the Mingkwai typewriter was invented and released, which theoretically allowed typing up to 90,000 characters at a speed of 50 characters per minute. Imagine what an engineering feat that was for the time. You press a key – nothing happens, something clicks inside the typewriter. You press a second time – something else clicks, but this time options that meet the criteria set by those two presses appear on the screen. And the third press essentially selects one of these characters. Meanwhile, the screen… what screen in 1947… It was a window through which characters from a large set were displayed. One character – three presses.

Only today did I realize that the Shift key is called Shift because it physically shifted the basket on typewriters. And while I’m at it, I’ll write about the Return or CR key – carriage return (known as Enter), which is so named because it physically returned the carriage to the beginning of the line. And the underscore (_) was invented to underline previously typed words.

It’s also interesting that the QWERTY layout was dictated by the need to place letters that often follow one another further apart, to prevent the levers from crashing into each other during fast typing.

My introduction to typewriters in childhood, it seems, began with electric ones, although, of course, I also typed on mechanical ones. Interestingly, Friedrich Nietzsche’s encounter with the typing machine also started with electric ones. I read that he had the first skrivekugle (the Malling-Hansen Writing Ball).

In New York, I once saw a store (the only one I know of) that still trades typewriters.

Another interesting fact: when Edwin Hunter McFarland was developing a typewriter for Thailand, he ran out of keys for two consonants (“ฎ” and “ฅ”), and ultimately they disappeared from the language.

Also interesting: the typing speed record of 216 words per minute was set 78 years ago by Stella Pajunas-Garnand on a typewriter. In 2005 Barbara Blackburn came close (212 wpm), and in 2019 Anthony “Chark” Ermolin broke the record (233 wpm). Such championships are organized by the company Das Keyboard. I have two of their keyboards at home and am thinking of buying a third (by the way, has anyone bought one recently?).

Links to the various things mentioned above are in the comments.

Exploring Brilliant Mechanisms with Alec Watson: A Must-Watch Video | October 27 2024, 20:49

If you enjoy brilliant mechanisms, then this video is for you. I’m utterly fascinated by such things, so Alec Watson is a must-see for me. I’m also subscribed to the Russian translations of his channel, and today this one popped up again and reminded me. The original video is about five years old. Here I’m posting the Russian translation, but really you should watch it on @technologyconnections.

https://www.youtube.com/watch?v=zeWGsZGABDE

The Bitter Lesson: ABBYY’s Decline and the Shift in Computational Linguistics | October 27 2024, 12:43

A very interesting piece about the decline of ABBYY and the crisis in computational linguistics: how AI is taking over ABBYY’s business, what Compreno is, and why it didn’t take off as expected.

https://sysblok.ru/blog/gorkij-urok-abbyy-kak-lingvisty-proigrali-poslednjuju-bitvu-za-nlp/

Enhancing “Lolita”: Automated Annotations for Easier Reading | October 27 2024, 03:40

After reading the first few dozen pages, I almost gave up on “Lolita” because I had to consult the dictionary far too often. On top of that, I kept stopping to study various sentence structures and references, but that part is actually interesting, even though it slows down the reading.

Then I thought, well, am I not a programmer or what. So ChatGPT and I created automated annotations. First off, it’s worth mentioning that “Lolita” has an annotated edition with 200 pages of notes and an extensive 100-page introduction. These annotations cover many topics, but they rarely clarify obscure words, assuming the reader is educated enough to understand that conspicuousness (/kənˈspɪkjuːəsnɪs/) means noticeability, that a thingamabob is a thingamajig, and that callipygian means the same as callipygous, that is, “having shapely buttocks”. For instance, at the very start of the book: “My father was a gentle, easy-going person, a salad of racial genes: a Swiss citizen, of mixed French and Austrian descent with a dash of the Danube”. I wondered what this Danube was, and it turns out to be the river, Dunai in Russian, which in my version now appears in grey brackets after Danube.

Ultimately, in addition to the existing annotations, my script also adds translations into Russian in italic brackets, and it also includes some opinions on individual phrases and references — for this, after a sentence, something is added in brackets, which you need to click on.

With such enhancements, reading becomes much easier. And more interesting, too.