(TIL) In my spare time outside of work, I watch Lavrenko’s lectures. This morning I listened to the lecture “Laws of the Text”.
For example, did you know that there’s something called Zipf’s Law? It states that the frequency of the n-th most common word in any language is roughly inversely proportional to its rank n.
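A minimal sketch of how one might eyeball this on a real text (the file name “text.txt” is just a placeholder; any reasonably long plain-text file will do):

```python
from collections import Counter

# Placeholder input: any reasonably long plain-text file.
with open("text.txt", encoding="utf-8") as f:
    words = f.read().lower().split()

counts = Counter(words)

# Under Zipf's Law, freq * rank should stay roughly constant
# across the top of the frequency list.
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    print(f"{rank:>2}. {word:<15} freq={freq:<6} freq*rank={freq * rank}")
```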
Or here’s the empirical Benford’s Law: in tables of numbers drawn from real-life sources (anything from electricity bills to house numbers), the digit 1 appears as the leading digit far more often than any other (in roughly 30% of cases), the digit 2 more often than, say, 8, and so on. Put simply, Benford’s Law says: there are always more small things in the world than large ones. The explanation is that quantities in the real world tend to grow exponentially rather than linearly, which makes their leading digits logarithmically distributed. Very intriguing.
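Concretely, the expected share of numbers with leading digit d is log10(1 + 1/d), which works out to about 30.1% for d = 1. A quick sketch for checking a dataset against that distribution (the powers of two here are just stand-in data; they grow exponentially, so they follow Benford almost perfectly):

```python
import math
from collections import Counter

def leading_digit(x) -> int:
    """First significant digit of a positive number."""
    s = str(abs(x)).lstrip("0.")  # also handles values like 0.005
    return int(s[0])

# Stand-in data: swap in real-world figures (bills, populations, prices...).
numbers = [2 ** k for k in range(1, 500)]

observed = Counter(leading_digit(n) for n in numbers)
total = sum(observed.values())

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"digit {d}: observed {observed[d] / total:.1%}, expected {expected:.1%}")
```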
Or take Heaps’ Law: the number of unique words in a text of N words grows as f(N) = k*N^b, where b is typically around 1/2.
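Since log f(N) = log k + b·log N, the exponent b can be estimated with a straight-line fit in log-log space. A sketch under the same placeholder-file assumption as above (it assumes the file has at least a few thousand words):

```python
import math

# Placeholder input: any reasonably long plain-text file.
with open("text.txt", encoding="utf-8") as f:
    words = f.read().lower().split()

seen = set()
points = []  # (N, number of unique words after N tokens)
for n, w in enumerate(words, start=1):
    seen.add(w)
    if n % 1000 == 0:
        points.append((n, len(seen)))

# Heaps' Law: f(N) = k * N^b, so log f = log k + b * log N;
# estimate b as the least-squares slope in log-log coordinates.
xs = [math.log(n) for n, _ in points]
ys = [math.log(u) for _, u in points]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(f"fitted b ≈ {b:.2f}")  # typically lands somewhere around 0.5
```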
These laws make it possible, for example, to check data or a text for “naturalness”.
Or another example. For any very rare word, the probability that it will occur in a given text is very low, which makes sense. But if the word does appear in the text, the likelihood of it appearing again in the same text is surprisingly high.
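A rough way to see this effect (sometimes called word burstiness) is to split a corpus into documents and compare, for each word, the chance that it shows up at all with the chance that it shows up a second time given one occurrence. The tiny corpus below is just a stand-in for a real one:

```python
from collections import Counter

# Toy stand-in: a list of documents (replace with a real corpus).
docs = [
    "the cat sat on the mat",
    "the dog chased the cat and the cat ran",
    "quantum decoherence limits quantum computers",
    "the weather was fine",
]

appears = Counter()        # docs where the word occurs at least once
appears_again = Counter()  # docs where it occurs at least twice
for doc in docs:
    counts = Counter(doc.split())
    for word, c in counts.items():
        appears[word] += 1
        if c >= 2:
            appears_again[word] += 1

# A rare word ("quantum") rarely shows up at all, but once it does,
# it tends to repeat within the same document.
for word in ("quantum", "the"):
    p_first = appears[word] / len(docs)
    p_again = appears_again[word] / appears[word]
    print(f"{word!r}: P(appears) = {p_first:.2f}, "
          f"P(appears again | appears) = {p_again:.2f}")
```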

