A note on English vocabulary and language learning. Yesterday, for practice, I watched the 2000 film “The Miracle Worker” about Helen Keller's childhood. The English in it is fairly simple and easy to follow, and the biographical plot is interesting.
Today I decided to find out how large the film's vocabulary actually is: how many words do you need to know to at least understand the subtitles?
Using simple tools (Wget, Perl, Bash), I did the following:
* Downloaded the subtitle file,
* Extracted only the dialogue text; there were 1409 dialogue entries,
* Split them into words, removed special characters and punctuation, and removed repeated words, keeping only unique ones. This left 1095 words,
* Collapsed word forms as far as I could (plurals, past tense, adverbs, -ing forms), which brought the count down to 982 words,
* Fed the list to the Lingvo translator, which recognized and translated 898 words,
* Excluding single-letter words, which I had missed earlier, the total comes to about 875-880 words.
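The counting steps above can be sketched in Python. This is not the original Wget/Perl/Bash script: it assumes the subtitles come as a SubRip (.srt) file, it skips the download and the Lingvo translation, and the function names (`extract_dialogue`, `unique_words`, `crude_stem`, `vocabulary_size`) and suffix rules are hypothetical, meant only to show how the word count shrinks at each stage.

```python
import re

# Assumption: SubRip (.srt) layout, i.e. a numeric index line, a timing line
# such as "00:00:01,000 --> 00:00:03,500", one or more text lines, a blank line.
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")
TAG = re.compile(r"<[^>]+>")               # formatting tags such as <i>...</i>
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")   # letters, optionally one apostrophe part

# Hypothetical suffix rules, far cruder than a real lemmatizer: they collapse
# some plurals, past tenses, -ing forms and -ly adverbs.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def extract_dialogue(srt_text):
    """Return one string per subtitle entry, dropping index and timing lines."""
    entries = []
    for block in re.split(r"\n\s*\n", srt_text.lstrip("\ufeff").strip()):
        lines = [ln.strip() for ln in block.splitlines()]
        # Note: a text line made only of digits is dropped along with the index.
        text = [ln for ln in lines if ln and not ln.isdigit() and not TIMING.match(ln)]
        if text:
            entries.append(TAG.sub("", " ".join(text)))
    return entries


def unique_words(entries):
    """Lowercase, drop punctuation and special characters, keep each word once."""
    words = set()
    for entry in entries:
        words.update(WORD.findall(entry.lower().replace("\u2019", "'")))
    return words


def crude_stem(word, vocabulary):
    """Strip a suffix only if the shorter form (2+ letters) is also in the vocabulary."""
    for suffix in SUFFIXES:
        base = word[: -len(suffix)]
        if word.endswith(suffix) and len(base) > 1 and base in vocabulary:
            return base
    return word


def vocabulary_size(srt_text):
    words = unique_words(extract_dialogue(srt_text))
    words = {w for w in words if len(w) > 1}   # drop single-letter words ("a", "i")
    return len({crude_stem(w, words) for w in words})


sample = """1
00:00:01,000 --> 00:00:03,500
<i>Helen!</i> Where is the doll?

2
00:00:04,000 --> 00:00:06,000
She walked away, walking quickly.
Dolls and a walk.
"""

print(vocabulary_size(sample))  # 10
```

Because `crude_stem` only strips a suffix when the shorter form also appears in the list, "walked" and "walking" collapse into "walk" and "dolls" into "doll", while "quickly" stays as is. Irregular forms such as "went" are not handled, so a proper lemmatizer would give a somewhat different count.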
The complete word list with translations, along with a link to watch the film online, is on my blog: http://beinginamerica.com/2016/02/26/how-many-words-you-need-to-know/.
So, to understand every word of the film, you need a vocabulary of only about 880 words, right?