To complete the estimate of your vocabulary you’ll need to know the total number of words in the dictionary – preferably without having to count them. This is quite easy: look up the number of the last page in the dictionary, and take that as the number of pages. Next, open the dictionary at random and count the number of different words listed on that page. Multiply the number of pages by the number of words per page, and you have an estimate of the number of words in the dictionary.
I thought I’d better test myself using this statistical sampling technique. The dictionary I used has about 60 entries on each page, and over 800 pages. That’s around 48,000 words altogether.
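As a rough illustration, the page-count arithmetic above can be written as a few lines of Python. This is only a sketch: the function name `estimate_dictionary_size` and the option of averaging several sampled pages are assumptions added here, not part of the method as described.

```python
from statistics import mean

def estimate_dictionary_size(last_page_number, entries_per_sampled_page):
    """Estimate the total number of entries as pages times entries per page.

    entries_per_sampled_page is a list of counts, one per page opened at
    random. The text samples a single page; averaging several pages is an
    optional refinement assumed here.
    """
    return round(last_page_number * mean(entries_per_sampled_page))

# Figures from the text: about 60 entries on the sampled page, about 800 pages.
dictionary_size = estimate_dictionary_size(800, [60])
print(dictionary_size)
```

Counting entries on more than one page should make the estimate less sensitive to an unusually full or sparse page.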
I opened the dictionary 125 times, and made a tick on a piece of paper if I knew the meaning of the word at the top of the page, and a cross if I didn’t. Like me, you’ll probably find it hard to stop yourself jumping ahead to other entries if the first is unfamiliar. Don’t – that’s cheating, and invalidates the statistical sampling!
The result: there were 25 words whose meaning I didn’t know. On that basis, my passive vocabulary is 48,000 multiplied by 100/125, the fraction of sampled words I did know. That’s 38,400, or around 40,000 words. It sounds high, but it includes all the possible extensions of the stem of each word. For example, take the word ‘abstract’. The dictionary will include ‘abstractedly’, ‘abstractedness’, and so on. The number of stem words I know is a lot less than 40,000.
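The same calculation can be sketched in Python, together with a rough idea of how far the estimate might be off. The standard-error formula for a sample proportion is an addition made here, not something the text uses, and it ignores any error in the 48,000 figure itself. The function name `estimate_vocabulary` is hypothetical.

```python
import math

def estimate_vocabulary(dictionary_size, words_sampled, words_unknown):
    """Scale the dictionary size by the fraction of sampled words known.

    Returns the estimate and one standard error of that estimate, using
    the usual formula sqrt(p * (1 - p) / n) for a sample proportion.
    """
    words_known = words_sampled - words_unknown
    estimate = dictionary_size * words_known / words_sampled
    p = words_known / words_sampled
    standard_error = math.sqrt(p * (1 - p) / words_sampled)
    return estimate, dictionary_size * standard_error

# Figures from the text: 48,000 entries, 125 words sampled, 25 unknown.
estimate, margin = estimate_vocabulary(48_000, 125, 25)
print(f"about {estimate:,.0f} words, give or take {margin:,.0f}")
```

With these figures the estimate is 38,400 words, and one standard error comes to roughly 1,700 words, so rounding to ‘around 40,000’ is within what a sample of 125 can support.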
Still, I’m feeling pretty good about myself, so I’m going to exercise my gigantic male vocabulary by introducing the next chapter:
‘The, er, next chapter is, er, fucking interesting…’
‘IT IS A TRUTH UNIVERSALLY ACKNOWLEDGED THAT A SINGLE MAN IN POSSESSION OF A GOOD FORTUNE MUST BE IN WANT OF A WIFE.’
Some authors are instantly recognisable from their vocabulary. For example, everyone recognises the style of Jane Austen, and many would say that her writing’s distinguishing feature is its abundance of long words. But is this true? A bit of statistical analysis can reveal the answer.
The four longest words used by Jane Austen in Pride and Prejudice have 16 or 17 characters. They are ‘superciliousness’, ‘communicativeness’, ‘disinterestedness’ and ‘misrepresentation’. But just looking at the longest words is not enough: we need to examine the distribution of word lengths over her entire vocabulary, as shown in the graph below:
For comparison, here is the ‘fingerprint’ of the writer Ian McEwan, showing that his vocabulary includes many shorter words:
And what about this book? In this work I intend to speak with candour, and without misrepresentation or superciliousness, of the accomplishments of the irreproachable retrospections…
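For readers who want to try the word-length ‘fingerprint’ idea on a text of their own, here is a minimal Python sketch. It assumes that a word is any unbroken run of the letters a to z, so hyphenated words and words with apostrophes are split into pieces. The function name `word_length_fingerprint` is hypothetical, and this is not necessarily how the graphs in this chapter were produced.

```python
import re
from collections import Counter

def word_length_fingerprint(text):
    """Return the share of distinct words at each word length.

    Each word in the vocabulary is counted once, however often it is
    used, because the comparison is between vocabularies.
    """
    vocabulary = set(re.findall(r"[a-z]+", text.lower()))
    counts = Counter(len(word) for word in vocabulary)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}

sample = ("It is a truth universally acknowledged that a single man in "
          "possession of a good fortune must be in want of a wife.")
fingerprint = word_length_fingerprint(sample)
for length, share in fingerprint.items():
    print(f"{length:2d} letters: {share:.0%}")
```

One sentence is far too small a sample to characterise an author; the full text of a novel would be needed before comparing one writer’s fingerprint with another’s.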