
How many words do you actually use? March 11, 2007

Posted by caveblogem in Blogs and Blogging, linguistics, Other, statistical analysis, vocabulary, writing.

A while back I was reading Litlove’s blog, Tales from the Reading Room, and came across the following snippet:

I was idly looking around the internet to try to find out the average size of an English speaker’s vocabulary, but it turns out the very complexity of English makes it difficult to gauge. Estimates of a college graduate’s vocabulary range from 20-25,000 to the supposedly more accurate 60,000 active words and 75,000 passive words.

This interested me, because as a part-time researcher I immediately saw how incredibly difficult such an experiment would be.  Every word that came out of somebody’s mouth would have to be checked against a list of words already uttered.  So I suggested that somebody undertake an experiment whereby blog posts could be examined to unearth the number of different words one actually uses while blogging.  I had suggested that a macro could be written in MS Word that would yield the necessary data, but have found that, as in most areas of life, Word is needlessly complex, and not up to the task anyway. 

So with some additional work I managed to figure out how to count the number of different words I use on my blog. I took a sample of about 15,000 words and stripped out all of the numbers, emoticons, and other junk that aren’t words (although “aren’t” shows up as one word using my method, as does any hyphenated compound, like Anti-Christ). I also stripped out proper nouns and misspellings, which was the time-consuming part of this little experiment. I’m surprised how many words I misspell.
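For anyone who wants to try the counting step themselves, here is a minimal sketch in Python. It is not the procedure actually used for this post, just one way to get the same kind of count. The names `tokenize` and `count_unique`, the lowercasing of every word, and the `excluded` set (standing in for the hand-made list of names and misspellings) are all assumptions for illustration.

```python
import re

# Letters, optionally joined by internal apostrophes or hyphens, so that
# "aren't" and "Anti-Christ" each count as one word, as described above.
WORD_RE = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")

def tokenize(text):
    """Return the lowercase words in text; numbers and emoticons are skipped."""
    return [w.lower() for w in WORD_RE.findall(text)]

def count_unique(words, excluded=frozenset()):
    """Return (total, distinct) word counts, ignoring anything in `excluded`."""
    kept = [w for w in words if w not in excluded]
    return len(kept), len(set(kept))

# Hypothetical sample text; `excluded` stands in for the manual cleanup
# of proper nouns and misspellings.
sample = "The cats aren't sure :) but the 3 Anti-Christ cats aren't worried."
total, unique = count_unique(tokenize(sample), excluded={"anti-christ"})
print(total, unique)
```

The pattern is deliberately crude: a mixed token like “3rd” would still leave “rd” behind, so some manual checking remains.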

But here’s what I found: Out of 13,000 words (what was left from the sample after getting rid of names, misspelled words, etc., and then rounding it off) I used 2,734 unique words. That sounds pretty pathetic when you think about the numbers bandied about in the snippet quoted above, doesn’t it? But 13,000 is a pretty small sample. On the other hand, it took a bit of time, even with 13,000 words. So I decided to split it into a series of progressively smaller samples and use the data to extrapolate a bit.
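The idea of measuring progressively smaller samples can be sketched roughly as follows. This assumes the smaller samples are simply the first N words of the cleaned list, which may not be how the samples here were cut; `vocabulary_growth` and `step` are hypothetical names.

```python
def vocabulary_growth(words, step):
    """Return (sample_size, unique_count) pairs for growing prefixes of words."""
    seen = set()
    points = []
    for i, word in enumerate(words, start=1):
        seen.add(word)
        # Record a data point every `step` words, and once more at the end.
        if i % step == 0 or i == len(words):
            points.append((i, len(seen)))
    return points

# Tiny made-up word list, only to show the shape of the output.
words = "the cat saw the dog and the dog saw the cat again".split()
for size, unique in vocabulary_growth(words, step=4):
    print(size, unique)
```

Each pair is one point on a curve of unique words against sample size, which is the kind of data an extrapolation needs.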

The data formed what looked to be a quadratic curve:

[Image: vocabeq.jpg, the fitted quadratic equation]

Here’s the actual data plotted, with an extrapolation that tops out at the humbling figure of 3,127. (The red marks are experimental data; the blue ones plot the quadratic above.)

[Image: pgop-curve.jpg, unique words plotted against sample size, with the quadratic extrapolation]
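A quadratic fit of this kind, and the point where it tops out, can be reproduced with a short least-squares sketch like the one below. It is an illustration under assumptions, not the tool that produced the equation above: `fit_quadratic` and `peak` are hypothetical names, and the demo points are invented to lie on an exact parabola. They are not the measurements from this post.

```python
def fit_quadratic(points):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    # Build the 3x3 normal equations for the basis (x^2, x, 1),
    # stored as an augmented matrix with the right-hand side in column 3.
    rows = [[x * x, x, 1.0] for x, _ in points]
    ys = [y for _, y in points]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def peak(a, b, c):
    """Vertex of the parabola: where the fitted curve tops out (needs a < 0)."""
    x = -b / (2 * a)
    return x, a * x * x + b * x + c

# Invented points on y = -x^2 + 6x + 1, with x in arbitrary units.
demo = [(0, 1), (1, 6), (2, 9), (3, 10), (4, 9)]
a, b, c = fit_quadratic(demo)
print(peak(a, b, c))
```

With real data it helps to express the sample size in thousands of words, so the x⁴ sums in the normal equations stay at a manageable magnitude.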

Again, not very impressive. And it is just a statistical approximation, of course. The real number wouldn’t actually have such a maximum; it would slowly increase toward an asymptote, and then jump right over it while you are arguing over a Scrabble game or a crossword puzzle, or playing Trivial Pursuit.

There are all sorts of words that I know, but that I have only used once, when taking the Graduate Record Exams. And there are words that I find myself using very rarely, of course. I was listening to “Car Talk” yesterday coming back from the tennis courts and one of the two, Click or Clack, I forget which, used the word “kine,” an archaic plural of the word “cow.” The previous sentence, I think, is the only time I’ve ever used that word. But if my little experiment shows anything, it shows that it would take me a lot of blogging to reach the 60,000 active words that Litlove found as an estimate for a college graduate’s vocabulary.

Over the next few days I will be checking to see how this figure, this humbling 3,127, compares to other blogs out there, if only to make myself feel better.  I’ll entertain suggestions as to which blogs, of course.

Comments»

1. Moon Topples - March 11, 2007

Weirdly compelling post, sir. I’d be curious as to how many I use, but am fully aware that I use only a small fraction of words I know when writing on my blog. Many topics, and the words I associate with those topics, never come up. Also, I try not to use words that would require someone to look it up unless it is the ideal word for the sentence.

And I’ve never used assypmtote, either.

2. caveblogem - March 11, 2007

Moon Topples,

Weirdly compelling is always my goal, of course. Like most research it probably raises more questions than it answers. People who intentionally write for clarity use fewer words, I think. And I think that specialized, one-topic blogs probably use more than those that are more general, like this one, and yours, I think. And there are a lot of things I’m interested in that would never make it to this space, too. But this is subverted, in my case, by the fact that I idolize David Foster Wallace, so sometimes I use words that I really shouldn’t use, in pathetic, worshipful mimicry.

3. goodthomas - March 11, 2007

For the love of God, please don’t count the words on my site, unless, of course, you want to look really, really superior. Unlike Mr. Moon, I use every word I know. And then some. I know about 187 words and I like to repeat them. A lot.

I admire your quest, your research, a great deal.

4. caveblogem - March 11, 2007

goodthomas, Thanks for the kind compliment. But do not worry, gentle readers. I intend to examine “A-list” blogs, nice fat targets, not my friends, unless somebody requests such a thing.

5. Cyndi - March 12, 2007

Do made up words count? If not, I’m out of the game my friend. ;)

6. litlove - March 12, 2007

I am so impressed that you can even think your way around doing this. That has to be the equivalent of another 10,000 words or so, as far as I’m concerned. I think that the desire for clarity and accessibility must influence the figures quite heavily. I try to keep my language relatively simple on the blog because I want people to understand me (and I’d prefer to understand myself as well). But the results you come up with are really intriguing. The things I was reading all suggested that calculating the figure was incredibly hard.

7. How many words do you actually use?–Three Quarks Daily « Pretty Good on Paper - March 12, 2007

[…] 3QD, vocabulary, Other, writing, Blogs and Blogging, linguistics, web 2.0. trackback I said in my last post that I was going to look at the vocabularies of some other blogs to make me feel better.  And I […]

8. How many words do you actually use?–DAILY KOS « Pretty Good on Paper - March 13, 2007

[…] web 2.0. trackback I may turn this examination of blogging vocabularies, which I began in this post, into a weekly series.  Although I am automating more steps as I perfect the technique, it is […]

9. caveblogem - March 13, 2007

Cyndi, made-up words only count if properly hyphenated. So made-up counts, but madeup does not. MS Word seems to ignore hyphenated words. It seems to assume that if you know enough to use a hyphen, you can spell what it connects. Not sure if that’s true. . . .

10. caveblogem - March 13, 2007

litlove,

Thanks for being impressed. Kind of a quirky, nerdy thing for me to be doing. It makes me a little self-conscious when I think about it. Clarity does heavily influence the totals. The number of people contributing to a blog is also a huge factor, as are abbreviations, which cut down considerably on the total number of unique words. Daily Kos had a ton of these, as well as names, that brought down their total.

As for it being hard, I imagine it would be a pain in the neck to get a really accurate count of verbal vocabulary. Some sort of speech-recognition software, plus following somebody around for days, trying to look natural, plus the huge pile of data to sift. But even that would be more time-consuming than actually difficult, I think. The first two were fun, but there is an awful lot of pointing and clicking to get rid of the names and misspelled words.

11. A Vocabulary Cloud « Pretty Good on Paper - March 14, 2007

[…] Instead, the sample is from posts January 4 – March 9, 2007, totalling 20,000 words.  See the first post in this series if you don’t know what I am talking […]

12. Which words do you own?--Neil Gaiman « Pretty Good on Paper - March 16, 2007

[…] Which words do you own?–Neil Gaiman March 16, 2007 Posted by caveblogem in vocabulary, Cartooning, Neil Gaiman, bookmooch, fiction, Other, Books, writing, Blogs and Blogging, literature, web 2.0. trackback Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here.]  […]

13. EelKat - March 17, 2007

I’d love to know how many words I use, I am always looking for things like this, no idea why, odd hobby I guess. I do have to wonder though, after reading the comments, why do people try to hide who they really are? What you read on my blog is just as if you hear me speak, I make no changes for clarity, cause if you can’t understand my typing than you won’t understand my speaking either (which is something people often say to me… I never went to school, I have a thick Maine accent, and I’m told I use words that no one can understand quite often; they say my lingo is seriously outdated by about a couple hundred years. oh well, if that’s what they think, then so be it, it don’t bother me none, I like the way I talk so I ain’t changeing for them neither.) I just find it so puzzleing that you got so many comments from people who try to cover up the real them by changeing their words to fit how other folks read. Just plain wierd if you ask me.

14. Blogs for Writers: Third Round of Blog Additions « EK’s Star Log - March 17, 2007

[…] be of great interest to writers, and so I am adding it to this list as well: Pretty Good on Paper How many words do you actually use? Which words do you own?–Neil […]

15. caveblogem - March 17, 2007

EelKat,

I’ll make you the same offer I made to Alabaster Crippens: If you throw me a link I’ll analyze yours right after I’m done with his.

As to the rest of it, I think that some teachers try really hard to make people feel inadequate if they don’t write well, or use Standard Written English, big jargony words, etc. And I think some of the people on the comment thread acted more self-conscious than they actually are, as a sort of self-deprecating humor thing. If not, then it is indeed puzzling. On the internets, nobody knows you’re a dog. But on the other hand, all they see are the words and pictures you post . . .

16. Which words do you own?--EelKat « Pretty Good on Paper - March 19, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and seem to go on […]

17. Which words do you own?--Stiletto Girl « Pretty Good on Paper - March 22, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and seem to go on […]

18. Which words do you own?--kuipercliff « Pretty Good on Paper - March 23, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and continue even as I type this […]

19. Anxious Mofo's word cloud « Anxious MoFo - March 26, 2007

[…] 25th, 2007 · No Comments Caveblogem has a series of posts (starting here) in which he makes word clouds out of the contents of his own blog and others. His method is to […]

20. Which words do you own?--Moon Topples « Pretty Good on Paper - March 26, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and continue even as I type this […]

21. Which words do you own?--Shameless Words « Pretty Good on Paper - March 30, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a weekly […]

22. Which words do you own?--Grasshopper Ramblings « Pretty Good on Paper - April 6, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a weekly […]

23. How many words do you actually use?--Update « Pretty Good on Paper - April 9, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a more-or-less weekly […]

24. dmiessler.com | grep understanding knowledge - April 13, 2007

[…] fellow blogger over at Pretty Good on Paper has a very interesting project going. He takes blogs and pulls down all their content and analyzes the vocabulary used. He just […]

25. Which words do you own?--Mags « Pretty Good on Paper - April 20, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a weekly […]

26. Which words do you own?--the108 « Pretty Good on Paper - April 26, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a somewhat weekly […]

27. Which words do you own?--Asara's Mental Meanderings « Pretty Good on Paper - May 2, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a somewhat weekly […]

28. Which words do you own?--Are We There Yet? « Pretty Good on Paper - May 7, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a somewhat weekly […]

29. Canterbury Soul - May 11, 2007

Hi there!

Can I link you over at mine? Perhaps, you can take a look at it and help me do some self-awareness?

You’ve got a fantastic blog here. Loads of ideas implemented!

Thanks!

30. caveblogem - May 12, 2007

Canterbury Soul,

Absolutely, and thanks for your kind comments. I’ll analyze your blog next. It will take a couple of days. I’ll let you know when it goes up on the site. Are you physically based in Singapore? (That would make yours the first blog from that part of the world in the study.)

31. Canterbury Soul - May 12, 2007

Yes, I’m a true blue, born and bred Singaporean.

Thanks for accepting my request! Looking forward to your analysis. :)

32. Which words do you own?--Dayngrous Discourse « Pretty Good on Paper - May 24, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a somewhat weekly […]

33. Which words do you own?--Second Effort « Pretty Good on Paper - May 31, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a somewhat weekly […]

34. Dayngr - May 31, 2007

Putting up a post on this and your analysis of my blog. Also recommending some others you can experiment with!

35. Yvette - May 31, 2007

Very, very, very cool. How do I get in queue for an analysis? I want to be an experiment!

36. Which words do you own?--2Dolphins « Pretty Good on Paper - June 6, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a somewhat weekly basis. There is an interesting (to some) analysis of the most […]

37. Which words do you own?–Miami Rhapsody « Pretty Good on Paper - June 11, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a somewhat weekly basis. There is an interesting (to some) analysis of the most […]

38. Which words do you own?–A Mom, a Blog, and the Life In-Between « Pretty Good on Paper - June 15, 2007

[…] series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a somewhat weekly basis. There is an interesting (to some) analysis of the most […]

39. Graham Nash - July 4, 2007

I was attempting to discover how many words we have in everyday use when I came across your page. Very interesting.
But I do think that to get a true figure you would need to record people’s conversations from a cross section of society and from different parts of England.

Graham Nash Thailand.

40. caveblogem - July 5, 2007

Mr. Nash,

Thanks for your kind words. A “true figure” would indeed be difficult, and probably expensive, to obtain. But it would be really neat to know, I agree. Perhaps as speech-recognition technology and covert surveillance techniques evolve somebody might attempt collecting data from actual conversations, stratified by race, gender, class, ethnicity, location, education, etc.

One of the nice things about the internet is that I can make a crude estimate like this and learn from it, and share it with other people, without all of the hurdles that would stand in the way of a formal study.

41. Cubby - January 2, 2009

I was specifically looking for these statistics and found your site. I also read:

male = 6073 words per day
female = 8805 words per day

Do you think blogs can be considered daily conversation? It sounds crazy I know, but I’m thinking that blog talk differs slightly from daily verbal conversation.

Do you think Y would be a constant over extended periods where you wouldn’t notice deviations when playing scrabble or other activities?

What about analyzing your archives for the past year?

For language acquisition applications say for example for ESL students, it would be interesting to determine how many words students should expect to learn to have a basic understanding of the English language.

What was the time period represented by your 13,000 words? I would think that’s the most important part. Could you predict your word count for one year?

Cheers

42. cantueso - January 4, 2009

I have just finished writing on this same topic, but to be published tomorrow. Last night I read about this in an old “Companion to the English Language” from Oxford or Cambridge. I see you have a curve, but I have not yet read very closely what you said.

I use the same template as you, but I do not have such a beautiful header, nor such an incredible tag-line , but I have not yet seen what your line is on this blog. I have two blogs, and the main one, fitfully called “Fishing”, is trying to see what people, especially kids, will actually read concerning the great things of life. (¿ understand ?) And I have a more recent one called “Shoptalk” which is about language, especially the reasons why hispanics can’t learn English.

I will be back here soon to see more.

43. cantueso - January 4, 2009

(I said I was leaving but I got hooked again). You said (message 2 of this blog)

“People who intentionally write for clarity use fewer words, I think. ”

I am sure they do, and they tend to create systems. They use words to classify things (mea culpa, mea culpa), and that seems fine to me, but what if they try to make the world fit into that classification and get the power to try for a long time?

On the other hand, those Americans and Shakespeareans that delight in making a mess of their vocabulary by adding more and more words and puns and derivatives and toxic strings of bits…….grrrrrrr

44. cantueso - January 4, 2009

Sorry, here I am again. (No, I do not usually write on other blogs at all).

I just saw your book list. Are Tocqueville’s books mentioned as Stuart Mill’s ?!

45. victor rodreguez - August 4, 2009

So, in English, how many words do we use daily?

46. Anonymous - March 26, 2011

I am new to blogging and actually enjoyed your blog. I am going to bookmark your website and keep checking you out. I really have to say thank you for sharing your web site.

47. ohnasch.de » Archiv » Passwörter und Passphrases - August 19, 2011

[…] Blogem, a blogger from Massachusetts, once analyzed the posts on his blog “Pretty Good on Paper” and found that on average he uses hardly more than 3,000 different words in his […]

48. Because You’re Dying To Know - February 13, 2013

[…] which I will surely feature in an upcoming Website Wednesday. It seems that Cave, has created a completely fascinating experiment that will analyze a sample section of your blog and create a diagram that tells you how many unique […]

49. My Vocabulary Cloud - November 23, 2014

[…] fellow blogger over at Pretty Good on Paper has a very interesting project going. He takes blogs and pulls down all their content and analyzes the vocabulary used. He just […]

