
Clarifications about the Blogging Vocabulary Study June 8, 2007

Posted by caveblogem in Blogs and Blogging, linguistics, Other, statistical analysis, vocabulary.

Dayngrous Discourse asked this question in a comment yesterday.

Of these words 3,907 were unique, meaning that she used 3,907 different words in the sample, well above the average of about 3,500. Does that mean I added that 3,907 new words to the database? 

I learned during the brief time I was teaching college that when you get a question from somebody it usually means that there are dozens of people who had the same question but didn’t ask it, so I thought I’d put the answer up here where others could see it, rather than just tack it into the comments section.

So, in response: when I said that 3,907 words in the sample were unique, I meant that there were 3,907 different words in the sample of 28,000 words from your blog. I took those 28,000 words in the order they appeared, beginning the sample at May 13th, Mother’s Day:


That’s three different words, so far, from the title of the post.  Then we have the beginning of the post itself:


Which is 6 words toward the sample’s total of 28,000, but only three of them are different words, which is all I mean by “unique” to that sample. And all of these words were already in the database, because somebody else had used them. So they counted as unique within the sample, but they were not new to the database. Probably the first word that you added to the database was “dished,” which came at the last line of that Mother’s Day post.
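For anyone who finds this easier to follow as code, here is a minimal sketch of the three counts involved: total words in a sample, different words in the sample, and words new to the database. It is not the actual procedure used for the study. The `tokenize` and `sample_stats` helpers, the lowercasing rule, and the tiny example text and database are all made up for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes.
    This splitting rule is an assumption, not the study's actual rule."""
    return re.findall(r"[a-z']+", text.lower())

def sample_stats(sample_text, database):
    """Return the total word count, the number of different words,
    and the words in the sample that the database has not seen yet."""
    words = tokenize(sample_text)
    different = set(words)            # each word counted once per sample
    new_words = different - database  # words nobody else has used yet
    return {
        "total": len(words),
        "different": len(different),
        "new": sorted(new_words),
    }

# Hypothetical database of words already collected from other blogs.
database = {"happy", "mother's", "day", "she"}

# Hypothetical sample text, not the real post.
sample = "Happy Mother's Day. Happy Mother's Day, she dished."

stats = sample_stats(sample, database)
print(stats)
```

In this toy example the sample has 8 words in total, 5 of them different, and only one ("dished") that would be new to the database.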

I forgot to say in the post how many words you added to the database, so I’ll tell you now: 435, which is pretty darned good for someone who posts frequently, often more than once a day, as you do.  And it is a lot for short posts with lots of tags, which tend to push the word variety down.

I hope that helps, but I have my doubts.  Just as it is hard to talk about math using math, it is hard to talk about words with words sometimes, and here I am using both.  If Kurt Gödel didn’t say something like that, he probably should have.  This type of muddle is one of the reasons that literary critics invent jargon, not that we should forgive them, mind you.

The word database has more than half a million total words in it now.  And about 25,000 different words.  So it is getting more and more difficult to add new ones.  But the next two participants, both hailing from southern Florida, found some anyway.

Next up: Miami Rhapsody and A Mom, A Blog, and the Life In-Between


1. Dayngr - June 12, 2007

Remarkably, I completely understood that. On another note, this experiment has made me much more aware of the words I choose to use now when I write. It has inspired me to stray from my comfort zone and find synonyms for the words typically used. Thanks so much for that!
