
Which words do you own?–Tales from the Reading Room July 14, 2007

Posted by caveblogem in blogging, blogs, Blogs and Blogging, Books, COMBS, Haiku, linguistics, literature, Other, statistical analysis, vocabulary.

Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere.  Other posts, whether analyzing particular blogs within the study or detailing the methodology of this thing or whatever, can be found at the Center for Occasional Meme and Blog-O-Sphere Studies [COMBS].  Go there by clicking here or the Center’s logo, which should be on the right (starboard) side-bar over there —->

About two months ago I took a sample of words from Litlove’s blog Tales from the Reading Room.  I added them to the vocabulary database, but I was reluctant to just do a normal post on them.  I wanted to do something a little special because Litlove had started this whole project, in a way, with one of her posts.  So I procrastinated, a favorite strategy of mine, until I could think of something more interesting.  I think I hit upon something, so without further ado . . .

Litlove’s word sample runs from March 31 to May 9, 2007.  Sample size was 25,741 words.  She added 905 words that were new to the database.  She used a wide variety of words: 4,535 different words within the sample, which is pretty good, since her sample had 5,000 fewer words than most of the others.
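For the curious, here is a rough sketch of how those three numbers (sample size, distinct words, words new to the database) could be computed. It assumes a simple lowercase tokenizer and a database stored as a Python set. The post doesn't say how words were actually split, and `tokenize` and `sample_stats` are hypothetical names, not the project's real code.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase everything, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def sample_stats(text, database_vocab):
    """Return (sample size, distinct words, words new to the database)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    new_words = set(counts) - database_vocab  # words nobody had used before
    return len(tokens), len(counts), new_words


size, distinct, new = sample_stats("The cat saw the other cat.", {"the", "cat"})
print(size, distinct, sorted(new))  # → 6 4 ['other', 'saw']
```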

Here is a word cloud made up of the words used more than twice by Litlove but not at all by any of the other 18 blogs that went before her:


And here are those words in a font called Love Letters:


And here’s the Venn diagram I usually make out of these words:


The left lobe consists of words that were new to the sample, that nobody else had used, sized relative to the frequency of use.  The middle part consists of words that everybody has used so far, sized according to how much more frequently Litlove used them in the sample than others did.  And the right lobe consists of words that everyone else sampled before her used, but that she did not. 
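The three lobes are basically set operations, so here is a minimal Python sketch of how they might be computed. It assumes each blog's sample is stored as a `collections.Counter` of lowercased words. `venn_lobes` is a hypothetical helper, and sizing the middle by relative frequency (count divided by sample size) is my assumption, since the post doesn't say exactly how "how much more frequently" was measured. The counts in the example are made up.

```python
from collections import Counter


def venn_lobes(subject, others):
    """Split words into the three lobes of the Venn diagram (hypothetical helper).

    subject: Counter of word counts for the blog being profiled.
    others:  non-empty list of Counters, one per earlier blog.
    """
    other_sets = [set(c) for c in others]
    used_by_all_others = set.intersection(*other_sets)
    used_by_any_other = set.union(*other_sets)

    # Left lobe: words nobody else used, sized by the subject's raw counts.
    left = {w: n for w, n in subject.items() if w not in used_by_any_other}

    # Middle: words every blog (subject included) used, sized by the ratio of
    # the subject's relative frequency to the earlier blogs' average relative
    # frequency. Normalizing by sample size is an assumption.
    subject_total = sum(subject.values())
    other_totals = [sum(c.values()) for c in others]
    middle = {}
    for w in used_by_all_others & set(subject):
        subject_rate = subject[w] / subject_total
        other_rate = sum(c[w] / t for c, t in zip(others, other_totals)) / len(others)
        middle[w] = subject_rate / other_rate

    # Right lobe: words every earlier blog used but the subject never did.
    right = used_by_all_others - set(subject)
    return left, middle, right


toy_subject = Counter({"the": 4, "sneer": 2, "cat": 2})  # made-up counts
toy_earlier = [Counter({"the": 2, "cat": 2, "dog": 1}),
               Counter({"the": 3, "dog": 2, "cat": 1})]
left, middle, right = venn_lobes(toy_subject, toy_earlier)
print(left, right)  # → {'sneer': 2} {'dog'}
```

A middle value above 1 means the subject used that word more often, per word written, than the earlier blogs did on average.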

Here is another effort by my Haiku-generating algorithm, which crashed six times before yielding a Haiku made from only the most common words and the words Litlove added to the database (all of the crashes had to do with a shortage of monosyllabic words of various types in Litlove’s pool of words.)

In boy’s forthright sneer
she adheres perilously
to the politeness.

Puzzling, like all good machine-generated poetry. 

And here is the new thing.  It’s an additional word cloud that is a little more complicated than the others I have generated thus far.  This is the first time I have tried to explain it, so bear with me.  I calculated the average number of times each word in the database is used (per subject).  Then I subtracted that average from the number of times each word was used in Litlove’s sample, so the positive numbers represent words that Litlove used more frequently than average.  Then I scaled these words by frequency of use in her sample.  But then I deleted the 65 most frequently used words in the database (see here for a partial list of these).  This yields a list of at least 100 words showing something new about the speech patterns/word choices of the blogger, Litlove, in this case.  I’m not at all sure what it shows, though.  So here’s Litlove’s cloud:


And for purposes of comparison, here’s one from last week’s subject, silverneurotic:


I find these a little more interesting than the other visuals, at this point.  And since their appearance is not so firmly tied to the size of the samples, I can generate them with a much smaller sample from someone’s blog.  So I may just keep doing this, if I keep getting volunteers.
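The recipe behind this newer kind of cloud can be sketched in a few lines of Python. This is a minimal sketch under some assumptions: every blog's sample is a `collections.Counter`, the database includes the subject's own sample, and counts are raw (not adjusted for sample size), which is how I read the description. `distinctive_words` and `n_common` are hypothetical names, and the toy counts are made up.

```python
from collections import Counter


def distinctive_words(subject, all_subjects, n_common=65):
    """Sketch of the newer cloud: words a blogger uses more than the average blog.

    subject:      Counter for the blog being profiled.
    all_subjects: list of Counters for every blog in the database
                  (assumed to include the subject itself).
    n_common:     how many of the database's most frequent words to drop.
    """
    database = Counter()
    for counts in all_subjects:
        database.update(counts)
    n_subjects = len(all_subjects)

    # Drop the most frequently used words in the whole database.
    most_common = {w for w, _ in database.most_common(n_common)}

    cloud = {}
    for word, count in subject.items():
        if word in most_common:
            continue
        average = database[word] / n_subjects  # average uses per subject
        if count - average > 0:                # used more often than average
            cloud[word] = count                # sized by frequency in the sample
    return cloud


a = Counter({"the": 10, "haiku": 3, "sneer": 2})
b = Counter({"the": 12, "haiku": 1, "dog": 4})
print(distinctive_words(a, [a, b], n_common=1))  # → {'haiku': 3, 'sneer': 2}
```

Because the sizes come from the subject's own counts rather than from comparing against other samples, this kind of cloud should hold up reasonably well on a smaller sample, though the average is still skewed somewhat when samples differ a lot in length.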

As always, the vocabulary clouds and Haiku are the property of the volunteers, except that a volunteer may not have them taken off of my site but may otherwise do with them what they wish.  Thanks for participating, Litlove, and sorry about the long wait.


1. litlove - July 15, 2007

oh Caveblogem, this is magnificent! Thank you! I just loved reading this and I’m going to have a think about it now and post my own analysis of it on my site later on. I really did find it SO fascinating.

Thank you once again!

2. Stefanie - July 16, 2007

Hi. I’m here from Litlove’s blog and I just want to say what you’ve done is really cool.

3. Dorothy W. - July 16, 2007

Wow. What you’ve done is really, really interesting!

4. Phil - July 16, 2007

Agreed! Thoughtful and thought provoking both. Kudos on the interesting work!

5. verbivore - July 17, 2007

This project is fascinating – I’m going to go back now and read how you got started and how the project has developed.

6. ranking words « Incurable Logophilia - July 17, 2007

[…] post about caveblogem’s enlightening dissection of her prose, I spent some time reading about caveblogem’s project (very interesting indeed!) and eventually found my way to […]

7. caveblogem - July 17, 2007

litlove, you are very welcome. Your analysis of these stats on your blog was more interesting to me than I can say. It made me glad I decided to do the new cloud. Looks like you got some interesting insights from it.

Stefanie, Phil, verbivore, I look forward to checking out your blogs. Thanks for stopping by, and for the kind words.

Dorothy, I have often read your blog in the past (it always shows up on my WordPress tag reader), but haven’t stopped by in a while. I’ll have to remedy that. Thanks!
