Which words do you own?–Tales from the Reading Room
July 14, 2007
Posted by caveblogem in blogging, blogs, Blogs and Blogging, Books, COMBS, Haiku, linguistics, literature, Other, statistical analysis, vocabulary.
Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere. Other posts, whether analyzing particular blogs within the study or detailing the methodology of this thing or whatever, can be found at the Center for Occasional Meme and Blog-O-Sphere Studies [COMBS]. Go there by clicking here or the Center’s logo, which should be on the right (starboard) side-bar over there -->
About two months ago I took a sample of words from Litlove’s blog Tales from the Reading Room. I added them to the vocabulary database, but I was reluctant to just do a normal post on them. I wanted to do something a little special because Litlove had started this whole project, in a way, with one of her posts. So I procrastinated, a favorite strategy of mine, until I could think of something more interesting. I think I hit upon something, so without further ado . . .
Litlove’s word sample runs from March 31 – May 9, 2007. Sample size was 25,741 words. She added 905 words that were new to the database. She used a wide variety of words: 4,535 different words within the sample, which is pretty good, since her sample had 5,000 fewer words than most of the others.
Here is a word cloud composed of the words used more than twice by Litlove but not at all by any of the other 18 blogs that went before her:
And here are those words in a font called Love Letters:
And here’s the Venn diagram I usually make out of these words:
The left lobe consists of words that were new to the sample, that nobody else had used, sized relative to the frequency of use. The middle part consists of words that everybody has used so far, sized according to how much more frequently Litlove used them in the sample than others did. And the right lobe consists of words that everyone else sampled before her used, but that she did not.
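For anyone who wants to try this at home, the three regions of the diagram boil down to a few set operations. What follows is only a minimal Python sketch, not the code used for the study: the function name venn_lobes and the idea of keeping one Counter of word counts per blog are assumptions made for illustration, and the sizing of words within each region is left out.

```python
from collections import Counter

def venn_lobes(subject_counts, earlier_counts):
    """Split words into the three regions of the Venn diagram.

    subject_counts: Counter of word -> count for the current blogger.
    earlier_counts: list of Counters, one per blog sampled earlier.
    (Both structures are assumptions, not the study's actual database.)
    """
    subject_words = set(subject_counts)
    earlier_sets = [set(c) for c in earlier_counts]

    # Words used by at least one earlier blog.
    used_by_any = set().union(*earlier_sets)
    # Words used by every earlier blog (empty if there are no earlier blogs).
    used_by_all = set.intersection(*earlier_sets) if earlier_sets else set()

    left = subject_words - used_by_any    # new to the sample
    middle = subject_words & used_by_all  # used by everybody so far
    right = used_by_all - subject_words   # used by everyone but the subject
    return left, middle, right
```

Words that some, but not all, of the earlier blogs used fall outside all three regions, which matches the description above.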
Here is another effort by my Haiku-generating algorithm, which crashed six times before yielding a Haiku made from only the most common words and the words Litlove added to the database (all of the crashes had to do with a shortage of monosyllabic words of various types in Litlove’s pool of words):
In boy’s forthright sneer
she adheres perilously
to the politeness.
Puzzling, like all good machine-generated poetry.
And here is the new thing. It’s an additional wordcloud that is a little more complicated than the others I have generated thus far. This is the first time I have tried to explain it, so bear with me. I calculated the average number of times each word in the database is used (per subject). Then I subtracted that average from the number of times each word was used in Litlove’s sample. The positive numbers represent words that Litlove used more frequently than average. Then I scaled these words by frequency of use in her sample. But then I deleted the 65 most frequently used words in the database (see here for a partial list of these). This yields a list of at least 100 words showing something new about the speech patterns/word choices of the blogger, Litlove, in this case. I’m not at all sure what it shows, though. So here’s Litlove’s cloud:
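The recipe for this cloud can be written out in a few lines. The Python below is a rough sketch under stated assumptions, not the actual procedure behind the images: the function name distinctive_words, the list-of-Counters layout, and the use of raw counts without correcting for sample size are all guesses made for illustration.

```python
from collections import Counter

def distinctive_words(subject_counts, all_counts, n_stop=65):
    """Words the subject used more often than the per-subject average.

    subject_counts: Counter of word -> count for the blogger in question.
    all_counts: list of Counters, one per subject in the database
                (assumed here to include the subject's own Counter).
    n_stop: how many of the most frequent database words to drop.
    Returns (word, weight) pairs, heaviest first; the weight is the
    word's raw count in the subject's sample.
    """
    totals = Counter()
    for counts in all_counts:
        totals.update(counts)
    n_subjects = len(all_counts)

    # The most frequently used words in the whole database get deleted.
    stop_words = {word for word, _ in totals.most_common(n_stop)}

    weights = {}
    for word, count in subject_counts.items():
        if word in stop_words:
            continue
        average = totals[word] / n_subjects
        # Keep only words used more often than average.
        if count - average > 0:
            weights[word] = count
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))
```

Because the comparison here uses raw counts, a larger sample would tend to push more words above the average; dividing each count by the sample size first would be one way around that, though nothing above says whether the clouds were built that way.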
And for purposes of comparison, here’s one from last week’s subject, silverneurotic:
I find these a little more interesting than the other visuals, at this point. And since their appearance is not so firmly tied to the size of the samples, I can generate them with a much smaller sample from someone’s blog. So I may just keep doing this, if I keep getting volunteers.
As always, the vocabulary clouds and Haiku are the property of the volunteers, except that a volunteer may not have them taken off my site; otherwise, volunteers may do with them what they wish. Thanks for participating, Litlove, and sorry about the long wait.