
Which words do you own? – Doors Left Open
May 19, 2007

Posted by caveblogem in Blogs and Blogging, Haiku, linguistics, Other, statistical analysis, vocabulary.

[Note: This is part of a continuing series on the vocabulary actually in use in the blogosphere. Posts on this subject started here and will continue on a roughly weekly basis.]

This week’s volunteer is Canterbury Soul, a Singaporean whose blog is called Doors Left Open. His analysis took a little longer than the others I have done, because the blog is so new that I feared the sample would be too small. I normally like to take at least 25,000 words, and I wanted to make sure the sample was big enough to yield some new words. As it turned out, that wasn’t a significant problem. I took a sample of slightly more than 20,000 words (a census, really), removed all of the proper names as I normally do, and ended up with about 19,000 words (and that includes the pages of the blog that don’t run chronologically). Even so, Canterbury Soul added a hefty 682 new words to the database.
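For anyone curious about the mechanics, here is a minimal Python sketch of the counting step, under some loud assumptions: the sample is already split into words on whitespace, the proper names to drop are supplied by hand as a set, and the existing database is just a set of lowercase words. The function name `new_words` and its parameters are hypothetical, not my actual tooling.

```python
from collections import Counter

def new_words(sample_tokens, proper_names, database):
    """Count words in one blog's sample that no earlier blog has used.

    sample_tokens: list of words from the sample (assumed pre-split)
    proper_names:  set of lowercase names to exclude (hypothetical, hand-made)
    database:      set of lowercase words already seen in earlier blogs
    """
    # Normalize case so "Door" and "door" count as the same word
    words = [w.lower() for w in sample_tokens]
    # Drop proper names before anything is counted
    words = [w for w in words if w not in proper_names]
    # Keep only words that appear in none of the previously sampled blogs
    return Counter(w for w in words if w not in database)

# Toy example with made-up data
database = {"the", "door", "open"}
sample = "The door was left open in Singapore by a paroxysm".split()
fresh = new_words(sample, {"singapore"}, database)
print(len(fresh))  # number of distinct new words
```

The length of the returned Counter is the number of distinct new words (the 682 figure above would correspond to that), while its values give the frequencies used to size the word cloud.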

Here’s the resulting cloud, composed of words that had not yet shown up in any of the seventeen blogs sampled.

[Image: word cloud of the new words (wordcloudonlytnr.jpg)]

And here’s the same cloud in a font called “Open Mind.”  I couldn’t find any fonts related to doors. . . .

[Image: the same word cloud in the “Open Mind” font (wordcloudonlyopdr.jpg)]

And here’s the Venn diagram I usually make from these words. The left lobe contains the words that were new in this sample, which nobody else had used, sized by how often they were used. The middle lobe contains words that everybody has used so far, sized by how much more frequently Canterbury Soul used them in the sample. And the right lobe contains words that everyone else sampled so far has used but Canterbury Soul did not, sized by frequency of use.
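The three lobes can be sketched in a few lines of Python. This is a minimal illustration, not the exact method behind the diagram: it assumes each blog’s vocabulary is a `Counter` of word frequencies, and it sizes the middle lobe by the ratio of relative frequencies (the blog’s share of a word divided by the other blogs’ combined share), which is my assumption about what “how much more frequently” means. The names `venn_lobes`, `target`, and `others` are hypothetical.

```python
from collections import Counter

def venn_lobes(target, others):
    """Split vocabulary into left, middle, and right lobes.

    target: Counter of word frequencies for the blog being profiled
    others: non-empty list of Counters, one per previously sampled blog
    """
    seen_elsewhere = set().union(*others)
    used_by_all_others = set.intersection(*(set(c) for c in others))
    target_total = sum(target.values())
    others_total = sum(sum(c.values()) for c in others)

    # Left lobe: words no other blog has used, weighted by raw frequency
    left = {w: n for w, n in target.items() if w not in seen_elsewhere}

    # Middle lobe: words every blog (target included) has used, weighted by
    # the ratio of the target's relative frequency to the others' (assumed)
    middle = {}
    for w in used_by_all_others & set(target):
        target_rate = target[w] / target_total
        others_rate = sum(c[w] for c in others) / others_total
        middle[w] = target_rate / others_rate

    # Right lobe: words every other blog used but the target did not,
    # weighted by their combined frequency in the other blogs
    right = {w: sum(c[w] for c in others)
             for w in used_by_all_others - set(target)}
    return left, middle, right

# Toy example with made-up counts
target = Counter({"paroxysm": 2, "the": 6, "door": 2})
others = [Counter({"the": 5, "blog": 3, "door": 1}),
          Counter({"the": 5, "blog": 1, "a": 4})]
left, middle, right = venn_lobes(target, others)
print(left, middle, right)
```

A middle-lobe ratio above 1 means the profiled blog leans on that shared word more than everyone else does; a ratio below 1 means it uses the word less.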

[Image: Venn diagram of Canterbury Soul’s vocabulary (cantvenn.jpg)]

And finally, here is the Haiku generated by my Haiku-generating algorithm, which is improving rapidly, I think.  The words are those of Canterbury Soul, of course.  The arrangement is now almost purely mechanical.

Woeful on the oak,
germs of a paroxysm
recover midfield.

Next Up: Dayngrous Discourse, then Second Effort

Comments

1. Canterbury Soul - May 20, 2007

Hey Cave!

Thanks for the detailed analysis! Wow, that must surely be a lot of hard work! At least now I understand my own writing much more deeply. 682 new words! That sounds like a lot! Isn’t it amazing that there could still be words that are unused by anybody?! Truly and sincerely I appreciate what you have done.

Can I put the diagrams on my blog? :)

2. caveblogem - May 20, 2007

Canterbury Soul,

You are welcome. 682 new words is a lot at this point in the game, particularly considering the small sample I had to work with.

And yes, you may put the diagrams on your blog. Do with them what you will! Thanks again for participating. :)

3. L.M.Noonan - May 20, 2007

Shameless is always pointing bloggers to ‘the good stuff’. It will take some time to get through the back blocks/ back blogs of your site, but I’m sure it will be worth it.

4. caveblogem - May 21, 2007

L.M.Noonan,

Thanks! Shameless is a kind soul. I wish I could be sure that it will be worth it. Some of my early posts were pretty lame. By the way, I’m adding yours to my blogroll. Quite a visual feast.

5. Rob O'Daniel - May 22, 2007

Cavebloggem, as something of a self-professed wordsmith (my wife calls me a walking dictionary, but not always in a good way, I suspect), I’d sure love to jump in line to participate in your study.

6. caveblogem - May 23, 2007

Rob, I’ll put you in the queue. There are two ahead of you, so it will take a couple of weeks. I’ll let you know when it is up by linking to your site. I took a look at your blog this morning and it looks like it should make an interesting addition. Am I correct in understanding that both you and your wife contribute posts?

7. Rob O'Daniel - May 23, 2007

Yup, we both post, but the majority of the entries are mine on the main 2Dolphins blog whereas she handles all of the content on our secondary Russian Adoption Journal blog.

Should be apparent who’s the wordy one of us… :)

