A Vocabulary Cloud March 14, 2007

Posted by caveblogem in 3QD, Blogs and Blogging, daily kos, dailykos, linguistics, literature, Other, politics, tagging, Three Quarks Daily, vocabulary, web 2.0.

I collected a bigger sample of words from my own blog this morning. I did not include the recent posts on vocabulary, because writing them might have influenced the number of unique words I used, as MoonTopples and strugglingwriter both pointed out. Instead, the sample is from posts dated January 4 – March 9, 2007, totalling 20,000 words. (See the first post in this series if you don’t know what I am talking about.)

Then I put them in a database with the samples from Three Quarks Daily and Daily Kos and pulled out the words that were unique to my site (words that did not appear in the 3QD or Daily Kos samples at all). I sorted these by the number of times they appeared on my site and gave each word a font size, in points, of twice its number of occurrences, so that words appearing three times are in 6-point Verdana. I didn’t include words unique to my blog that appeared fewer than three times, because there were more than a thousand of them. I then sorted the words again alphabetically, and the result is the vocabulary cloud below (click for a larger image).
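
For anyone who wants to try this at home, here is a rough sketch of those steps in Python. It is not the database setup I actually used; the function name, the parameter names, and the tiny word lists are all made up for illustration, and it assumes the words have already been split out of the posts.

```python
from collections import Counter

def vocabulary_cloud(own_words, other_samples, min_count=3, points_per_use=2):
    """Return alphabetized (word, font_size) pairs for words found only in own_words.

    own_words: list of words from one blog.
    other_samples: list of word lists from the blogs being compared against.
    min_count and points_per_use mirror the "three or more times" cutoff
    and the "twice the number of occurrences" font rule described above.
    """
    # Every word that shows up in any of the other samples.
    seen_elsewhere = set()
    for sample in other_samples:
        seen_elsewhere.update(w.lower() for w in sample)

    # How often each word appears in the blog under study.
    counts = Counter(w.lower() for w in own_words)

    # Keep words unique to this blog that appear often enough.
    cloud = [
        (word, points_per_use * n)
        for word, n in counts.items()
        if word not in seen_elsewhere and n >= min_count
    ]
    # Sorting the tuples orders them alphabetically by word.
    return sorted(cloud)

# Hypothetical toy data, not the real samples.
own = "cave cave cave blog blog blog blog politics politics politics zebra".split()
others = ["politics news".split(), "quarks daily".split()]
cloud = vocabulary_cloud(own, others)
print(cloud)
```

In the toy data, "politics" drops out because it also appears in another sample, and "zebra" drops out because it appears only once.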


It is, in some ways, the opposite of the tag clouds you see on Technorati, because so many of those are made up of proper names, which I excluded from the samples I took.

It’s like a blogger’s fingerprint.

How many words do you actually use?–DAILY KOS March 13, 2007

Posted by caveblogem in Blogs and Blogging, daily kos, dailykos, linguistics, Other, politics, vocabulary, web 2.0.

I may turn this examination of blogging vocabularies, which I began in this post, into a weekly series.  Although I am automating more steps as I perfect the technique, it is still a little time-consuming to do this every day. 

Today I examine the vocabulary of one of the foremost political blogs in the United States–Daily Kos.  (I should probably note that while this was the first blog that came to mind when I decided to look at the vocabularies of some of the “A-List” blogs, it is not one that I actually read.)

In the first 13,000 words of the sample, Daily Kos used 3,053 unique words, which compares favorably to my total from the other day of only 2,734. The chart below shows data from the whole sample, in 500-word increments; the sample included 20,000 words taken from posts beginning with Tue Mar 13, 2007 at 11:29:01 AM PDT and ending four pages later.
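
The counting behind a chart like this can be sketched in a few lines of Python. This is only an illustration, not the semi-automated process I actually use; the function name and the toy sentence are invented, and the step size is shrunk from 500 words to 4 so the example fits on the page.

```python
def vocabulary_growth(words, step=500):
    """Return (words_read, unique_words_so_far) pairs, one per `step` words."""
    seen = set()
    curve = []
    for i, word in enumerate(words, start=1):
        seen.add(word.lower())
        # Record a data point each time another full increment has been read.
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve

# Hypothetical toy sample; a real run would use the 20,000-word sample.
sample = "the cat sat on the mat and the dog sat too".split()
growth = vocabulary_growth(sample, step=4)
print(growth)
```

Each pair is one point on the curve: how many words have been read so far, and how many of them were new. Words left over after the last full increment are not plotted.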


(The extrapolation in the above chart uses the equation below.)


The estimated total vocabulary of this blog is therefore 4,152 unique words.