A Vocabulary Cloud March 14, 2007
Posted by caveblogem in 3QD, Blogs and Blogging, Other, Three Quarks Daily, daily kos, dailykos, linguistics, literature, politics, tagging, vocabulary, web 2.0.
I collected a bigger sample of words from my own blog this morning. I did not include the recent posts on vocabulary stuff, because writing them might have influenced the number of unique words, as MoonTopples and strugglingwriter both pointed out. Instead, the sample is from posts dated January 4 – March 9, 2007, totalling 20,000 words. (See the first post in this series if you don’t know what I am talking about.)
Then I put them in a database with the samples from Three Quarks Daily and Daily Kos and pulled out the words that were unique to my site (words that did not appear in the 3QD or Daily Kos samples at all). I sorted these by the number of times they appeared on my site and gave each word a font size, in points, twice its number of occurrences on my blog, so that words appearing three times are in 6-point Verdana. I didn’t include words unique to my blog that appeared fewer than three times, because there were more than a thousand of them. I then sorted the words again alphabetically, and the result is the vocabulary cloud below (click for a larger image).
![]()
The vocabulary cloud is, in some ways, the opposite of the tag clouds you see on Technorati, because so many tag clouds are made up of proper names, which I excluded from the samples I took.
It’s like a blogger’s fingerprint.
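For anyone who wants to try this on their own blog, the steps described above can be roughly sketched in Python. This is only an illustration, not the database queries used for this post: the function names, the simple regex tokenizer, and the sample strings are all assumptions, and the sketch does not filter out proper names the way the original samples did.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_cloud(own_text, other_texts, min_count=3, points_per_use=2):
    """Return (word, count, font_size_pt) tuples, sorted alphabetically.

    Keeps only words that never appear in other_texts and that occur at
    least min_count times in own_text. The font size is the count times
    points_per_use, so a word used three times gets 6 points.
    """
    own_counts = Counter(tokenize(own_text))

    # Collect every word seen in the comparison samples.
    seen_elsewhere = set()
    for text in other_texts:
        seen_elsewhere.update(tokenize(text))

    cloud = [
        (word, count, count * points_per_use)
        for word, count in own_counts.items()
        if word not in seen_elsewhere and count >= min_count
    ]
    return sorted(cloud)  # tuples sort by word first, i.e. alphabetically


# Hypothetical sample data, made up for illustration.
mine = "spelunk spelunk spelunk cave cave cave blog blog politics"
others = ["politics and the blog world", "a cave of one's own"]
print(vocabulary_cloud(mine, others))  # [('spelunk', 3, 6)]
```

In the made-up example, "cave" appears three times but is dropped because it also shows up in one of the comparison samples, and "blog" and "politics" are dropped for the same reason; only "spelunk" survives, at 6 points.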



Very cool idea. I was kidding the other day, by the way :)
These posts are very interesting.
strugglingwriter,
I knew you were kidding. But you were also right. That’s just the sort of thing I’d do!
[...] particularly got me thinking is the attempt to locate words unique to a specific blogger compared to others. Trying to get a kind of pattern of what words are distinctive to one person. [...]