
Which words do you own?–Raincoaster March 17, 2007

Posted by caveblogem in 3QD, Blogs and Blogging, daily kos, dailykos, linguistics, Neil Gaiman, Other, Three Quarks Daily, vocabulary, writing.

I first saw raincoaster’s blog in WordPress’s list of fast-growing blogs or popular blogs or something like that.  I check in once in a while and am always entertained.  We share a passion for H. P. Lovecraft and squids and a couple of other things.

I took a sample of her blog posts yesterday and processed them, and at first I was a little bummed out that the largest words in the cloud are tag words, which present little new information.  Many of the others have to do with current events, so they don’t seem like the timeless blogger fingerprints I had envisioned a day or two ago.

Nevertheless, if you look at some of the smaller words that pop out, distinguishing her vocabulary from that of Three Quarks Daily, Daily Kos, Pretty Good on Paper, and Neil Gaiman’s Journal, you’ll see some interesting nuggets in the cloud below.


On a side note about Neil Gaiman, Dan somebody (whose relationship to Mr. Gaiman I did not quite get) has done some interesting analysis of Mr. Gaiman’s blog over a long span of time.  The links, if you should wish to pursue them, are in the comment thread to the post on Mr. Gaiman here.  Dan’s analysis and method seem more sophisticated than mine.  For example, he passes Mr. Gaiman’s words through something called a “Yahoo Term Extraction API.”  If I remember my Latin roots correctly, it seems to have something to do with bees.  At any rate, his analysis also takes a slightly different tack, examining “terms” rather than words, and discarding words that have fewer than five letters.  So what you will see are dynamic clouds of what I suspect are Mr. Gaiman’s topical concerns, rather than his individual word usage.  They are fascinating to look at, however, and quite clever.
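The Yahoo term extraction step can’t be reproduced here, but the simpler of Dan’s two steps, discarding words with fewer than five letters before counting, is easy to sketch. This is a minimal Python illustration under stated assumptions: the function name `long_word_counts` and the letters-only tokenizing rule are hypothetical and not taken from Dan’s code.

```python
import re
from collections import Counter

def long_word_counts(text: str, min_len: int = 5) -> Counter:
    """Count lowercase words, discarding those with fewer than min_len letters."""
    # Hypothetical tokenizer: a "word" is any run of the letters a-z.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if len(w) >= min_len)

sample = "The graveyard book is a book about a graveyard and a boy."
print(long_word_counts(sample))
# Counter({'graveyard': 2, 'about': 1})
```

Short function words like “the,” “and,” and “book” drop out, which is one reason such clouds lean toward topics rather than everyday word usage.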

The day before yesterday I stood for several minutes looking at a book on Java at my local Barnes and Noble.  If it hadn’t had water damage, which made the pages make that funny noise and stick when I tried to flip them, I would actually have bought it.  If anyone is interested in saving me the time and trouble it would take to learn some sort of dynamic HTML, I would happily partner up, supplying data and analysis for the creation of some sort of acceptable widget.

Up next, Alabaster Crippens.

Which words do you own?–Neil Gaiman March 16, 2007

Posted by caveblogem in Blogs and Blogging, bookmooch, Books, Cartooning, fiction, literature, Neil Gaiman, Other, vocabulary, web 2.0, writing.

[Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here.]

I began to read the work of Neil Gaiman last year when somebody suggested I read Good Omens, a collaboration between Mr. Gaiman and Terry Pratchett.  Then I read American Gods and Neverwhere and everything else I could get my hands on.  The only thing I haven’t been able to get hold of is his latest, Fragile Things, which nobody has posted on Bookmooch or Paperbackswap (I have to be a little frugal this year, I’m afraid).  Anyway, Mr. Gaiman is a tremendously talented writer of creepy and interesting tales.  And he writes a darn good blog, too, which I subscribe to and read whenever I can.

Yesterday morning I sampled 22,000 words from Mr. Gaiman’s site, covering his posts from January 6 to March 14.  I had to run the spell-check a little differently from the way I normally do, because Mr. Gaiman uses the British spellings of words like color, organize, check (cheque, a draft on one’s checking account), favorite, and orangutan.  So I just changed these to the American versions in his list so that I could merge it with the others.
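Changing British spellings to American ones amounts to folding variant spellings into a single form, and summing their counts, before the lists are merged. Here is a minimal Python sketch; the `BRIT_TO_US` table and the `americanize` function are hypothetical illustrations, not the actual list or tool used for this study, and the table covers only a few sample forms.

```python
# Hypothetical, deliberately incomplete British-to-American spelling table.
BRIT_TO_US = {
    "colour": "color",
    "colours": "colors",
    "organise": "organize",
    "organised": "organized",
    "cheque": "check",
    "favourite": "favorite",
}

def americanize(counts: dict[str, int]) -> dict[str, int]:
    """Fold British spellings into their American forms, summing the counts."""
    merged: dict[str, int] = {}
    for word, n in counts.items():
        us = BRIT_TO_US.get(word, word)  # words not in the table pass through unchanged
        merged[us] = merged.get(us, 0) + n
    return merged

print(americanize({"colour": 3, "color": 1, "cheque": 2, "squid": 4}))
# {'color': 4, 'check': 2, 'squid': 4}
```

Folding cheque into check also merges it with every other sense of check, which is the trade-off accepted above.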

I have started to add some words to my spell-checker, and with Mr. Gaiman’s blog I added googled, blog, blogger, blogging, edamame, and perhaps a couple of others that I forgot to write down at the time but which I was absolutely certain were correctly spelled words.
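Adding words to a spell-checker usually means keeping a personal word list that is consulted alongside the main dictionary. The sketch below assumes a toy set-based checker: the tiny `BASE_DICTIONARY` and the `unknown_words` function are invented for illustration, while `CUSTOM_WORDS` holds the additions named above.

```python
# Invented stand-in for a real dictionary, which would hold many thousands of entries.
BASE_DICTIONARY = {"i", "and", "about", "it", "read", "the"}
# The personal additions mentioned in the post.
CUSTOM_WORDS = {"googled", "blog", "blogger", "blogging", "edamame"}

def unknown_words(words, base=BASE_DICTIONARY, custom=CUSTOM_WORDS):
    """Return, in order, the words accepted by neither the base dictionary nor the custom list."""
    accepted = base | custom
    return [w for w in words if w.lower() not in accepted]

print(unknown_words(["I", "googled", "edamame", "and", "blogged", "about", "it"]))
# ['blogged']
```

Note that blogged is still flagged: with a plain word list, each inflected form has to be added separately.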

The Blogger’s Vocabulary List is getting larger with each blog I incorporate.  The latest, which includes samples from Three Quarks Daily, Daily Kos, this blog (Pretty Good on Paper) and Neil Gaiman’s Journal, contains 9,383 different words.  In a couple of months I should be able to make a pretty good estimate of the size of the vocabulary in actual use out there (here?) in the blogosphere.  Check this space for updates.
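Conceptually, the Blogger’s Vocabulary List is a union of the per-blog samples, and its size is the number of distinct words in that union. A minimal Python sketch follows, with invented toy counts standing in for the real samples and a hypothetical `merge_vocabularies` helper.

```python
from collections import Counter

def merge_vocabularies(samples: dict[str, Counter]) -> Counter:
    """Combine per-blog word counts into one pooled list, adding counts for shared words."""
    pooled = Counter()
    for counts in samples.values():
        pooled.update(counts)
    return pooled

# Invented toy counts, not the actual samples from this study.
samples = {
    "Three Quarks Daily": Counter({"science": 3, "the": 40}),
    "Daily Kos": Counter({"senate": 5, "the": 55}),
    "Pretty Good on Paper": Counter({"vocabulary": 4, "the": 30}),
}
pool = merge_vocabularies(samples)
print(len(pool))  # number of different words in the pooled list
# 4
```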

Mr. Gaiman added 1,112 words to the list, an impressive feat at this point for an individual blogger.  Here is a vocabulary cloud composed of the words Mr. Gaiman added to the list, with each word’s font size, in points, set to twice the number of times it appeared in his sample.
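The cloud described above combines two simple steps: keep only the words not already in the pooled list, then set each one at twice its count in points. Here is a minimal Python sketch of both steps; the counts and the `existing` set are invented toy data, and `added_words` and `font_sizes` are hypothetical names.

```python
from collections import Counter

def added_words(blog: Counter, existing: set[str]) -> Counter:
    """Words in this blog's sample that are not yet in the pooled list."""
    return Counter({w: n for w, n in blog.items() if w not in existing})

def font_sizes(counts: Counter, points_per_use: int = 2) -> dict[str, int]:
    """Font size in points = points_per_use times the number of appearances."""
    return {w: points_per_use * n for w, n in counts.items()}

# Invented toy data, not Mr. Gaiman's actual counts.
existing = {"the", "blog", "senate"}
gaiman = Counter({"the": 900, "blog": 12, "edamame": 7, "orangutan": 3})
print(font_sizes(added_words(gaiman, existing)))
# {'edamame': 14, 'orangutan': 6}
```

A word used 12 times, for example, would be set in 24-point type.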


I’ve decided to stop estimating the size of the vocabularies of individual blogs in this study because such estimates are too artificial.  Even bloggers and writers use most of their words in conversation.  And since your vocabulary is altered by each conversational partner (if your partner asks a question about broccoli or oysters, you find yourself using those words too, if only to ask for clarification), estimates of this sort don’t seem all that relevant.

What does Mr. Gaiman’s vocabulary cloud say about him as a blogger?  What does it say about the bloggers his words were compared with?  What will Raincoaster’s vocabulary cloud say about her or us or anything, when it is added to this growing pool tomorrow?