Which words do you own?–Stiletto Girl

March 22, 2007

Posted by caveblogem in Blogs and Blogging, linguistics, Other, statistical analysis, writing.

[Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and seem to go on forever.]

Moontopples said last night,

Each blog you do means I’ll add less and less, but I’d love to be included. Maybe I’ll be your first subject to add no words at all.

I have also been concerned about that, Moontopples.  But things don’t seem to be winding down at all.  In fact, today’s subject, Stiletto Girl, added a remarkable 1,292 unique words from her sample of 21,000.  Her frequently used additions to the vocabulary database are shown in the vocabulary cloud below (click to enlarge the picture; each word’s font size is double the number of times it appeared).
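For the curious, a tally like this could be sketched in a few lines of Python.  This is only an illustration, not the tooling actually used for this series (which isn't described here): the tokenizer, the function names, and the tiny sample text are all hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes.
    (An assumed, very rough tokenizer; it ignores curly apostrophes.)"""
    return re.findall(r"[a-z']+", text.lower())

def new_words(sample_text, known_vocabulary):
    """Count the words in a sample that are not yet in the vocabulary database."""
    counts = Counter(tokenize(sample_text))
    return {word: n for word, n in counts.items() if word not in known_vocabulary}

def cloud_font_sizes(word_counts):
    """Font size is double the number of times the word appeared."""
    return {word: 2 * n for word, n in word_counts.items()}

# Hypothetical stand-ins for the existing database and a blogger's sample.
known = {"the", "a", "rode", "on"}
sample = "The palfrey rode on. A palfrey, a stiletto!"

added = new_words(sample, known)
sizes = cloud_font_sizes(added)
print(len(added), "new words:", sizes)
```

Here "palfrey" appears twice and "stiletto" once, so they would be drawn at sizes 4 and 2 respectively.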


I have to admit that I had to look at least one of them up.  I think she said that people should feel free to analyze her in the comment thread, but I may just have been hearing things.

Here is the other diagram that I wanted to unveil today (below, click to enlarge).  It shows that other blogs used only 43 words that SG did not use (the right-hand lobe of the diagram below).  I think that’s just as amazing as the addition of nearly 1,300 words at this point in the game.  She’s the Motmistress of the Hour, clearly.


The middle of the diagram, the intersection of the sets, shows words that other blogs used as well, but only the ones that SG used a lot more often than the others did (there were too many shared words to fit them all).  It reminds me of a diagram a psychology professor once drew on the blackboard for my class.  On the left, the Id.  On the right, the Superego.  In the middle, the Ego.  Or was that philosophy class?  In that case, on the left is evil.  On the right, good.  In the middle, the eternal conflict waged between the forces of darkness and light.
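In set terms, the three regions of the diagram are a difference, an intersection, and another difference.  Here is a minimal Python sketch of that idea, not the method actually used for the diagram.  I'm assuming the left lobe holds words only SG used, and the cutoff for "a lot more often" (a factor of 3 in relative frequency) is invented for illustration, as are the function names and the toy counts.

```python
from collections import Counter

def rates(counts):
    """Convert raw word counts into relative frequencies."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def venn_regions(subject_counts, others_counts, ratio=3.0):
    """Split words into (left lobe, middle, right lobe).

    left:   words only the subject used
    middle: shared words the subject used at least `ratio` times as often
            (by relative frequency) as the other blogs; the cutoff is assumed
    right:  words the other blogs used that the subject did not
    """
    subject_rate = rates(subject_counts)
    others_rate = rates(others_counts)
    subject_words = set(subject_counts)
    other_words = set(others_counts)

    left = subject_words - other_words
    right = other_words - subject_words
    middle = {
        word
        for word in subject_words & other_words
        if subject_rate[word] >= ratio * others_rate[word]
    }
    return left, middle, right

# Toy counts, made up for the example.
sg = Counter({"palfrey": 2, "shoes": 6, "the": 12})
others = Counter({"the": 60, "shoes": 2, "blog": 38})

left, middle, right = venn_regions(sg, others)
print(left, middle, right)
```

With these toy counts, "the" is shared but used at the same rate by both, so it drops out of the middle, while "shoes" stays in.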

I have noticed that many people tend to use noms de blog that are composed of regular words, rather than proper names.  Watching Speed Racer last night with my son, I noticed that many of the villains and other characters are similarly named.  Speed Racer, Snake Oil, Cruncher Block, Inspector Detector, Rock Force, Racer X (yes, there are glaring exceptions, most notably Spritle and Trixie).  Is this all a part of what Douglas Coupland (I think it was him) called the “Hello Kittyfication of America”?  Or do I just need more sleep?

Next up: kuipercliff, followed by Mr. Topples.


1. Stiletto Girl - March 22, 2007

Wow, that is amazing. I am totally blown away! Thank you so much, Caveblogem. You’ve boosted my ego today. This inspires me to keep on blog truckin’!

BTW, love the addition of the new diagram. So, pray tell, what was that word you had to look up?

Mot mistress with the mostest

2. Stiletto Girl - March 22, 2007

Apologies for double posting but I’m curious – and perhaps you elaborated on this and I missed it but – I am assuming you don’t comb over the whole blog? If not, how do you figure out which pieces to select?

3. caveblogem - March 22, 2007


I had to look up “palfrey,” which popped out at me when I created the cloud. MS Word knew it was a word, but I didn’t recognize it and at first assumed that you had misspelled “paltry.”

Regarding your methodological question, I don’t comb the whole blog. I just start from a recent post (usually the post before the blogger agrees to participate in the study) and work backward until I have a sample large enough. I usually have to cut out a thousand or more proper names, but the number can vary considerably. So, currently I shoot for around 24,000 words. Your sample covered the dates February 3 – March 20, 2007. I should have put that in the post.

No apology necessary for doubling the comment. Comment as often as you like. And I’m delighted that you will continue to blog, SG.

4. Stiletto Girl - March 22, 2007

That’s what I suspected, Cave. Which in a way is too bad, as I have other posts that are less advertised and go way back that would be more fun to bite into because of elaborate word content (plus flinging words is always a delight). Hmm.

I look forward to your other observations on upcoming blogs!

