
How many words do you actually use? – Three Quarks Daily
March 12, 2007

Posted by caveblogem in 3QD, Blogs and Blogging, linguistics, Other, statistical analysis, Three Quarks Daily, vocabulary, web 2.0, writing.

I said in my last post that I was going to look at the vocabularies of some other blogs to make myself feel better. And I did, but I left yesterday’s analysis of the foremost political blog in the US at home, so I can’t post it today (and the answer is “yes,” it did make me feel better).

I can post another blog’s results, however.  Three Quarks Daily is the first blog I ever read.  That’s not quite right.  3QD is the first thing I ever read after somebody showed it to me and said “that is a blog.”  Anyway, 3QD is still one of my favorites.  It is like getting the best of Science, The New Yorker, The Economist and several other top-notch magazines delivered daily with interesting commentary. 

It is written by at least 4 or 5 people per week and by as many as maybe 12.  So, if these people have different vocabularies, the total number of words they might use in a sample could be expected to exceed mine by a good margin.  And it did. 

In a 13,000-word sample covering March 4 – March 12, 3QD used 3,654 different words (my total, for comparison, was 2,734 words, so they used about a third more than I did). I actually drew a much larger sample from their site than I did from mine, partly because I wanted to be comprehensive, fair, and careful. In 18,000 words 3QD used 4,576 different words.
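For anyone who wants to try this at home, here is a minimal sketch of the counting step. It assumes a "word" is a run of lowercase letters with optional internal straight apostrophes, which is my own simplification and not necessarily how the numbers above were produced. The function names `tokenize` and `vocabulary_growth` are made up for illustration.

```python
import re

# Assumption: a word is a run of letters, optionally with internal
# straight apostrophes ("don't"). Case is ignored.
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)*")


def tokenize(text):
    """Return every word in the text, lowercased, in order."""
    return WORD_PATTERN.findall(text.lower())


def vocabulary_growth(text, step):
    """Return (words_read, distinct_words_so_far) pairs every `step` words."""
    seen = set()
    points = []
    for count, word in enumerate(tokenize(text), start=1):
        seen.add(word)
        if count % step == 0:
            points.append((count, len(seen)))
    return points
```

Calling `vocabulary_growth(sample_text, 1000)` on a blog's text would give the kind of running totals that can be plotted against sample size.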

And a little regression produced the following quadratic equation [The “Pearson’s R” was actually a little better when using a cubic model, but that would imply an infinite vocabulary, and as much as I respect the team over there at 3QD I feel like I have to draw the line somewhere]:

[Image: 3qdequation2.jpg (the fitted quadratic equation)]

Which, as you can see in the chart below (click to enlarge–actual data is in red, estimates are in blue) tops out around a total vocabulary in use of 5,698. 

[Image: 3qdchart.jpg (chart of actual data in red and estimates in blue)]
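The "tops out" figure comes from the peak of the fitted parabola. Here is a rough sketch of that idea, not the actual regression behind the chart. It assumes a model of the form y = a*x^2 + b*x that passes through the origin (zero words written, zero vocabulary), which is my own assumption, and the function names are hypothetical. Fed only the two data points quoted in this post, it lands in the same neighborhood as the chart's peak but will not reproduce the fitted figure exactly, since the chart presumably rests on more points.

```python
def fit_quadratic_through_origin(points):
    """Least-squares fit of y = a*x**2 + b*x to (x, y) pairs.

    Forcing the curve through (0, 0) is an assumption of this sketch.
    """
    sx2 = sum(x ** 2 for x, _ in points)
    sx3 = sum(x ** 3 for x, _ in points)
    sx4 = sum(x ** 4 for x, _ in points)
    sxy = sum(x * y for x, y in points)
    sx2y = sum(x ** 2 * y for x, y in points)
    # Solve the 2x2 normal equations with Cramer's rule.
    det = sx4 * sx2 - sx3 ** 2
    if det == 0:
        raise ValueError("need at least two distinct nonzero x values")
    a = (sx2y * sx2 - sx3 * sxy) / det
    b = (sx4 * sxy - sx3 * sx2y) / det
    return a, b


def peak(a, b):
    """Return (x, y) at the top of y = a*x**2 + b*x; requires a < 0."""
    if a >= 0:
        raise ValueError("curve has no maximum unless a is negative")
    return -b / (2 * a), -(b * b) / (4 * a)


# The only two data points quoted in the post: (sample size, distinct words).
a, b = fit_quadratic_through_origin([(13000, 3654), (18000, 4576)])
words_at_peak, vocabulary_at_peak = peak(a, b)
```

A cubic with a positive leading coefficient has no such peak, which is why it would imply a vocabulary that grows without bound.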

It occurred to me this morning that it might be interesting to see how many words are shared by different blogs, as a sort of Venn diagram, perhaps.  Do bloggers speak different languages?  How much vocabulary is shared by different blogs?  Which words do they share?  Which are different?  I must know!  So I’ll be taking a look at that question after I do a couple more of these. 
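The Venn diagram idea maps neatly onto set operations. A minimal sketch, assuming each blog's sample is available as a plain string and using the same simplified definition of a word (runs of letters, case ignored); the names `vocabulary` and `compare_vocabularies` are invented for this example.

```python
import re


def vocabulary(text):
    """Return the set of distinct lowercase words in the text."""
    return set(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()))


def compare_vocabularies(text_a, text_b):
    """Split two vocabularies into the three regions of a Venn diagram."""
    vocab_a, vocab_b = vocabulary(text_a), vocabulary(text_b)
    return {
        "shared": vocab_a & vocab_b,   # words both blogs use
        "only_a": vocab_a - vocab_b,   # words unique to the first blog
        "only_b": vocab_b - vocab_a,   # words unique to the second blog
    }
```

The sizes of the three sets answer "how much is shared," and the sets themselves answer "which words."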


Comments»

1. strugglingwriter - March 12, 2007

I think you’re padding your word count with these posts :). “Quadratic “, “Quarks”, and “infinite” indeed :)

2. caveblogem - March 12, 2007

strugglingwriter,

I need all the help I can get.

3. Ideas Man 2 Word Counts, and Vocab « - March 15, 2007

[…] How many words do you actually use? – 3QD Posted in speech, words, links, language, writing. […]

4. kuipercliff - March 17, 2007

Could you also generate an Obscurity Quotient or Ubiquity Index, based on the relative frequency of certain words within the broader canon of the English Language? ‘Infinite’ being more frequent than ‘quadratic’ for example.

British National Corpus
WordCount

5. caveblogem - March 17, 2007

kuipercliff,

Such indices should be no trouble at all to generate. Currently the cloud thing I’m doing is pretty engaging, and I anticipate doing several more. But when I have a large enough pool for comparisons like “obscurity quotients” and “ubiquity indices,” where I could rank a couple of dozen sites, perhaps, I’ll be sure to credit you for the idea!

6. kuipercliff - March 17, 2007

Excellent. I look forward to the results of your explorations!

