
Cybernetic Haiku September 9, 2008

Posted by caveblogem in 3QD, Constructivism, Haiku, Other, Three Quarks Daily, vocabulary.
8 comments

If I have any readers left, they might remember that I used to periodically examine other blogs, sacking them for words and studying words that seemed relatively unique to them [See the “Studies on the Working Vocabulary of the Blog-O-Sphere” section of this page.]  Towards the end of that phase, I used a simple algorithm to create a Haiku out of the words that a blogger used more often than other bloggers.  Yeah, it’s kinda weird and a little too complicated to explain succinctly, but you could read some of the posts and see the project develop.  And it made sense at the time. . . 

Anyway, it bothered me back then that I was unable to automate the process of generating a Haiku out of a bunch of random words; I always had to intervene by hand.  I wanted to be able to push a button and have the computer do the rest, but I didn’t yet have the skill set that I needed.  But I do now.  So here it is.  Have some fun; click the icon below.
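For the curious, here is a minimal Python sketch of how a push-button version might work. It is not the code behind the icon. It assumes each word is coded by part of speech and syllable count, and that a Haiku is built from a few templates whose slots add up to 5-7-5. `WORD_POOL`, `TEMPLATES`, `fill_template`, and `generate_haiku` are hypothetical names. Instead of a human stepping in when a template can't be filled, the program tries the templates in random order until one works.

```python
import random

# Hypothetical word pool: (word, part of speech, syllable count).
WORD_POOL = [
    ("crabs", "noun", 1), ("snails", "noun", 1), ("pairs", "noun", 1),
    ("brat", "noun", 1), ("dogma", "noun", 2), ("halo", "noun", 2),
    ("cleans", "verb", 1), ("big-eyed", "adj", 2), ("tolerant", "adj", 3),
]

# Hypothetical templates. Each line is a list of slots: a plain string is a fixed
# one-syllable function word; a tuple is a (part of speech, syllables) requirement.
TEMPLATES = [
    [
        [("noun", 1), ("noun", 1), ("adj", 2), ("noun", 1)],         # 1+1+2+1 = 5
        [("noun", 2), ("verb", 1), "the", ("noun", 2), "of"],        # 2+1+1+2+1 = 7
        ["the", ("adj", 3), ("noun", 1)],                            # 1+3+1 = 5
    ],
    [
        [("adj", 1), ("noun", 2), ("noun", 2)],                      # 1+2+2 = 5
        [("noun", 1), ("verb", 1), "the", ("adj", 3), ("noun", 1)],  # 1+1+1+3+1 = 7
        [("verb", 1), "a", ("adj", 3)],                              # 1+1+3 = 5
    ],
]


def fill_template(template, pool, rng):
    """Fill every slot from the pool; raise LookupError if a slot has no candidate."""
    lines = []
    for line in template:
        words = []
        for slot in line:
            if isinstance(slot, str):
                words.append(slot)  # fixed function word, used as-is
                continue
            pos, syllables = slot
            candidates = [w for w, p, s in pool if p == pos and s == syllables]
            if not candidates:
                raise LookupError(f"no {syllables}-syllable {pos} in the pool")
            words.append(rng.choice(candidates))
        lines.append(" ".join(words))
    return lines


def generate_haiku(pool, templates, seed=None):
    """Try the templates in random order; return the first one the pool can fill."""
    rng = random.Random(seed)
    order = list(templates)
    rng.shuffle(order)
    for template in order:
        try:
            return fill_template(template, pool, rng)
        except LookupError:
            continue  # this template needs a word the pool lacks; try the next one
    raise RuntimeError("no template can be filled from this pool")
```

With this toy pool the second template always fails, because there is no one-syllable adjective, so every poem comes from the first template, something like "the tolerant brat" on the last line.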

This project demonstrates one of my favorite things about human thought: the compulsive and unconscious ways we create meaning.  We see a string of words and our brains just automatically start making sense out of them.  It doesn’t matter that they are random.  Recently I read a blog post (I think it was in Three Quarks Daily, but I can’t seem to find it now) in which somebody explained a party game based on this principle (and don’t get me started on the exploitation of this quirk in hypnotism).  A person volunteers to leave the room and, upon returning, to guess the pertinent details of a dream that one of the others is supposedly relating to the rest of the participants while she is out of the room.  However, no dream is actually told during her absence.  The other participants just answer the volunteer’s questions at random, trying to keep their answers consistent with the ones that precede them.  Thus the dream is entirely a figment of the volunteer’s imagination, and it usually ends up telling the participants a little more than they want to know about the mind of the volunteer.  

Yeah, it sounds more like a dirty trick than a game.  But it is an interesting metaphor for life, too.  And I am desperately trying to tell myself that that is a good thing, these days.  If you are an optimist, you are much more likely to find happiness, because you expect to–you look for it, assuming it is there somewhere.  

Anyway, this looks to be my last extracurricular programming project until at least November, and probably even later than that, since I want to participate in NaNoWriMo again this year.  I started a new job last week and between that and the two classes I’m taking, I won’t have much time to put into this sort of thing for a while.  

When I saw that Moon Topples is blogging again, I briefly toyed with the idea of setting this thing up so that it automatically posted a haiku for me each day on this site, a poor-but-efficient imitation of MT’s Monday Morning Haiku posts.  But I think I’ll just ask that if any readers of this blog manage to get the machine to produce a particularly interesting poem, they post it in a comment below.


Breaking the Pattern of Thought August 19, 2008

Posted by caveblogem in Books, Constructivism, Edward de Bono, how to, Lateral Thinking, Other, vocabulary, writing.
5 comments

I’ve been re-reading Edward de Bono’s wonderful (if clumsily written) Lateral Thinking recently, while searching for new-but-manageable programming projects that I can do between semesters (so that I can keep learning programming skills). Naturally, de Bono gave me an idea (never fails).

Lateral Thinking’s first couple of chapters argue, convincingly, that people’s thoughts run along established patterns that can make creativity difficult. The remainder of the book presents de Bono’s grab-bag of thinking tools, helpful methods for breaking out of these patterns when necessary (when the vertically-reasoned ideas are not working).

One technique, “Random Stimulation,” helps in a brainstorming process. It works like this:

Randomly select a word from a dictionary and just run with it for three minutes, trying to connect it to the problem you are working on and following whatever chain of silly connections comes to mind. Hopefully, out of that massive, ill-considered spray of concepts, something emerges that will help solve the problem.

Here’s de Bono’s example:

The numbers 473-13 were given by a table of random numbers and using the Penguin English Dictionary the word located was: ‘noose’. The problem under consideration was ‘the housing shortage’. Over a timed three minute period the following ideas were generated:

noose – tightening noose – execution – what are the difficulties in executing a housing programme – what is the bottleneck, is it capital, labour or land?

noose tightens – things are going to get worse with the present rate of population increase.

noose – rope – suspension construction system – tentlike houses but made of permanent materials – easily packed and erected – or on a large scale with several houses suspended from one framework – much lighter materials possible if walls did not have to support themselves and the roof.

noose – loop – adjustable loop – what about adjustable round houses which could be expanded as required – just uncoil the walls – no point in having houses too large to begin with because of heating problems, extra attention to walls and ceilings, furniture, etc. – but facility for step-wise expansion as need arises.

noose – snare – capture – capture a share of the labour market – capture – people captured by home ownership due to difficulty selling and complications – lack of mobility – houses as exchangeable units – classified into types – direct exchange of one type for similar type – or put one type into the pool and take out a similar type elsewhere. . . .

From this example may be seen the way the random word is used. Often the random word is used to generate further words which themselves link up with the problem being considered. . . . The word is used in order to get things going–not to prove anything. [174-5]

O.K., so it doesn’t always work. At least I am not convinced that the “housing problem” was adequately addressed through this method. I have used de Bono’s “Random Stimulation” method, however, with excellent results.

So I developed an online resource that loads a randomly selected word, along with its definition. Just click the linked picture below.

So now you don’t have to generate random numbers and hunt for a big dictionary. Indeed, I kept the webpage very small, as well as JavaScript-free, so that it can be accessed by web-enabled phones.
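A page like that can be produced entirely on the server, so the phone only ever sees plain HTML. Here is a minimal Python sketch, not the page's actual code; `DICTIONARY` and `random_word_page` are hypothetical, and it assumes the server builds a fresh page for every request, so an ordinary link back to the same address is enough to get a new word.

```python
import html
import random

# Hypothetical mini-dictionary; a real page would load a much larger word list.
DICTIONARY = {
    "noose": "a loop with a running knot that tightens as the rope is pulled",
    "loop": "a shape produced by a curve that bends around and crosses itself",
    "snare": "a trap for catching small animals, usually with a noose",
}


def random_word_page(dictionary, rng=random):
    """Return a tiny, script-free HTML page showing one random word and its definition."""
    word = rng.choice(sorted(dictionary))
    definition = dictionary[word]
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Random word</title></head>\n'
        f"<body><h1>{html.escape(word)}</h1>\n"
        f"<p>{html.escape(definition)}</p>\n"
        # A plain link to the same page: requesting it again yields a new word.
        '<p><a href="">Another word</a></p></body></html>\n'
    )
```

Keeping the randomness on the server is the design choice that lets the page skip JavaScript altogether.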

Which words do you own?–Tales from the Reading Room July 14, 2007

Posted by caveblogem in blogging, blogs, Blogs and Blogging, Books, COMBS, Haiku, linguistics, literature, Other, statistical analysis, vocabulary.
7 comments

Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere.  Other posts, whether analyzing particular blogs within the study or detailing the methodology of this thing or whatever, can be found at the Center for Occasional Meme and Blog-O-Sphere Studies [COMBS].  Go there by clicking here or the Center’s logo, which should be on the right (starboard) side-bar over there —->

About two months ago I took a sample of words from Litlove’s blog Tales from the Reading Room.  I added them to the vocabulary database, but I was reluctant to just do a normal post on them.  I wanted to do something a little special because Litlove had started this whole project, in a way, with one of her posts.  So I procrastinated, a favorite strategy of mine, until I could think of something more interesting.  I think I hit upon something, so without further ado . . .

Litlove’s word sample runs from March 31 to May 9, 2007.  Sample size was 25,741 words.  She added 905 words.  She used a wide variety of words: 4,535 different words within the sample, which is pretty good, since her sample had 5,000 fewer words than most of the others.
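One rough way to put different-sized samples side by side is to divide distinct words by total words (a type-token ratio). The sketch below uses only the figures reported in these posts; the function name is hypothetical. The ratio naturally favors shorter samples, because repeats pile up as a sample grows, so treat it as a rough check. The more telling point is that her raw count of distinct words nearly matches the count from a sample about 6,500 words longer.

```python
def type_token_ratio(total_words, distinct_words):
    """Distinct words per word of running text."""
    return distinct_words / total_words


# Sample sizes and distinct-word counts as reported in these posts.
SAMPLES = {
    "Tales from the Reading Room": (25_741, 4_535),
    "Searching for Normalcy": (32_214, 4_552),
    "silverneurotic": (32_911, 4_231),
}

for blog, (total, distinct) in SAMPLES.items():
    print(f"{blog}: {type_token_ratio(total, distinct):.3f}")
# Tales from the Reading Room: 0.176
# Searching for Normalcy: 0.141
# silverneurotic: 0.129
```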

Here is a word cloud comprised of the words used more than twice by Litlove but not at all by any of the other 18 blogs that went before her:

onlycloud.jpg

And here are those words in a font called Love Letters:

onlycloud-loveletters.jpg

And here’s the Venn diagram I usually make out of these words:

llvenn.jpg

The left lobe consists of words in her sample that nobody else had used, sized relative to the frequency of use.  The middle part consists of words that everybody has used so far, sized according to how much more frequently Litlove used them in the sample than others did.  And the right lobe consists of words that everyone else sampled before her used, but that she did not. 
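The three lobes boil down to a few set operations over word counts. Here is a minimal Python sketch, assuming each blog's sample is stored as a `collections.Counter` of word uses; `venn_lobes` is a hypothetical name, and measuring "how much more frequently" as the gap between relative frequencies is an assumption, not necessarily the measure used for these diagrams. Passing `min_count=3` gives the "used more than twice" filter from the word clouds.

```python
from collections import Counter


def venn_lobes(subject, others, min_count=1):
    """Split one blog's vocabulary into the three lobes of the Venn diagram.

    subject: Counter of word -> uses in this blog's sample.
    others:  list of Counters, one per previously sampled blog.
    """
    used_by_any_other = set().union(*others)
    used_by_every_other = set.intersection(*(set(c) for c in others))

    # Left lobe: words nobody else used, sized by how often this blog used them.
    left = {w: n for w, n in subject.items()
            if w not in used_by_any_other and n >= min_count}

    # Middle lobe: words everybody used, sized by how far this blog's relative
    # frequency exceeds the other blogs' average (an assumed measure).
    subject_total = sum(subject.values())
    middle = {}
    for w in used_by_every_other & set(subject):
        subject_rate = subject[w] / subject_total
        others_rate = sum(c[w] / sum(c.values()) for c in others) / len(others)
        if subject_rate > others_rate:
            middle[w] = subject_rate - others_rate

    # Right lobe: words every other blog used but this one never did.
    right = used_by_every_other - set(subject)
    return left, middle, right
```

For example, a subject sample of "the cat sat on the mat mat mat" checked against "the dog sat on a log" and "the dog ran on a hill" puts "cat" and "mat" in the left lobe, "the" in the middle, and "dog" and "a" in the right.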

Here is another effort by my Haiku-generating algorithm, which crashed six times before yielding a Haiku made from only the most common words and the words Litlove added to the database (all of the crashes had to do with a shortage of monosyllabic words of various types in Litlove’s pool of words).

In boy’s forthright sneer
she adheres perilously
to the politeness.

Puzzling, like all good machine-generated poetry. 
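Crashes like these come from a template slot that the pool can't fill. Here is a hypothetical Python sketch of two remedies, assuming words are coded as (word, part of speech, syllables): `missing_slots` reports which slots a template needs that a pool lacks, and `pick_word` prefers the blogger's own words but falls back on the whole coded database instead of crashing. The names, the toy pools, and the toy one-line template are all illustrative assumptions.

```python
import random


def missing_slots(template, pool):
    """Return the (part of speech, syllables) slots that no word in the pool can fill."""
    available = {(pos, syl) for _, pos, syl in pool}
    needed = {slot for line in template for slot in line if isinstance(slot, tuple)}
    return needed - available


def pick_word(pos, syllables, blogger_pool, full_database, rng):
    """Prefer the blogger's own words; fall back to the whole coded database."""
    for pool in (blogger_pool, full_database):
        candidates = [w for w, p, s in pool if p == pos and s == syllables]
        if candidates:
            return rng.choice(candidates)
    raise LookupError(f"no {syllables}-syllable {pos} anywhere")


# Hypothetical pools: the blogger has no one-syllable adjectives, the database does.
BLOGGER_POOL = [("crabs", "noun", 1), ("snails", "noun", 1), ("tolerant", "adj", 3)]
FULL_DATABASE = BLOGGER_POOL + [("big", "adj", 1), ("red", "adj", 1)]

# Toy one-line template: strings are fixed words, tuples are slots to fill.
TEMPLATE = [[("adj", 1), ("noun", 1), "and", ("adj", 3), ("noun", 1)]]
```

Here `missing_slots(TEMPLATE, BLOGGER_POOL)` returns `{("adj", 1)}`, while the full database leaves nothing missing.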

And here is the new thing.  It’s an additional word cloud that is a little more complicated than the others I have generated thus far.  This is the first time I have tried to explain it, so bear with me.  I calculated the average number of times each word in the database is used (per subject).  Then I subtracted that average from the number of times each word was used in Litlove’s sample.  The positive numbers represent words that Litlove used more frequently than average.  Then I scaled these words by frequency of use in her sample.  But then I deleted the 65 most frequently used words in the database (see here for a partial list of these).  This yields a list of at least 100 words showing something new about the speech patterns and word choices of the blogger, Litlove in this case.  I’m not at all sure what it shows, though.  So here’s Litlove’s cloud:

mtacloud.jpg

And for purposes of comparison, here’s one from last week’s subject, silverneurotic:

sn-mtacloud.jpg

I find these a little more interesting than the other visuals, at this point.  And since their appearance is not so firmly tied to the size of the samples, I can generate them with a much smaller sample from someone’s blog.  So I may just keep doing this, if I keep getting volunteers.
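The steps behind these above-average clouds can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each blog's sample is a `collections.Counter`, the profiled blog is included when computing the per-subject average, and "scaled by frequency of use" means each word is sized by its raw count in the blog's sample. `above_average_cloud` is a hypothetical name.

```python
from collections import Counter


def above_average_cloud(subject, all_subjects, n_stopwords=65):
    """Words one blog used more often than the per-blog average, minus the most common words.

    subject:      Counter for the blog being profiled (assumed to be in all_subjects too).
    all_subjects: list of Counters, one per sampled blog.
    """
    database = Counter()
    for counts in all_subjects:
        database.update(counts)

    # Average number of uses per subject for every word in the database.
    average = {w: n / len(all_subjects) for w, n in database.items()}

    # The n most frequently used words in the whole database are dropped.
    stopwords = {w for w, _ in database.most_common(n_stopwords)}

    cloud = {}
    for w, n in subject.items():
        if w in stopwords:
            continue
        if n - average.get(w, 0.0) > 0:  # used more often than average
            cloud[w] = n                 # size by frequency of use in this sample
    return cloud
```

Since only the subject's own counts and an average enter the calculation, a smaller sample still gives a usable cloud, which fits the observation that these don't depend so heavily on sample size.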

As always, the vocabulary clouds and Haiku are the property of the volunteers, except that said volunteer may not have them taken off of my site but may otherwise do with them what they wish.  Thanks for participating, Litlove, and sorry about the long wait.

Which words do you own?–Searching for Normalcy July 5, 2007

Posted by caveblogem in blogging, blogs, Blogs and Blogging, COMBS, Haiku, linguistics, Other, statistical analysis, vocabulary.
3 comments

Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere.  Other posts, whether analyzing particular blogs within the study or detailing the methodology of this thing or whatever, can be found at the Center for Occasional Meme and Blog-O-Sphere Studies [COMBS].  Go there by clicking here or the Center’s logo, which should be on the right (starboard) side-bar over there —->

Anyway, the blog under the microscope today is Searching for Normalcy, published by Balou.  Her word sample runs from November 22, 2006 – June 27, 2007.  Sample size was 32,214 words.  She added 502 words, which is  more than what I would expect to see at this point in the experiment.   She used a wide variety of words–4,552 different words within the sample. 

Here is a word cloud comprised of the words used more than twice by Balou but not at all by any of the other 26 blogs sampled thus far:

balouonlycloudpic.jpg

Never ceases to amaze when words that seem so ubiquitous, words like maternity and crafts, pop up for the first time.  I mean, I’ve processed more than half a million words.  How did these not appear until now?  Words like “corals,” “ornament,” “starfish,” these I can understand, but “breakup?”  Go figure.  Please.

And here are those words in a font called Lou:

onlycloudloupic.jpg

And here’s the Venn diagram I usually make out of these words:

balouvennpic.jpg

The left lobe consists of words in her sample that nobody else had used, sized relative to the frequency of use.  The middle part consists of words that everybody has used so far, sized according to how much more frequently Balou used them in the sample than others did.  And the right lobe would consist of words that everyone else sampled thus far has used but that she did not.  Of these there are none, again.  The list of words that everyone uses is, I think, getting down to the bare essentials, the sine quibus non of writing.

Here is another effort by my Haiku-generating algorithm, which crashed four times.  All of the crashes had to do with a lack of monosyllabic adjectives in Balou’s pool of words, so the algorithm is not to blame this time.  (I have a pretty good store of words now for this algorithm, by the way.  When I run it with all of the words, the ones I have coded by number of syllables and part of speech, it rarely trips.)

Crabs, snails, big-eyed pairs,
dogma cleans the halo of
the tolerant brat.

The second and third lines are pretty straightforward, although it is difficult to imagine dogma doing something like that. The first line can be interpreted as apostrophe, I think (with an anthropomorphic bent).  “Big-eyed pairs” is evocative of a scene from an anime treatment of the biblical story of Noah, or perhaps even “Evan Almighty” (don’t know, haven’t seen it, but I’m judging by the commercials).  I’d be interested in any other theories, of course. 

As always, the vocabulary clouds and Haiku are the property of the volunteers, except that said volunteer may not have them taken off of my site but may otherwise do with them what they wish.  Thanks for participating, Balou!

Next up (early next week, prob’ly): litlove, ’cause I promised.

Which words do you own?–silverneurotic June 28, 2007

Posted by caveblogem in Haiku, linguistics, Other, statistical analysis, vocabulary.
4 comments

Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere.  Other posts, whether analyzing particular blogs within the study or detailing the methodology of this thing or whatever, can be found at the Center for Occasional Meme and Blog-O-Sphere Studies [COMBS].  Go there by clicking here or the Center’s logo, which should be on the right (starboard) side-bar over there —->

Anyway, the blog under the microscope today is The Post College Years-Part Two, published by silverneurotic.  Her word sample runs from April 4 to June 20, 2007.  Sample size was 32,911 words.  She added 375 words, which is about what I would expect to see at this point in the experiment.  That doesn’t seem like a lot, but so many words are already in the database that it’s hard to find new ones.  Silverneurotic used a wide variety of words: 4,231 different words within the sample. 

Here is a word cloud comprised of the words used more than twice by silverneurotic but not at all by any of the other 25 blogs sampled thus far:

snonly.jpg

Columbine is a flower, which is why it came through the spell-check unscathed.  She was probably speaking of the high school, though.  And here are those words in a font called Silverdollar:

snonlysd.jpg

And here’s the Venn diagram I usually make out of these words:

snvenn.jpg

The left lobe consists of words in her sample that nobody else had used, sized relative to the frequency of use.  The middle lobe consists of words that everybody has used so far, sized according to how much more frequently silverneurotic used them in the sample.  And the right lobe would consist of words that everyone else sampled thus far has used but that she did not.  Of these there are none.  The list of words that everyone uses is, I think, getting down to the bare essentials, the words without which there can be no writing.

Here is another “effort” by my Haiku-generating algorithm, which crashed five times.  Don’t even care, at this point.  Stupid computer thinks it is being “Zen” when it is just being difficult.

Sisterly catcher,
wholesome columbine annoys
her bare symphony.

Despite my complaining about the process, I like the result.  The tragedy of an allergy-prone woman involved in a naked springtime softball pick-up game, or am I reading too much into it? 

As always, the vocabulary clouds and Haiku are the property of the volunteers, except that said volunteer may not have them taken off of my site but may otherwise do with them what they wish.  Thanks for participating, silverneurotic!

Next up: Searching for Normalcy, then litlove, ’cause I promised.

Which words do you own?–Klotz, as in Blood June 20, 2007

Posted by caveblogem in blogging, blogs, Blogs and Blogging, Haiku, linguistics, Other, statistical analysis, vocabulary.
3 comments

Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere.  Other posts, whether analyzing particular blogs within the study or detailing the methodology of this thing or whatever, can now be found at the Center for Occasional Meme and Blog-O-Sphere Studies [COMBS].  Go there by clicking here or the Center’s logo, which should be on the right (starboard) side-bar over there —->

Anyway, the blog under the microscope today is “Klotz,” as in “Blood,” published by Steve.  His word sample runs from March 6 to June 11, 2007.  Sample size was 32,911 words. Steve totally wrecked the curve, adding 1,188 words, which shouldn’t really happen at this point in the study.  And for those of you keeping score, Steve used 6,450 different words in his sample. 

The theory was that the number of new words each blog added would have leveled off by now.  At any rate, the number of distinct words in the database passed the 25,000 mark, so the working vocabulary of the Blog-o-sphere is, obviously, more than that.

Here is a word cloud comprised of the words used more than twice by Steve but not at all by any of the other 24 blogs sampled thus far.

klotzonlycloud.jpg

And here are those words in a font called Blood:

klotzonlycloud-b.jpg

I ran out of Floridian fonts. 

And here’s the Venn diagram I usually make out of these words.  The left lobe consists of words that were new in the sample, that nobody else had used, sized relative to the frequency of use.  The middle lobe consists of words that everybody has used so far, sized according to how much more frequently Steve used them in the sample.  And the right lobe consists of only two words that everyone else sampled thus far has used, but that Steve did not. 

klotzvenn.jpg

Here is another effort by my Haiku-generating algorithm, which went off this time without a hitch, although I used words that Steve added to the database but which he didn’t use frequently enough to put them in the vocabulary clouds.  (I just didn’t want the hassle this time.)

No pilgrim ruptured
between traditions of brie
and clutching the dolt.

Clutching the dolt sounds like the name of some bizarre game out of Colonial American folklore, duznit?  Maybe I should have enclosed it in quotation marks or italicized it.  As always, the vocabulary clouds and Haiku are the property of the volunteers, except that said volunteer may not have them taken off of my site but may otherwise do with them what they wish.  Thanks for participating, Steve!

Next up: Silverneurotic, then Searching for Normalcy; then this project will be put on hold for a little while as I pursue a new one that seems much more interesting at the moment (and which will debut in this space later this week, hopefully).

Which words do you own?–A Mom, a Blog, and the Life In-Between June 15, 2007

Posted by caveblogem in Blogs and Blogging, Haiku, linguistics, Other, vocabulary.
6 comments

[Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a somewhat weekly basis. There is an interesting (to some) analysis of the most common words here.  And there is some discussion of method here and here.]

Anyway, the blog under the microscope today is A Mom, a Blog, and the Life In-Between, published by Tere.  Her word sample runs from April 2 to June 6, 2007.  Sample size was 28,685 words.  Tere added 445 words.  There were 3,904 different words in her sample, a little above the norm.

Here is a word cloud comprised of the words used more than twice by Tere but not at all by any of the other 24 blogs sampled thus far.

tere-tereonly.jpg

And here are those words in a font called Flores:

tere-tereonlyf.jpg

Two reasons for that font.  First, Tere strongly recommends that husbands bring their wives flowers.  Second, it is the only other font I have that relates to Florida in some fashion.

And here’s the Venn diagram I usually make out of these words.  The left lobe consists of words that were new in the sample, that nobody else had used, sized relative to the frequency of use.  The middle lobe consists of words that everybody has used so far, sized according to how much more frequently Tere used them in the sample.  And the right lobe consists of only two words that everyone else sampled thus far has used, but that Tere did not. 

tere-terevenn.jpg

Here is another effort by my Haiku-generating algorithm, which crashed three times before coughing up this ecliptic gem.

Cheesecake on the odes,
dears of a year-and-a-half
re-edit mangoes.

Well, you get what you pay for here at PGoP, I always say.  If you get cheesecake on the odes, you’ll probably find yourself re-editing mangoes sooner or later, if only to hide the embarrassment on your face.  Whoops. . . Honey, I’d better see to the mango editing again.

As always, the vocabulary clouds and Haiku are the property of the volunteers, except that said volunteer may not have them taken off of my site but may otherwise do with them what they wish.  Thanks for participating, Tere!

Next up: another Floridian, “Klotz,” as in “Blood,” then Silverneurotic, somewhere in the mid-Atlantic U.S., before returning to the white-hot Floridian blogging scene with Searching for Normalcy (which is very similar to Warren G. Harding’s 1920 campaign slogan, “Return to Normalcy”).

Which words do you own?–Miami Rhapsody June 11, 2007

Posted by caveblogem in Blogs and Blogging, Books, fiction, Haiku, libertarians, linguistics, luck or time, narrative, vocabulary.
3 comments

[Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a somewhat weekly basis. There is an interesting (to some) analysis of the most common words here.  And there is some discussion of method here and here.]

There is a potentially offensive word below.  You have been warned. 

I just finished reading Carl Hiaasen’s Lucky You, a novel about two people who play the same lottery numbers and win the same Florida Lottery jackpot worth $28m.  One is a black woman in her late twenties who is working as a veterinary assistant in a small town.  The other is a racist living in the Miami area who wants to use the money to finance a militia group, which he will use to fight the UN/NATO/Jewish/race-mixing invasion force he believes is preparing in the Bahamas.  So the guy steals the woman’s lottery ticket.  The story’s about how she gets it back. 

I’ve been thinking about the story a little for a number of reasons, but the most pertinent of them is that the racist character in Lucky You can’t utter the most prominent word in the vocabulary cloud below.  When he was in his early teens, he spoke this word at home, once.  His father, who never otherwise used corporal punishment, beat him for it with a razor strop.  After Dad was done with him, his mother took him inside the house and washed his mouth out with a well-known abrasive tub-and-tile cleaner containing bleach.  Consequently, he gags whenever he even thinks this word.  The only other member of his militia, his accomplice, teases him about this. 

Carl Hiaasen uses this word in the book a number of times, which seemed daring to me, in a weird way.  Hiaasen puts this word in the mouths of racist bad guys, and some of the story attempts to explore racism and bigotry (but not so much that it disrupts the comedy).  Nevertheless, it seemed daring to me because I don’t think I have ever spoken this word, though I often curse like a sailor.  My parents never beat me for anything, much less for using this word.  But I grew up in a family of Libertarians who pretty much ignored skin color.  And I was sheltered enough in white suburban California that racial issues were never prominent in my experience.  Racism in the news always seemed somewhat unreal (well, a lot of the news did).  It was only later, studying history in college, that I began to see racism as a real and contemporary problem.  Well, that’s how sheltered I was.

Say the word as an insult and it brands you a stupid bigot.  Say it ironically, or even analytically (as a commentary on language, for example), and it is too easy to be misunderstood, or to come off as a privileged white intellectual (which is what I am, basically, but I try not to flaunt it).  It was an easy word to avoid, until this post.

Anyway, the blog under the microscope today is Miami Rhapsody, a truly fascinating read published by Yvette.  I recommend subscribing.  She won’t fill your inbox as often as many others, and seems to write only when she has something interesting to say.  Her word sample runs from July 28, 2006 to June 1, 2007, every word she posted up to that point.  There were only 20,000 words in the sample, so the numbers will look a little low in comparison to other blogs examined recently (where the samples tend towards 30,000).  Yvette added 510 words.  There were 3,608 different words in her sample, a little above the norm, I think.

Here is a word cloud comprised of the words used more than twice by Yvette but not at all by any of the other 23 blogs sampled thus far.

And here are those words in a font called Floribetic:

And here’s the Venn diagram I usually make out of these words.  The left lobe consists of words that were new in the sample, that nobody else had used, sized relative to the frequency of use.  The middle lobe consists of words that everybody has used so far, sized according to how much more frequently Yvette used them in the sample.  And the right lobe consists of only two words that everyone else sampled thus far has used, but that Yvette did not.  She doesn’t seem to care about money or looks.  Refreshing, isn’t it?

Here is another effort by my Haiku-generating algorithm.

You professor’s racists!
The louder nuns not potted
are the nuns you mow.

“Professor’s racists.”  I kinda like that, although I’m not sure what it would mean.  A group of brown-shirted nerdy bigots?  Something in the phrasing seems like a badly translated Maoist slogan of some sort.  And “mowing the louder nuns” also puts me in mind of those jokes we told as kids: What’s black and white and red all over? 

As always, the vocabulary clouds and Haiku are the property of the volunteers, except that said volunteer may not have them taken off of my site but may otherwise do with them what they wish.  Thanks for participating, Yvette!

Next up: two more Floridian blogs, A Mom, A Blog, and the Life In-Between, then “Klotz,” as in “Blood,” then Silverneurotic, who is not from Florida, if I remember correctly.

Clarifications about the Blogging Vocabulary Study June 8, 2007

Posted by caveblogem in Blogs and Blogging, linguistics, Other, statistical analysis, vocabulary.
1 comment so far

Dayngrous Discourse asked this question in a comment yesterday.

Of these words 3,907 were unique, meaning that she used 3,907 different words in the sample, well above the average of about 3,500. Does that mean I added that 3,907 new words to the database? 

I learned during the brief time I was teaching college that when you get a question from somebody it usually means that there are dozens of people who had the same question but didn’t ask it, so I thought I’d put the answer up here where others could see it, rather than just tack it into the comments section.

So, in response, when I said that 3,907 words in the sample were unique, I meant that there were 3,907 different words in the sample of 28,000 words from your blog.  I grabbed 28,000 words, which went in this order (I began the sample at May 13th, Mother’s Day):

Happy
Mother’s
Day

That’s three different words, so far, from the title of the post.  Then we have the beginning of the post itself:

Happy
Mother’s
Day

Which is 6 words toward the sample’s total of 28,000, but only three are unique to that sample.  And all of these words were already in the database, because somebody else had used them.  So they weren’t unique to the database, just to the sample.  Probably the first word that you added to the database was “dished,” which came in the last line of that Mother’s Day post.
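The three numbers in play here (total words, different words in the sample, and words new to the database) can be pinned down with a short Python sketch. It is an illustration, not the study's actual code: `sample_stats` is a hypothetical name, and the tokenizer (lowercase letters plus internal apostrophes) is an assumption about how words were counted.

```python
import re


def sample_stats(text, database_words):
    """Return (total words, distinct words, words new to the database) for one sample."""
    # Normalize curly apostrophes so "Mother’s" and "Mother's" count as the same word.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower().replace("’", "'"))
    distinct = set(words)
    return len(words), len(distinct), distinct - set(database_words)
```

Run on the title plus the first line of the post, `sample_stats("Happy Mother’s Day\nHappy Mother’s Day", {"happy", "mother's", "day"})` gives 6 total words, 3 distinct words, and nothing new to the database; a word like "dished" would be the first to land in that last set.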

I forgot to say in the post how many words you added to the database, so I’ll tell you now: 435, which is pretty darned good for someone who posts frequently, often more than once a day, as you do.  And it is a lot for short posts with lots of tags, which tend to push the word variety down.

I hope that helps, but I have my doubts.  Just as it is hard to talk about math using math, it is hard to talk about words with words sometimes, and here I am using both.  If Kurt Gödel didn’t say something like that, he probably should have.  This type of muddle is one of the reasons that literary critics invent jargon, not that we should forgive them, mind you.

The word database has more than half a million total words in it now.  And about 25,000 different words.  So it is getting more and more difficult to add new ones.  But the next two participants, both hailing from southern Florida, found some anyway.

Next up: Miami Rhapsody and  A Mom, A Blog, and the Life In-Between

Which words do you own?–2Dolphins June 6, 2007

Posted by caveblogem in 3QD, Blogs and Blogging, Haiku, linguistics, Other, Three Quarks Daily, vocabulary.
1 comment so far

[Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere.  Posts on this subject started here and will continue on a somewhat weekly basis. There is an interesting (to some) analysis of the most common words here.  And there is some discussion of method here.]

Today’s volunteer is the blog 2Dolphins, which is run by a married couple in Texas.  I had to go back quite a ways, chronologically, to get a large enough sample (there are a lot of pictures and such on the blog.)  So this sample runs from September of 2005 to May 31, 2007.  This may account, partially, for the fact that it added an inordinate number of words to the database.  I guess technically it was an ordinate number, since all numbers are, by definition, ordinate (I think), but you know what I mean.  They added 1,024 words, that’s right, two to the tenth power, which is a lot.

There were 5,593 different words in the sample, which is also a new record. 

Here is a word cloud comprised of the words used more than twice by 2Dolphins but not at all by any of the other 22 blogs sampled thus far.

onlydolpic.jpg

I was happy to see the word dolphin’s in this, but not as happy as I would have been to see dolphins or dolphin.  The very first blog I sampled (other than my own), Three Quarks Daily, used the word “dolphin.”  And Alabaster Crippens used the word “dolphins” in the sample I took from his blog.  Anyway, here’s another copy of the same cloud, in a font called “dolphin.”  Best I can do.

2doldolpic.jpg

And here’s the Venn diagram I usually make out of these words.  The left lobe consists of words that were new in the sample, that nobody else had used, sized relative to the frequency of use.  The middle lobe consists of words that everybody has used so far, sized according to how much more frequently 2Dolphins used them in the sample.  And the right lobe consists of only one word which everyone else sampled thus far has used, but that 2Dolphins did not.

2dvenn.jpg

And finally, here is another effort by my Haiku-generating algorithm, which stumbled a record five times.  There weren’t enough verbs to choose from, so it kept crashing.  “Snoopy” is supposed to function as an adjective in the poem, not as a beloved cartoon dog.  It is only capitalized because it is at the start.  The dudes are snoopy.

Snoopy on the pod,
dudes from an aggregator
rename a protein.

As always, the vocabulary clouds and Haiku are the property of the volunteers, except that said volunteer may not have them taken off of my site but may otherwise do with them what they wish.  Thanks for participating, 2Dolphins!