Wither Question #98? June 29, 2007 Posted by caveblogem in blogging, blogs, Blogs and Blogging, COMBS, memes, Other, statistical analysis, tagging.
This is fourth in a series of posts about my study of responses to the dreaded 150 Things meme. All of which will end up on the COMBS page of this site, eventually.
Zandperl,* of Strange Musings writes:
One thing I noticed in your “study” is that many people left out the #98, about naming a constellation. I believe that I deliberately omitted it from mine since I’m an astronomer and know that in reality you can’t name your own constellation (the International Astronomical Union, of anti-Pluto fame, actually names them), but I’m curious if you know why all the other people omitted it and who started it.
Ah, well, I was curious about that, too. Many people omitted this question before you deleted it from your list. Difficult to say why, at least without more research (Oh, Boy!), but it may have been because there were at least two different questions numbered 98 by then. Most people who answered the question (meaning that they included it in their list) put it in as “created and named your own constellation.” But a few answered a different question #98: “passed out cold.”
Perhaps a short paragraph regarding method is in order. This “study” sampled blogs that responded to this meme by going to Technorati and typing in the first line of the meme as a search term. Then I scrolled through 50 pages (500 blogs) until I got to the 50th, and worked backwards. (I figured that with some blogs being deleted and some being offline for other reasons, I would be able to get a sample of 300 or so with which I could do this “study.”)**
The earliest blog in the sample (um, the earliest for which I have a date) was Purple Valley, written by val, published on October 19, 2006. If one wanted, one could trace the meme back, starting with the people that tagged her (which can be found on her post, here) and probably, perhaps, find the origin of the meme.
What a discovery that would be! Like Burton and Speke searching for the origin of the Nile. It would take you to the wilds of the Internet Archive, I suppose. If nobody wants to do that, I would understand. But I am otherwise engaged at the moment. If somebody does want this job, I’d be happy to put them on the list of advisors at COMBS (which would mean putting up a page for that sort of thing, of course). Such a research affiliate could choose their own title and role there; we’re not stuffy about that sort of thing.
Finally, my sincerest apologies for not responding to comments in the last two weeks. There have been many, and I have responded to many of them on other people’s blogs, because my blog, this blog, perhaps for very good reasons, treats my own comments as spam and filters them out. Yes. It does. And then yesterday when I discovered what Akismet was doing I attempted to “unspam” these comments. It ignored my efforts as efficiently as only a computer algorithm can ignore things. It did.
*Does one capitalize the lower case name of a nom-de-blog when it starts a sentence? I couldn’t find anything in Strunk and White to cover this.
**zandperl put the word “study” in quotation marks, which I’m going to adopt here. As soon as I have the time I’m going to change it throughout the blog, even going so far as to change it within the logo for COMBS. Although I am making a serious attempt to get all of this stuff right, I’m not fooling myself into lending my findings more scientific weight or import than they can bear. Having done some serious polling, public opinion, and marketing research, I know how to do a serious study. Most of the questions in this particular meme have multiple interpretations, which would be inadmissible in a serious study. Take question #98, for example. I interpreted it to mean something like what the fictional ogre Shrek did in his first movie, pointing at the sky and telling Donkey that there was a constellation called “Gabby” named after a talkative donkey. What I am doing here is not a series of studies; these are “studies.”
Which words do you own?–silverneurotic June 28, 2007 Posted by caveblogem in Haiku, linguistics, Other, statistical analysis, vocabulary.
Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere. Other posts, whether analyzing particular blogs within the study or detailing the methodology of this thing or whatever, can be found at the Center for Occasional Meme and Blog-O-Sphere Studies [COMBS]. Go there by clicking here or the Center’s logo, which should be on the right (starboard) side-bar over there —->
Anyway, the blog under the microscope today is The Post College Years-Part Two, published by silverneurotic. Her word sample runs from April 4 to June 20, 2007. Sample size was 32,911 words. She added 375 words, which is about what I would expect to see at this point in the experiment. That doesn’t seem like a lot, but so many words are already in the database that it’s hard to find new ones. Silverneurotic used a wide variety of words–4,231 different words within the sample.
Here is a word cloud comprised of the words used more than twice by silverneurotic but not at all by any of the other 25 blogs sampled thus far:
Columbine is a flower, so that’s why it came through the spell-check unscathed. Probably she was speaking of the High School. And here’s those words in a font called Silverdollar:
And here’s the Venn diagram I usually make out of these words:
The left lobe consists of words that were new to the sample, that nobody else had used, sized relative to the frequency of use. The middle lobe consists of words that everybody has used so far, sized according to how much more frequently silverneurotic used them in the sample. And the right lobe consists of words that everyone else sampled thus far has used, but that she did not. Of these there are none. The list of words that everyone uses is, I think, getting down to the bare essentials, the words without which there can be no writing.
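For anyone who wants to try the three lobes at home, here is a minimal sketch of how they could be computed with plain Python sets. The function name `venn_lobes`, the `min_uses` parameter, and the sample words are all made up for illustration; this is not the actual COMBS procedure, and the sizing of words by relative frequency is left out.

```python
from collections import Counter

def venn_lobes(target_words, other_samples, min_uses=1):
    """Split one blogger's vocabulary into the three lobes described above.

    target_words: list of word tokens from the blog under study.
    other_samples: list of token lists, one per previously sampled blog.
    min_uses: minimum count for a word to appear in the left lobe
              (3 would match the "used more than twice" rule of the word cloud).
    """
    target_counts = Counter(target_words)
    other_vocabs = [set(sample) for sample in other_samples]

    used_by_anyone_else = set().union(*other_vocabs)
    used_by_everyone_else = set.intersection(*other_vocabs) if other_vocabs else set()

    # Left lobe: words nobody else has used, with their counts.
    left = {w: n for w, n in target_counts.items()
            if w not in used_by_anyone_else and n >= min_uses}
    # Middle lobe: words every sampled blog, including this one, has used.
    middle = {w: target_counts[w] for w in used_by_everyone_else if w in target_counts}
    # Right lobe: words everyone else used but this blogger did not.
    right = used_by_everyone_else - set(target_counts)
    return left, middle, right

# Hypothetical toy data, not taken from any real blog sample.
target = "columbine columbine columbine the the softball cat".split()
others = [["the", "cat", "dog", "and"], ["the", "dog", "and", "cat"]]
left, middle, right = venn_lobes(target, others, min_uses=3)
print(left, middle, right)
```

As more blogs join the sample, the intersection behind the middle and right lobes can only shrink, which fits the observation that the shared list is getting down to the bare essentials.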
Here is another “effort” by my Haiku-generating algorithm, which crashed five times. Don’t even care, at this point. Stupid computer thinks it is being “Zen” when it is just being difficult.
wholesome columbine annoys
her bare symphony.
Despite my complaining about the process, I like the result. The tragedy of an allergy-prone woman involved in a naked springtime softball pick-up game, or am I reading too much into it?
As always, the vocabulary clouds and Haiku are the property of the volunteers, except that said volunteer may not have them taken off of my site but may otherwise do with them what they wish. Thanks for participating, silverneurotic!
Next up: Searching for Normalcy, then litlove, ’cause I promised.
Experience: What’s it Good For? June 27, 2007 Posted by caveblogem in memes, Other, Philosophy, statistical analysis, tagging, web 2.0.
This is third in a series of posts about my study of responses to the dreaded 150 Things meme. All of which will end up on the COMBS page of this site, eventually.
My wife and I rented “The Pursuit of Happyness” from Netflix about a month ago but finally found time to watch it on Sunday night. I like Will Smith. We lived in San Francisco for a while, at a time when I was interested in the stock market. We are both interested in the plight of the homeless. There were a lot of reasons that we thought we’d like the movie. But we just couldn’t get through the unhappyness part of it. We stopped watching after maybe 25 minutes and sent it back.
But it got me thinking about this meme, which some people looked at as a to-do list for life. It should all add up to something, shouldn’t it, all these experiences?
I combined the positive responses to questions #38 (have you ever actually felt happy about your life, even for just a moment?) and #141 (have you ever thought to yourself that you’re living your dream?) to divide the sample into people who were relatively content (those who responded in the affirmative to both), and relatively discontent (those who did not). Then I crosstabulated these against the rest of the questions. So, what correlations did I find at the 95% level of confidence? None. Didn’t seem to make much difference.
None of the accomplishments on that list was strongly related to how content you said you were.
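The post does not say which test sat behind the 95% figure, so the following is only a sketch of one common choice: a Pearson chi-square test on a 2x2 table (content vs. discontent, did the thing vs. did not). The function names and the counts in the example are hypothetical; 3.841 is the standard chi-square cutoff for one degree of freedom at the 0.05 level.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Rows: content / discontent. Columns: did the thing / did not.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one response")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

CRITICAL_95_DF1 = 3.841  # chi-square cutoff for 1 degree of freedom, alpha = 0.05

def is_significant(a, b, c, d):
    """True if the table shows an association at the 95% level."""
    return chi_square_2x2(a, b, c, d) > CRITICAL_95_DF1

# Made-up counts for illustration only.
print(chi_square_2x2(30, 20, 20, 30), is_significant(30, 20, 20, 30))
print(chi_square_2x2(25, 25, 25, 25), is_significant(25, 25, 25, 25))
```

Running the same test against every other question means many comparisons at once, so a few "significant" results would be expected by chance alone; finding none at all is a fairly strong sign that nothing is there.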
The Impulsively Generous June 26, 2007 Posted by caveblogem in blogging, blogs, Blogs and Blogging, COMBS, memes, Other, philanthropy, statistical analysis, tagging.
One of the fun parts of statistical research is connecting data that seemingly have nothing to do with one another. For example, does one’s propensity to give too much money to charity have anything to do with the probability that one has touched a cockroach?
As it turns out, within the statistical sample I took of bloggers who responded to the 150 things meme, the answer is yes. People who had at some point given more than they could afford to charity were much more likely to have touched a cockroach.
I intend to do a few more crosstabulations of the implications of my study of the 150 things meme and would be delighted to have this research directed by readers. Just let me know about your pet theories (as they pertain to data in the 150 things meme) and I’ll run the numbers. Obviously I need some sort of direction because there are more than 12,000 possible crosstabulations in this dataset.
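A rough way to size that search space, assuming each crosstabulation pairs two distinct questions and that order does not matter (the helper name is made up). With exactly 150 questions this counting rule gives 11,175, a little under the figure quoted above, so the quoted number presumably rests on a slightly different tally; that is a guess, not something the data confirms.

```python
from math import comb

def pairwise_crosstabs(num_questions):
    # Assumes one crosstab per unordered pair of distinct questions.
    return comb(num_questions, 2)

print(pairwise_crosstabs(150))
```

Either way, the point stands: far too many to eyeball without some direction from readers.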
Anyway, I started with an analysis of question #24, (have you ever given more than you could afford to charity) because responses were almost evenly split, which gave me two samples of more than 100 to compare. But the question also caught my eye because I work in fundraising. So I’m always looking to shed more light on philanthropy, when I can. What else does an extensive crosstabulation of question #24 tell us? Those who had given more than they could afford to charity were significantly more likely to have
- Bought everyone in the bar a drink,
- Held a tarantula,
- Taken a candlelit bath with someone,
- Hugged a tree,
- Watched a meteor shower,
- Gotten drunk on champagne,
- Had a food fight,
- Asked out a stranger,
- Held a lamb,
- Seen a total eclipse,
- Taken a midnight walk on the beach,
- Milked a cow,
- Pretended to be a superhero,
- Started a business,
- Fallen in love and not had their heart broken,
- Crashed a party,
- Recorded music,
- Picked up and moved to another city just to start over,
- Eaten mushrooms that were gathered in the wild,
- Changed someone’s mind about something they care deeply about,
- Eaten fried green tomatoes, and
- Selected one “important” author they missed in school and read (them).
As they say, correlation does not imply causality, except when it does, of course. Just because these people were more likely, as a group, to have eaten fried green tomatoes than the non-impulsively generous group doesn’t mean that people who are careful and/or stingy have an aversion to that food. But it sorta makes you think, duznit? And if nothing else, these crosstabulations point in the same direction as every other bit of research that COMBS has produced and will ever produce:
Needs more research.
Blogger, are You Experienced? June 21, 2007 Posted by caveblogem in blogging, blogs, Blogs and Blogging, memes, Other, statistical analysis, tagging, web 2.0.
Last week I began a statistical examination of blogger responses to the lengthy meme “150 Things,” which is a long list of experiences that bloggers have been tagging each other with for at least a year or so. (If you came here because of a trackback, I included your blog in the statistical sample). Those tagged copy the list and reproduce it on their blog, with their experiences/accomplishments in bold type. Response percentages (from a sample of 222 blogs) may be found in a summary report here. They are probably more interesting to look at in that form. But if you want, I could put together some bar charts from the data. Let me know.
The percentages are pretty easy to use, I think. Just look at a question, like #7, for example. Sixty-three (63) percent of the bloggers sampled have taken a candlelit bath with someone. If you have not, that makes you a loser :). Or take question #40. Only one percent of the bloggers sampled have been to all 50 states. If you have, that’s an extraordinary accomplishment. If you have not, well, almost nobody else has. Don’t let it get to you. Some of them are a little boring anyway, I suspect.
For those who prefer this stuff in narrative form, I must point out that this was a pretty eclectic group of questions. At any rate, most (more than half of the sample plus a percentage to account for sampling error) of the bloggers who took the time to bold the tasks that they had accomplished are not the sedentary creatures portrayed in the mainstream media. Although most have lounged around in bed all day at least once (59), they obviously don’t make a lifestyle out of it. They just don’t have the time.
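The "more than half plus a percentage to account for sampling error" rule can be sketched as follows. This assumes a simple random sample and the usual normal approximation at 95% confidence (z = 1.96); the post does not state the exact margin used, so treat the function names and the resulting threshold as illustrative.

```python
from math import sqrt

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * sqrt(proportion * (1 - proportion) / sample_size)

def is_most(share, sample_size):
    """True if the share clears one half by more than the margin of error."""
    return share > 0.5 + margin_of_error(sample_size)

moe = margin_of_error(222)          # sample of 222 blogs, worst case p = 0.5
print(round(moe, 3))                # about 0.066
print(is_most(0.63, 222))           # question #7, 63 percent
print(is_most(0.55, 222))           # a hypothetical 55 percent share
```

Under these assumptions, "most" means roughly 57 percent or more of the 222 blogs, so the 63 percent for question #7 qualifies while a bare 55 percent would not.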
They are social. Most have formed friendships with people they admire (42). Sure, they have had their share of skirmishes with their buddies, involving food (27) or frozen water (30), but they are supportive when it counts (41). Most have an impulsive (88), romantic streak (7, 49, 62, 83) and have professed their love to significant others (8). Despite their geeky reputations some have broken hearts along the way (110). But they have also experienced love without getting their hearts broken (68) and ended up getting married (72) and having children (or at least changing them—20).
They may have an undeserved reputation for geekiness. Most have never played D&D for more than six hours (71) or written their own computer language (140), although most have at some point alphabetized their CDs (56). These people have used firearms (116), most of them, and ridden a horse (118), perhaps at the same time. They are not to be trifled with—most of them have eaten raw fish. It takes guts. I remember.
Perhaps they are not extroverts, but they have, possibly via liquid fortification (23), cut loose a little (36, 58, 102, 146). Most have gone to drive-in theaters (65), ridden roller coasters (34, perhaps where they screamed as loud as they could–31), attended huge sporting events (15) and stayed up late enough to watch the sun rise (13).
Most have not traveled extensively, although most took a road trip at some point in time and have toured ancient sites, whatever those are (47, 69).
They are effective communicators, although not necessarily with words (138). Most say they have changed people’s minds (129). If they are not perfectly happy, this group has known happiness (38).
They are a do-it-yourself group. They make their own food, from scratch, if necessary, watching it grow from seeds, perhaps, into sugar cane, corn, wheat, etc., grinding the grain, smooshing the corn to make oil, and turning the finished bounty into cookies (17, 77).
This is the first of a small series of posts on this particular meme. If your blog didn’t get a trackback from this yet, and you would like to be included, just comment, link to this post (from your 150 things post, if possible), or wave your arms or something—I’d be happy to put you in. And if anybody has suggestions for other memes that might be interesting to study, do please let me know. I’m thinking about doing one of the book memes rattling around, since the format is similar.
The blogs sampled for this particular study are listed below, with links (many didn’t have actual titles, particularly the ones from MySpace, so I am listing them with only their ID numbers.)
001, 004, 005, 006, 007, 008, 009, 010, 011, 012, 013, 014, 015, 016, 017, 018, 019, 020, 021, 022, 023, 024, 025, 027, 030, 031, 032, 033, 034, 035, 036, 037, 038, 039, 040, 041, 042, 043, 044, 045, 046, 047, 048, 049, 050, 051, 052, 053, 054, 055, 056, 057, 058, 059, 060, 062, 063, 064, 066, 067, 068, 069, 070, 071, 072, 073, 074, 075, 076, 077, 078, 079, 080, 081, 082, 083, 084, 085, 086, 087, 088, 089, 090, 091, 092, 093, 094, 095, 096, 097, 098, 099, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 173, 174, 175, 176, 177, 178, 179, 180, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, and 222.
Which words do you own?–Klotz, as in Blood June 20, 2007 Posted by caveblogem in blogging, blogs, Blogs and Blogging, Haiku, linguistics, Other, statistical analysis, vocabulary.
Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere. Other posts, whether analyzing particular blogs within the study or detailing the methodology of this thing or whatever, can now be found at the Center for Occasional Meme and Blog-O-Sphere Studies [COMBS]. Go there by clicking here or the Center’s logo, which should be on the right (starboard) side-bar over there —->
Anyway, the blog under the microscope today is “Klotz,” as in “Blood,” published by Steve. His word sample runs from March 6 to June 11, 2007. Sample size was 32,911 words. Steve totally wrecked the curve, adding 1,188 words, which shouldn’t really happen at this point in the study. And for those of you keeping score, Steve used 6,450 different words in his sample.
The theory was that the new words were supposed to level out by now. At any rate, the number of distinct words in the database passed the 25,000 mark. So the working vocabulary of the Blog-o-sphere is, obviously, more than that.
Here is a word cloud comprised of the words used more than twice by Steve but not at all by any of the other 24 blogs sampled thus far.
And here’s those words in a font called Blood:
I ran out of Floridian fonts.
And here’s the Venn diagram I usually make out of these words. The left lobe consists of words that were new in the sample, that nobody else had used, sized relative to the frequency of use. The middle lobe consists of words that everybody has used so far, sized according to how much more frequently Steve used them in the sample. And the right lobe consists of only two words that everyone else sampled thus far has used, but that Steve did not.
Here is another effort by my Haiku-generating algorithm, which went off this time without a hitch, although I used words that Steve added to the database but which he didn’t use frequently enough to put them in the vocabulary clouds. (I just didn’t want the hassle this time.)
No pilgrim ruptured
between traditions of brie
and clutching the dolt.
Clutching the dolt sounds like the name of some bizarre game out of Colonial American folklore, duznit? Maybe I should have enclosed it in quotation marks or italicized it. As always, the vocabulary clouds and Haiku are the property of the volunteers, except that said volunteer may not have them taken off of my site but may otherwise do with them what they wish. Thanks for participating, Steve!
Next up: Silverneurotic, then Searching for Normalcy, then this project will be put on hold for a little while while I pursue a new one that seems much more interesting at the moment (and which will debut in this space later this week, hopefully.)
[Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere. Posts on this subject started here and will continue on a somewhat weekly basis. There is an interesting (to some) analysis of the most common words here. And there is some discussion of method here and here.]
Anyway, the blog under the microscope today is A Mom, a Blog, and the Life In-Between, published by Tere. Her word sample runs from April 2 to June 6, 2007. Sample size was 28,685 words. Tere added 445 words. There were 3,904 different words in her sample, a little above the norm.
Here is a word cloud comprised of the words used more than twice by Tere but not at all by any of the other 24 blogs sampled thus far.
And here’s those words in a font called Flores:
Two reasons for that font. First, Tere strongly recommends that husbands bring their wives flowers. Second, it is the only other font I have that relates to Florida in some fashion.
And here’s the Venn diagram I usually make out of these words. The left lobe consists of words that were new in the sample, that nobody else had used, sized relative to the frequency of use. The middle lobe consists of words that everybody has used so far, sized according to how much more frequently Tere used them in the sample. And the right lobe consists of only two words that everyone else sampled thus far has used, but that Tere did not.
Here is another effort by my Haiku-generating algorithm, which crashed three times before coughing up this ecliptic gem.
Cheesecake on the odes,
dears of a year-and-a-half
Well, you get what you pay for here at PGoP, I always say. If you get cheesecake on the odes, you’ll probably find yourself re-editing mangoes sooner or later, if only to hide the embarrassment on your face. Whoops. . . Honey, I’d better see to the mango editing again.
As always, the vocabulary clouds and Haiku are the property of the volunteers, except that said volunteer may not have them taken off of my site but may otherwise do with them what they wish. Thanks for participating, Tere!
Next up: another Floridian, “Klotz,” as in “Blood,” then Silverneurotic, somewhere in the mid-Atlantic U.S. before returning to the white-hot Floridian blogging scene with Searching for Normalcy, (which is very similar to Calvin Coolidge’s campaign slogan, if I’m not mistaken.)
Shamrock Book Corner/Mark June 12, 2007 Posted by caveblogem in Books, DIY, how to, luck or time, Origami, statistical analysis.
I find a lot of four-leaf clovers, which some people consider to be lucky (um, not the finding, I think, but the possession thereof). The first couple I found I gave away as presents, after laminating them, but there was nothing particularly elegant about lamination. And I have quite a few, now–I stopped counting at thirty. And laminating is boring and expensive.
So I’ve been looking for some other way of presenting them to people. Because what am I going to do with all these things? It occurred to me a couple of weeks ago that I could put them inside paper, onionskin or tracing paper, possibly translucent vellum, so that they could be used as bookmarks. Problem was that unless you put some sort of tassel on them, they will stick out of the book and part of the mark will get mushed. They might get knocked loose, which would also suck. Anyway, yesterday I finally happened upon a solution, which is to make book corners out of them.
The design for this is that of a letterfold, the K-Letterfold, which is diagrammed here, at my favorite letter and envelope folding site. I don’t use the K-Letterfold much for actual letters, because it comes out too small to actually post through U.S. mail when you use paper of standard dimensions. But it is perfect for this particular purpose (see below, click to enlarge).
The book is Nicholas Rescher’s Luck: The Brilliant Randomness of Everyday Life. Much better, printable, concise instructions and a diagram are at this site (look under K-Letterfold on the side-bar), but I am putting step-by-step instructions below so that you can see where the shamrock goes in the folding process.
Step 1: I started out with a 6 inch by 8 inch (15.24 cm x 20.32 cm) sheet of tracing paper. The pictures below are for the same size white sheet, which shows the folds and the position of the shamrock. It is best to fold the thing first, then unfold it and place the shamrock (or whatever flat keepsake or flower or whatever) inside and refold it. It is less likely to damage the delicate dried plant if you wrestle with the paper and crease it first.
Step 2: Fold one corner snug against the side.
Step 3: Fold the top side down to meet the edge of the paper.
Step 4: Fold paper in half and then unfold. Then fold it in a quarter towards the crease in the middle. Yeah, I know that’s two steps. Second one is like 4 and 1/2. O.K.?
Step 5: Fold the other quarter to meet the center crease. Now comes the tricky part.
Step 6: Tuck the pointy part at the bottom into the slot in the middle.
Step 7: Then slide it all the way to the top inside, so that the little crevasse (seen in the picture below in a not-quite-closed-but-almost-closed state) closes as completely as it can.
Step 8: Turn over and tuck the remaining untucked corner into the other inside slot . . . carefully.
Of all the letterfolds this is one of the most stable. It simply does not open accidentally, even when sent through the mails without any adhesive devices to keep it closed. And as you will see, it can be used vertically or horizontally, so that the side with the clover is always on the page that you are attempting to mark.
Which words do you own?–Miami Rhapsody June 11, 2007 Posted by caveblogem in Blogs and Blogging, Books, fiction, Haiku, libertarians, linguistics, luck or time, narrative, vocabulary.
[Note: This is part of a continuing series on the actual vocabulary in use in the blogosphere. Posts on this subject started here and will continue on a somewhat weekly basis. There is an interesting (to some) analysis of the most common words here. And there is some discussion of method here and here.]
There is a potentially offensive word below. You have been warned.
I just finished reading Carl Hiaasen’s Lucky You, a novel about two people who play the same lottery numbers and win the same Florida Lottery jackpot worth $28m. One is a black woman in her late twenties who is working as a veterinary assistant in a small town. The other is a racist living in the Miami area who wants to use the money to finance a militia group, which he will use to fight the UN/Nato/Jewish/race-mixing invasion force he believes is preparing in the Bahamas. So the guy steals the woman’s lottery ticket. The story’s about how she gets it back.
I’ve been thinking about the story a little for a number of reasons, but the most pertinent of them is that the racist character in Lucky You can’t utter the most prominent word in the vocabulary cloud below. When he was in his early teens he spoke this word at home, once. Then his father, who never used corporal punishment, but for this one exception, beat him with a razor strop. After dad was done with him, his mother took him inside the house and washed his mouth out with a well-known abrasive tub and tile cleaner containing bleach. Consequently, he has this gagging reflex whenever he even thinks this word. The only other member of his militia, his accomplice, teases him about this.
Carl Hiaasen uses this word in the book a number of times, which seemed daring to me, in a weird way. Hiaasen makes this word come from the mouths of racist bad guys, and some of the story attempts to explore racism and bigotry (but not so much that it disrupts the comedy). Nevertheless, it seemed daring to me because I don’t think I have ever spoken this word, though I often curse like a sailor. My parents never beat me for anything, much less using this word. But I grew up in a family of Libertarians who pretty much ignored skin color. And I was sheltered enough in white suburban California that racial issues were never prominent in my experience. Racism in the news always seemed somewhat unreal (well, a lot of the news did). It was only later, studying history in college, that I began to see racism as a real and contemporary problem. Well, that’s how sheltered I was.
Say the word as an insult and it brands you a stupid bigot. Say it ironically, or even analytically (as a commentary on language, for example) and it is too easy to be misunderstood, or come off as a privileged white intellectual (which is what I am, basically, but I try not to flaunt it). It was an easy word to avoid, until this post.
Anyway, the blog under the microscope today is Miami Rhapsody, a truly fascinating read published by Yvette. I recommend subscribing. She won’t fill your inbox as often as many others, and seems to write only when she has something interesting to say. Her word sample runs from July 28, 2006 to June 1, 2007–every word she posted up to that point. There were only 20,000 words in the sample, so the numbers will look a little low in comparison to other blogs examined recently (where the samples tend towards 30,000). Yvette added 510 words. There were 3,608 different words in her sample, a little above the norm, I think.
Here is a word cloud comprised of the words used more than twice by Yvette but not at all by any of the other 23 blogs sampled thus far.
And here’s those words in a font called Floribetic:
And here’s the Venn diagram I usually make out of these words. The left lobe consists of words that were new in the sample, that nobody else had used, sized relative to the frequency of use. The middle lobe consists of words that everybody has used so far, sized according to how much more frequently Yvette used them in the sample. And the right lobe consists of only two words that everyone else sampled thus far has used, but that Yvette did not. She doesn’t seem to care about money or looks. Refreshing, isn’t it?
Here is another effort by my Haiku-generating algorithm.
You professor’s racists!
The louder nuns not potted
are the nuns you mow.
“Professor’s racists.” I kinda like that, although I’m not sure what it would mean. A group of brown-shirted nerdy bigots? Something in the phrasing seems like a badly-translated Maoist slogan of some sort. And “mowing the louder nuns” also puts me in mind of those jokes we told as a kid: What’s black and white and red all over?
As always, the vocabulary clouds and Haiku are the property of the volunteers, except that said volunteer may not have them taken off of my site but may otherwise do with them what they wish. Thanks for participating, Yvette!
Clarifications about the Blogging Vocabulary Study June 8, 2007 Posted by caveblogem in Blogs and Blogging, linguistics, Other, statistical analysis, vocabulary.
Dayngrous Discourse asked this question in a comment yesterday.
Of these words 3,907 were unique, meaning that she used 3,907 different words in the sample, well above the average of about 3,500. Does that mean I added that 3,907 new words to the database?
I learned during the brief time I was teaching college that when you get a question from somebody it usually means that there are dozens of people who had the same question but didn’t ask it, so I thought I’d put the answer up here where others could see it, rather than just tack it into the comments section.
So, in response, when I said that 3,907 words in the sample were unique, I meant that there were 3,907 different words in the sample of 28,000 words from your blog. I grabbed 28,000 words, which went in this order (I began the sample at May 13th, Mother’s Day):
That’s three different words, so far, from the title of the post. Then we have the beginning of the post itself:
Which is 6 words toward the sample’s total of 28,000, but only three are unique to that sample. And all of these words were already in the database, because somebody else had used them. So they weren’t unique to the database, just to the sample. Probably the first word that you added to the database was “dished,” which came at the last line of that Mother’s Day post.
I forgot to say in the post how many words you added to the database, so I’ll tell you now: 435, which is pretty darned good for someone who posts frequently, often more than once a day, as you do. And it is a lot for short posts with lots of tags, which tend to push the word variety down.
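Since words about words get muddled, here is the same distinction as a small sketch: total words in the sample, different words within the sample, and words new to the database. The tokenizing rule, the function names, and the example sentence are all hypothetical stand-ins (only "dished" comes from the discussion above); the real study's word-splitting and spell-check steps are not reproduced here.

```python
import re

def tokenize(text):
    # Lowercase and keep runs of letters and straight apostrophes; a deliberately crude rule.
    return re.findall(r"[a-z']+", text.lower())

def sample_stats(sample_text, database_vocab):
    """Return (total words, different words in the sample, words new to the database)."""
    words = tokenize(sample_text)
    distinct = set(words)
    new_words = distinct - database_vocab
    return len(words), len(distinct), new_words

# Made-up sentence and database, for illustration only.
database = {"the", "cat", "saw", "other"}
total, different, added = sample_stats("The cat saw the other cat dished", database)
print(total, different, added)
```

Here the sample counts seven words toward its total, five of them different, but only one ("dished") is new to the database. After each blog, the new words would be merged into the database with something like `database |= added`, which is why later participants find it harder to add any.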
I hope that helps, but I have my doubts. Just as it is hard to talk about math using math, it is hard to talk about words with words sometimes, and here I am using both. If Kurt Gödel didn’t say something like that, he probably should have. This type of muddle is one of the reasons that literary critics invent jargon, not that we should forgive them, mind you.
The word database has more than half a million total words in it now. And about 25,000 different words. So it is getting more and more difficult to add new ones. But the next two participants, both hailing from southern Florida, found some anyway.