Unwritten words
December 18, 2024 10:44 AM

 
For some of these, it makes sense why they're not present. The Sherlock Holmes one, though, indicates that Doyle apparently never thought to stage a mystery inside a prison run by priests praying to Jesus, and that's a big loss.
posted by Atreides at 10:53 AM on December 18 [1 favorite]


I particularly like The Yellow Wallpaper -- "Himself" is missing and it's big, big because it occurs a *lot* in most works.
posted by clew at 11:01 AM on December 18 [6 favorites]


Pride and Prejudice and BLOOD
posted by Lemkin at 11:04 AM on December 18 [2 favorites]


Pride and Prejudice has BATTLE prominently, but it’s completely absent from the Winnie the Pooh one, making me wonder if there was a final Battle of the Hundred Acre Wood I missed out on.
posted by Jon Mitchell at 11:23 AM on December 18 [4 favorites]


Interesting. It seems correct but doesn't seem to do stemming. E.g. "difficult" does not appear in Moby Dick, but there are quite a few references to "difficulty" and "difficulties".

Some of it's interesting though, like the absence of "religion" from Sherlock Holmes.
posted by TheophileEscargot at 11:25 AM on December 18 [2 favorites]


that's a very interesting tool. the presence of the absence.
now they need to make it so I can upload texts and see what's missing
posted by chavenet at 11:26 AM on December 18 [1 favorite]


I would love to see this applied to a corpus of politicians' speeches, with "otherwise common" meaning common in, say, newspapers' main sections. It might be interesting. It might not. I mean, newspapers often report murders and there's not much point in politicians talking about that, but I'd be curious just in case. I'd also be curious to see how the list differed between politicians.
posted by If only I had a penguin... at 11:35 AM on December 18 [4 favorites]


Texts via Project Gutenberg. Word frequency data derived from Wiktionary frequency lists that are based on Project Gutenberg data. Some stopwords have been removed, especially when they lead to confusing results (e.g., "towards" and "toward" appear quite often in anti-tag clouds, because they are both common and authors rarely use both.) A few US/British pairs have been edited out, especially "honour/honor".

Note: If you try this yourself on your own texts, be careful because the frequency data from Wiktionary includes the boilerplate headers and footers from Project Gutenberg. For any other source of data, you probably want a different frequency list anyway, more reflective of typical English.
That sure suggests a minor repair someone should do: tidy up the Wiktionary inputs to *not* include the boilerplate!
posted by clew at 11:42 AM on December 18 [1 favorite]
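For anyone wanting to try chavenet's upload-your-own-text idea, here's a rough sketch of the anti-tag-cloud approach as the site describes it: walk a general frequency list from most to least common, skip stopwords and the excluded US/British pairs, and report the words the text never uses. The function names, word list, stopwords, and pairs below are hypothetical placeholders, not the site's actual code or data, and it does plain exact matching with no stemming.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def anti_tag_cloud(text, common_words, stopwords=frozenset(),
                   excluded_pairs=(), top_n=None):
    """Return common words (in frequency-list order) that never appear in text.

    common_words: words ordered from most to least frequent; a hypothetical
        stand-in for a Wiktionary/Project Gutenberg frequency list.
    stopwords: words to skip entirely.
    excluded_pairs: spelling variants such as ("honour", "honor") to drop
        from the candidates, since authors usually use only one of each pair.
    top_n: optionally keep only the first N missing words.
    """
    present = set(tokenize(text))
    excluded = {w for pair in excluded_pairs for w in pair}
    missing = [
        w for w in common_words
        if w not in present and w not in stopwords and w not in excluded
    ]
    return missing[:top_n] if top_n is not None else missing

# Tiny made-up example: exact matching, so "difficulties" does not count as "difficult".
sample = "The difficulties mounted toward the end of the voyage."
common = ["the", "difficult", "toward", "towards", "honour", "honor", "battle"]
print(anti_tag_cloud(sample, common,
                     stopwords={"towards", "toward"},
                     excluded_pairs=[("honour", "honor")]))
# prints ['difficult', 'battle']
```

Note that "difficult" shows up as missing even though "difficulties" is right there, which is the no-stemming behaviour TheophileEscargot noticed. The quality of the output depends almost entirely on the frequency list, hence the boilerplate concern.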


I'd be interested in seeing this for the entire Holmes canon, beyond the single volume they mention here - one of the absent words in the volume they used was 'Greek', but a story called The Adventure of the Greek Interpreter appears in The Memoirs of Sherlock Holmes, the same volume which contains the story where Doyle vainly attempts to kill Holmes off.
posted by Whale Oil at 11:58 AM on December 18


That's interesting. I like the results for Jeeves and Winnie-the-Pooh particularly - they are sort of anti-texts.
posted by paduasoy at 12:01 PM on December 18


Yeah, I was confused by the Holmes one as well. They list "inhabitants" and "mountain" as never appearing, but I'm looking right at both of those in "A Study in Scarlet" (which I reached for since they list a bunch of religious words and it's got that long digression into the Mormons). But yeah, technically a different book.
posted by nickmark at 12:04 PM on December 18 [1 favorite]


Looking at the words for The Souls of Black Folk, I was startled to see "colour" listed. That's literally in the famous opening paragraph of the book!
Herein lie buried many things which if read with patience may show the strange meaning of being black here at the dawning of the Twentieth Century. This meaning is not without interest to you, Gentle Reader; for the problem of the Twentieth Century is the problem of the color line.
Of course, W.E.B. Du Bois spelled it the American way, so this system reported the shocking oddity that a common British spelling was not in the work.

C+ try harder.
posted by rum-soaked space hobo at 1:25 PM on December 18 [1 favorite]


On closer inspection BATTLE is indeed absent from Winnie.

Bother.
posted by Jon Mitchell at 1:28 PM on December 18 [2 favorites]


They excluded honour/honor for that reason, rum-soaked space hobo; doing the same for colour/color should be possible.

It's trickier to treat them as the same word. I mean, it's easy with a few global find-and-replaces, but... I expect there’s a reason they didn’t.
posted by clew at 1:43 PM on December 18 [1 favorite]
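Treating the variants as the same word could look something like this rough sketch: fold each spelling to one canonical form before comparing, so a text that uses "color" counts as containing "colour". The variant table and function names are hypothetical, and the table is a tiny sample rather than a real US/British list.

```python
import re

# Hypothetical, deliberately tiny variant table: each spelling maps to one
# canonical form. A usable version would need a curated list of many pairs.
VARIANTS = {
    "color": "colour",
    "colour": "colour",
    "honor": "honour",
    "honour": "honour",
}

def canonical(word):
    """Fold a lowercase word to its canonical spelling, if the table has one."""
    return VARIANTS.get(word, word)

def missing_words(text, common_words):
    """List common words absent from text, treating spelling variants as equal."""
    present = {canonical(w) for w in re.findall(r"[a-z]+", text.lower())}
    return [w for w in common_words if canonical(w) not in present]

opening = "the problem of the Twentieth Century is the problem of the color line"
print(missing_words(opening, ["colour", "problem", "battle"]))  # prints ['battle']
```

The folding itself is trivial; the work is in curating the table so it only merges genuine spelling variants.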


Having read, and re-read, and read again, the incredibly delightful adventures of Winnie the Pooh, I can absolutely guarantee that neither "battle" nor "war" is ever mentioned. However, "gun" is used in the very first story, with Christopher Robin breaking out a BB gun and hitting something, just not what he was aiming for. Though I do wonder whether either of those two words appears in the most recent film, where there's some strong "going to war" imagery as the Hundred Acre Wood folk prepare to deal with a Backson.
posted by Atreides at 2:21 PM on December 18


For those who may be wondering, "The Adventures of Sherlock Holmes" listed on that page is a specific volume of twelve Holmes stories, and the webpage is not using that label to refer to the entirety of all Holmes stories written by Doyle.
posted by Whale Oil at 3:04 PM on December 18 [3 favorites]


Interesting. For Sherlock Holmes, the top two are
SCARCELY
RELIGION
and the bottom two are
UNIVERSAL
REIGN
posted by clavdivs at 3:28 PM on December 18


Jon Mitchell, I was so surprised I did a bit of Ctrl-F to check -- the plain text versions of the Milne books did not have it, but the missing-words page did, down near the bottom. No Scouring of the Hundred-Acre secretly hidden.
posted by clew at 3:58 PM on December 18 [1 favorite]


When parsing what people say or write, I often find value in noticing what is unspoken or missing that should be there given the context, as it sometimes indicates what is really going on behind the empty-suit document that has been published. There must be a name for this.

When Twitter was very functional I read of this being used to analyse companies (but I cannot find the link).

Surely a serious anti-tag / missing string analysis should compare within genre?

Whale Oil, that may help explain why so many politicians' speeches are so filled with weasel words and dissimulation as they avoid 'unspeakable truths'. Our new government are very much like that: they ran their campaign via a widespread astroturf/influence op that attacked clean water (to appease some farmers) and public health (mainly by appealing to many NZers' inherent racism).
posted by unearthed at 4:10 PM on December 18 [1 favorite]



