Voynich manuscript deciphered?
January 29, 2018 8:13 PM

Decoding Anagrammed Texts Written in an Unknown Language and Script "Since its discovery over a hundred years ago, the 240-page Voynich manuscript, filled with seemingly coded language and inscrutable illustrations, has confounded linguists and cryptographers. Using artificial intelligence, Canadian researchers have taken a huge step forward in unraveling the document’s hidden meaning."
posted by dhruva (50 comments total) 31 users marked this as a favorite

This is exciting!

I wonder, though, if it’s a bad sign that the initial sentence makes no sense. It sounds like the beginning of modern in medias res fiction. Possibly the sentence is in some deep and vanished code, but that’s an awfully sexy conclusion.
posted by Countess Elena at 8:28 PM on January 29, 2018

"Be sure to drink your Ovaltine." Ovaltine? A crummy commercial? Son of a bitch!
posted by entropicamericana at 8:45 PM on January 29, 2018 [38 favorites]

I'm happy to own a copy and I'm looking forward to owning the annotated translation once it's published.

I really, really want to know WHAT IN THE HELL this thing is.
posted by Revvy at 8:46 PM on January 29, 2018 [1 favorite]

It will be decoded into yet another code that looks like nonsense.
posted by rhizome at 8:59 PM on January 29, 2018 [1 favorite]

Wait, let me get this straight. They decided it was Hebrew, presented the decoded "Hebrew" text to an actual speaker of Hebrew who told them it doesn't make sense in Hebrew, decided they didn't like the answer, and so they dicked around with it in Google freaking Translate until they got a vaguely coherent sentence? Like, from the sentence that they were already told doesn't make any sense? And then they have the (ahem) chutzpah to claim that "the full meaning of the text won’t be known until historians of ancient Hebrew have a chance to study the deciphered text."?
posted by tkfu at 9:11 PM on January 29, 2018 [52 favorites]

My preferred explanation.
posted by palmcorder_yajna at 9:20 PM on January 29, 2018 [4 favorites]

IT HAS BEEN
[0] DAYS
SINCE OUR LAST
WORKPLACE
VOYNICH MANUSCRIPT DECIPHERMENT

(We had made it about 284 days since the previous one)
posted by edheil at 9:21 PM on January 29, 2018 [104 favorites]

Has Betteridge's law been disproven?
posted by Joe in Australia at 9:30 PM on January 29, 2018 [3 favorites]

No.
posted by intermod at 9:48 PM on January 29, 2018 [14 favorites]

Not AI. How is this AI. More buzzword jumping.
posted by GallonOfAlan at 10:27 PM on January 29, 2018 [1 favorite]

I'm sorry. I have read their purported translation and it is so awful that it isn't even worth making fun of. I can't believe any credible journal would print such nonsense.

The authors present, as a sample, the purported transcription "ועשה לה הכהן איש עליו לביתו ו עלי ענשיו המצות" and say that "According to a native speaker of the language, this is not quite a coherent sentence." I should say it isn't! It's gibberish. The first three words could plausibly mean "The cohen (priest) shall do for her", but then it continues "man, on him, [the letter vav] (which means "and" at the start of a word but never appears by itself), on me, its people, the commandments (or, the matzos)". This doesn't mean anything; it doesn't even respect the way Hebrew conjunctions work.

The researchers seem to have produced their "translation" as follows: they assigned a value to each Voynich symbol and sorted the letters of each word accordingly, so the words THERE and THREE would each be recorded as 11234. They then took a dictionary for each language and did the same for all of those words, but kept changing the values of the symbols until they found one which worked. E.g., a correct choice of values for the words APPLE and RIDER will also potentially have the pattern 11234, but choosing a set of values that produces 11234 for one of these will rule out the possibility of an equivalent for the other. It's obviously easy to find a set of values that will make any individual word meaningful, but a correct set of values will produce a valid translation for every word in the Voynich manuscript.
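
A rough sketch of the matching step described above, not the paper's actual code. The names `alphagram`, `pattern`, `build_index` and `candidates`, the toy dictionary, and the symbol-to-letter key are all made up for illustration.

```python
from collections import defaultdict


def alphagram(word):
    """Sort a word's letters, so anagrams collapse to the same string."""
    return "".join(sorted(word))


def pattern(word):
    """Number the distinct letters of the alphagram in order of appearance."""
    ids = {}
    digits = []
    for ch in alphagram(word):
        ids.setdefault(ch, len(ids) + 1)
        digits.append(str(ids[ch]))
    return "".join(digits)


def build_index(dictionary):
    """Map each alphagram to every dictionary word that produces it."""
    index = defaultdict(list)
    for word in dictionary:
        index[alphagram(word)].append(word)
    return index


def candidates(cipher_word, key, index):
    """Apply a trial symbol-to-letter key, then look up the anagram class."""
    plain = "".join(key[symbol] for symbol in cipher_word)
    return index.get(alphagram(plain), [])


# Hypothetical toy data: four invented symbols and a three-word dictionary.
index = build_index(["THERE", "THREE", "APPLE"])
trial_key = {"a": "T", "b": "H", "c": "E", "d": "R"}
print(pattern("THERE"), pattern("THREE"))
print(candidates("cbdac", trial_key, index))
```

A trial key that gives a hit for one word constrains every other word, which is why a single meaningful word proves nothing and only a key that works across the whole text would count.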

I'm not surprised the researchers found a match for Hebrew: Hebrew (and other Semitic languages) tend to have shortish words made up of modifiers added to a three-letter root system. The vocalisation of a word is mostly implied by its consonants, so Hebrew words can be made up from a combination of almost any letters. In contrast, in most European languages every valid word must have at least one of a limited range of vowels (in English, A-E-I-O-U, sometimes Y, rarely W). This would have made it a lot easier for them to find matches in a Hebrew dictionary, because Hebrew words don't need to include a limited subset of letters. And even so, the paper tells us that they had to assume that some words in the Voynich manuscript had spelling errors.

This paper is bad, bad, bad. Computer engineers should not be allowed to do linguistics. At least, not without some actual knowledge of languages.
posted by Joe in Australia at 10:43 PM on January 29, 2018 [70 favorites]

Not AI. How is this AI. More buzzword jumping.

AI is a vaguely defined term, and anything that doesn't involve manually creating the specific inquiry (in this case, linguistic/semantic analysis and translation) is generally considered to qualify.

Are they still banking on people being drawn to the buzzword? Sure, but it's still technically accurate.

Their conclusion is also nonsense, but it's still (badly done) legitimate AI.
posted by Molten Berle at 10:49 PM on January 29, 2018

AIs are cool and trendy and all, but I solved the Voynich Manuscript using good old-fashioned cryogenics. I simply froze my copy to preserve it for when future me has access to time travel, and someone else has figured it out. It'll work eventually!
posted by drinkyclown at 10:50 PM on January 29, 2018 [2 favorites]

Even without much knowledge of languages there are obvious holes in the paper. For example they show their histogram where Hebrew is the closest match--but don't present any evidence that the match is close enough to actually mean anything. How about showing a similar histogram for actual Hebrew texts that have been manipulated in the proposed fashion (alphagrams)--one would think it should match? How about showing that histograms for *other* languages *don't* match? Or that random gibberish doesn't match?
posted by equalpants at 11:00 PM on January 29, 2018 [7 favorites]

Can't they determine from the physical artifact these days, by analysing the paper, inks, etc., where it was likely produced, down to a fairly small region, and when? The kind of crank who produces these time-cube type artifacts is often present in the historical record.
posted by maxwelton at 11:29 PM on January 29, 2018 [1 favorite]

The first attempt at applying machine learning to the Voynich manuscript returned: Not a hotdog.
posted by drnick at 12:20 AM on January 30, 2018 [7 favorites]

filled with seemingly coded language and inscrutable illustrations

Sounds like a sandwich to me.
posted by rhizome at 12:24 AM on January 30, 2018 [1 favorite]

What Joe in Australia said. Even somebody like me, with only the most basic, rudimentary familiarity with the Hebrew language, can tell that isn't a sentence. It's just a sequence of random words strung together. Given the way Hebrew spelling works, you can usually arrange and rearrange a group of four or five characters till you eventually get a combination that a Hebrew spellchecker will recognize as an actual word. That's essentially what they did. They got an 80% success rate! Getting coherent sentences this way is another matter. They totally failed at that.
posted by nangar at 12:47 AM on January 30, 2018 [4 favorites]

It's a cookbook!
posted by cenoxo at 1:03 AM on January 30, 2018 [3 favorites]

I still can’t get past the part about “... and then Google Translate spat out something semi-coherent.” It’s mind-boggling. I’m sure the authors are very nice people, but this is like me writing an article on an exciting new discovery about the diet of large African mammals based on experiments with a Hungry Hungry Hippos board.
posted by No-sword at 1:37 AM on January 30, 2018 [26 favorites]

When we find out it just says '42' we'll have to build an even bigger AI.
posted by adept256 at 2:06 AM on January 30, 2018 [4 favorites]

They couldn't find Hebrew scholars to vet their translations? I find that highly suspect. Just whom did they reach out to? I know for a fact the Technion's Alon Itai would be all over this.
posted by analogue at 2:21 AM on January 30, 2018

They should focus on finding the person behind the hoax. That's the interesting bit of this story. Who, around 1400 or so, could and would have come up with a joke like that?
posted by pracowity at 4:14 AM on January 30, 2018 [1 favorite]

Huh. I had thought that the standard authorship hypothesis ascribed it to John Dee and/or Edward Kelley, who would have had both the technical skills and the motivation to create such a thing, but the radiodating places it a century before either was born.
posted by jackbishop at 4:57 AM on January 30, 2018

is there a free digital version using a 'voynich' character set? i'm voy-curious.
posted by j_curiouser at 6:11 AM on January 30, 2018 [1 favorite]

Surely this!
posted by tofu_crouton at 6:19 AM on January 30, 2018

i'm voy-curious.

voy ey
posted by pracowity at 6:33 AM on January 30, 2018 [7 favorites]

I had thought that the standard authorship hypothesis ascribed it to John Dee and/or Edward Kelley ...

Roger Bacon and Albertus Magnus are two of the traditional authorial candidates, but having read Bruce Holsinger's John Gower mysteries over the holidays, I now want Chaucer to be the author.
posted by octobersurprise at 6:49 AM on January 30, 2018 [2 favorites]

Can't they determine from the physical artifact these days, by analysing the paper, inks, etc., where it was likely produced, down to a fairly small region, and when?

Carbon dating says it's from the early to mid 1400s. The region is a little harder to prove, but there are some examples in the drawings of castles with swallow-tail shaped crenellations that are particular to northern Italy. Still, no one's found a likely author.
posted by echo target at 7:17 AM on January 30, 2018 [2 favorites]

> (We had made it about 284 days since the previous one)

think you missed one... the last was in September
posted by Auz at 7:18 AM on January 30, 2018 [1 favorite]

From the paper:

"None of the decipherments appear to be syntactically correct or semantically consistent [...] The results presented in this section could be interpreted either as tantalizing clues for Hebrew as the source language of the VMS, or simply as artifacts of the combinatorial power of anagramming and language models."

The authors haven't claimed to have deciphered the VMS; I think the Gizmodo article distorts their claims significantly (i.e. hype/clickbait).
posted by crazy_yeti at 7:33 AM on January 30, 2018 [3 favorites]

voy ey
posted by pracowity at 9:33 AM

... Canadian researchers have taken a huge step forward ...

voy, eh?
posted by ZenMasterThis at 7:58 AM on January 30, 2018 [4 favorites]

Is there any reason to think the Manuscript wasn't produced with a one-time pad? That seems like the simplest explanation for an indecipherable code and would mean it's pointless to keep trying to decode it.
posted by straight at 9:08 AM on January 30, 2018 [1 favorite]

the combinatorial power of anagramming and language models

This is how I feel sometimes when I'm trying to explain to my 3 year old that she shouldn't do this thing that isn't as clear cut as "Don't run into the street!" or that she should do another thing in response to her question of "Why do we pick up toys before dinner?" and my befuddlement of trying to role play as a parent while A) tired and B) usually caught unawares comes out as something brilliant like

"Because important clean to house keep." or "No, doing that isn't the best thing; I told you!"
posted by RolandOfEld at 9:11 AM on January 30, 2018

Yet another breathless overhyped non-solution to the Voynich. There really should be a global moratorium on this stuff. The Voynich is cool because it's mysterious and has really neat illustrations. It's fun. It's not particularly important though. And the way it attracts amateurs (including myself) convincing themselves they've read some meaning into the randomness is embarrassing.

Is there any reason to think the Manuscript wasn't produced with a one-time pad?

There's too much regular structure and repeated sequences for that. It may be enciphered, but if so it's a more structural cipher than complete randomization. IIRC the sequences are actually too regular for typical written languages, which suggests a possible forgery. The equivalent of someone filling in "blah blah blah" instead of "lorem ipsum dolor".
posted by Nelson at 9:32 AM on January 30, 2018 [6 favorites]

Roger Bacon and Albertus Magnus

But just as Dee and Kelley are too late, they're too early, surely? Unless we hypothesize the whole thing was transcribed 100-200 years later, which seems no more plausible than the notion that Dee or Kelley used really old vellum.
posted by jackbishop at 9:40 AM on January 30, 2018

Even without much knowledge of languages there are obvious holes in the paper.

Well, there's the problem right there. They should get ahold of a copy that's in better condition.
posted by Atom Eyes at 9:42 AM on January 30, 2018 [3 favorites]

Is there any reason to think the Manuscript wasn't produced with a one-time pad?

The scope of the project, with the included illustrations (not to mention the length) would suggest that it is not a throw-away object like a one-time pad would be.

Also, the letter distributions are all wrong. Run text through a one-time pad, and you get uniform distributions. That's one of the benefits of a one-time pad, that it utterly obliterates detectable language features which would aid cryptanalysis. The Voynich manuscript is very languagey in its glyph distributions, which a text encrypted with a one-time pad wouldn't be. There are other ciphers which do produce uneven glyph distributions, like a running-key cipher, where some standard text is used as the key instead of a one-time pad.
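
A quick sketch of that flattening effect, under stated assumptions: a 26-letter alphabet, addition mod 26 instead of XOR, a made-up repeated sample as the plaintext, and a seeded `random.Random` only so the run is repeatable (a real pad would need a secure random source). The index of coincidence is the chance that two randomly picked letters match; a perfectly uniform source sits at 1/26, about 0.0385.

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def index_of_coincidence(text):
    """Probability that two letters drawn without replacement are equal."""
    counts = Counter(text)
    n = len(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def otp_encrypt(plaintext, rng):
    """Shift each letter by an independent, uniformly random pad value."""
    pad = [rng.randrange(26) for _ in plaintext]
    return "".join(
        ALPHABET[(ALPHABET.index(p) + k) % 26] for p, k in zip(plaintext, pad)
    )


# Illustrative plaintext only; any long stretch of natural language will do.
plaintext = "itwasthebestoftimesitwastheworstoftimes" * 100
rng = random.Random(0)  # assumption: seeded for reproducibility, NOT secure
ciphertext = otp_encrypt(plaintext, rng)

print("plaintext IC: ", round(index_of_coincidence(plaintext), 4))
print("ciphertext IC:", round(index_of_coincidence(ciphertext), 4))
```

The plaintext figure should come out well above the uniform value and the ciphertext figure close to it. The Voynich text behaves like the first case, not the second.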
posted by jackbishop at 9:49 AM on January 30, 2018 [1 favorite]

Also, cryptography was in its infancy in the 1400's. One time pads are a fairly recent invention to the best of anyone's knowledge (first known example from the 1880's). In theory someone could have had the idea, used it once and never told anyone else about their nifty cryptographic invention, but it seems awfully unlikely. Back in the 1400's even multi-alphabet ciphers were new and wild.

Back in the 1400's cryptography was viewed as a sort of useless endeavor and all ciphers were viewed as breakable by a knowledgeable person given enough time.
posted by sotonohito at 10:18 AM on January 30, 2018 [4 favorites]

jackbishop: "That's one of the benefits of a one-time pad, that it utterly obliterates detectable language features which would aid cryptanalysis."

Is this the case even if the pad is a coherent script, e.g. XORing with, say, a Bible?
posted by Mitheral at 10:38 AM on January 30, 2018

Mitheral No. To work right a one time pad **MUST** be as close to truly random as possible, making "truly random" numbers has long been one of those difficult problems for cryptography.

Order in a one time pad will make the resulting cyphertext vastly weaker. So much so that with modern computer assisted cryptanalysis a one time pad made the old fashioned way (with bingo balls or what have you) is vulnerable (well, sort of) because some balls will weigh more or less than others and the results won't be as perfectly random as you'd hope. If the person making the pad cheats, doesn't close their eyes when grabbing a ball for example, that also introduces order into the result.

Use the Bible as a pad and you'd get cyphertext that is crackable.
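
One way to see why, as a minimal sketch: the classic "crib dragging" attack on a running key. Everything here is an assumption for illustration: letters only, addition mod 26 rather than XOR, an invented message, and a key loosely taken from a well-known opening line. Subtracting a guessed word from the ciphertext at each offset yields what the key would have to be there, and a correct guess stands out because the key fragment reads as language.

```python
import string

ALPHABET = string.ascii_lowercase


def add(text, key):
    """Encrypt: shift each letter of text by the matching key letter."""
    return "".join(
        ALPHABET[(ALPHABET.index(t) + ALPHABET.index(k)) % 26]
        for t, k in zip(text, key)
    )


def sub(text, key):
    """Inverse of add: subtract the key letters from the text."""
    return "".join(
        ALPHABET[(ALPHABET.index(t) - ALPHABET.index(k)) % 26]
        for t, k in zip(text, key)
    )


def drag(ciphertext, crib):
    """Try the crib at every offset; return (offset, implied key fragment)."""
    width = len(crib)
    return [
        (i, sub(ciphertext[i:i + width], crib))
        for i in range(len(ciphertext) - width + 1)
    ]


# Hypothetical message and structured (non-random) key.
key = "inthebeginninggodcreatedtheheaven"
plaintext = "meetmebytheoldbridgeatmidnight"
ciphertext = add(plaintext, key)

for offset, fragment in drag(ciphertext, "bridge"):
    print(offset, fragment)
```

At the offset where the guess is right, the fragment is a piece of the key text; at the others it is noise. With a truly random pad every offset gives equally meaningless output, so a guess can never be confirmed.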
posted by sotonohito at 11:02 AM on January 30, 2018 [2 favorites]

To not abuse the edit window, the longer the message the more vulnerable it is to cryptanalysis no matter what. A brief, twenty or thirty character message might be fairly secure even using the Bible as a pad (don't count on it, but maybe). The more characters that exist in the cyphertext the more vulnerabilities you get from insufficient randomness in your pad.

I'd expect anything the size of the Voynich Manuscript encrypted using 1400's tech would be almost trivially cracked using modern cryptanalysis. Codes are a different matter.

But the Voynich Manuscript almost certainly isn't just cleartext in any known language run through an encrypting algorithm; encrypted text doesn't look like that.
posted by sotonohito at 11:09 AM on January 30, 2018 [3 favorites]

It will be decoded into yet another code that looks like nonsense.

"I have transformed the problem from intractably difficult and possibly quite insoluble conundrum into a mere linguistic puzzle. Albeit," he muttered, after a long moment of silent pondering, "an intractably difficult and possibly insoluble one."

—Douglas Adams, Dirk Gently's Holistic Detective Agency
posted by kindall at 12:04 PM on January 30, 2018 [4 favorites]

Now I don't want to say I solved it, at least not yet. But randomly assigning the values 1 or 0 to each letter, then carefully permuting the resulting bit string, the first 218,720 letters clearly decode to this

Obviously, an xkcd scholar would have to take a look before I publish.
posted by kleinsteradikaleminderheit at 12:24 PM on January 30, 2018 [2 favorites]

> Use the Bible as a pad and you'd get cyphertext that is crackable.

also it would decode to the bible
posted by kleinsteradikaleminderheit at 12:28 PM on January 30, 2018 [1 favorite]

Not AI. How is this AI.

The process of training a machine to recognize things by showing it a bunch of examples and then asking it to identify an example it has never seen is very solidly in the realm of what is called "Machine Learning". Depending on which definition you're using, Machine Learning can come under Artificial Intelligence. Of course, the definitions of both (and Deep Learning!) change about four times a day so don't quote me.
posted by Tell Me No Lies at 1:47 PM on January 30, 2018

IT HAS BEEN
[0] DAYS
SINCE OUR LAST
WORKPLACE
VOYNICH MANUSCRIPT DECIPHERMENT

I'm auditioning part of one of the astronomical charts as a tattoo right now, but there is a part of me that worries about a real translation showing up. One day I'll have a statement on how powerful enigma can be, the next I'll be on Buzzfeed.
posted by Tell Me No Lies at 2:03 PM on January 30, 2018 [5 favorites]

Tell Me No Lies: auditioning part of one of the astronomical charts as a tattoo right now

heh. what a unique problem to have. It seems to me you might have to disguise the astronomical chart somehow, like use some sort of code or mirror flip or something that gives you plausible deniability in case a true translation turns up.
posted by dhruva at 7:06 PM on January 30, 2018

A friend just linked to this very good article: "This week, The Times of Israel spoke with Hebrew-language, medieval and computer science scholars who broke down the so-called decryption breakthrough."

I laughed out loud when I saw that the researchers came from the University of Alberta, because so does this intrepid researcher who recently 'solved' the mysterious code written in the back of the Somerton Man's Rubaiyat of Omar Khayyam. Whatever are they putting in the water up there?
posted by daisyk at 11:56 AM on February 1, 2018

Not so fast: "Claims about AI ‘cracking’ the 600-year-old code were just wishful thinking" [The Verge]
posted by slipthought at 12:06 PM on February 7, 2018
