insects and rodents seem apparently never to enter the buildings
September 28, 2022 1:01 AM

"The Tripitaka Koreana - carved on 81258 woodblocks in the 13th century - is the most successful large data transfer over time yet achieved by humankind. 52 million characters of information, transmitted over nearly 8 centuries with zero data loss - an unequalled achievement." (threadreader; previously: 1, 2; also btw 5 things the Western book as we know it depends on[1,2] and How the Trapper Keeper Shaped a Generation of Writers - or Pee-Chees if you please! ;)
posted by kliuless (25 comments total) 60 users marked this as a favorite
 
1. Wow!

2. The thread says the blocks can be used for printing, but also can be read from directly. Presumably the script is mirrored(?) - is character-based writing easier to read back to front than alphabet writing?
posted by colin.jaquiery at 1:25 AM on September 28, 2022 [3 favorites]


Wow indeed. You can see in one of the photos that the text is mirrored (unless they carved the other side right-reading, but that wasn’t mentioned). I don’t think hanja are easier to read in reverse than Latin characters, but I imagine you’d get used to it.
posted by adamrice at 1:39 AM on September 28, 2022 [1 favorite]


I was wondering about the print / read aspect too.

My guess: they do not ink the actual blanks, they put the paper sheet over it and then on top of that the ink plate or some other system is used.
posted by Meatbomb at 2:45 AM on September 28, 2022 [2 favorites]


Team Pee-Chee, for life. [and given the contents of the "5 things..." list, maybe the plural should be "Pee-Cheese"]
posted by chavenet at 3:04 AM on September 28, 2022


An exceptional example of a whole system - it's surprising it wasn't covered in the Whole Earth Catalog back in the day.
posted by fairmettle at 3:25 AM on September 28, 2022


That's sooo impressive!
posted by Harald74 at 3:57 AM on September 28, 2022


In some cases, rather than reversed, the text was carved directly, and then at some point some folk came along and did rubbings:

https://www.lib.berkeley.edu/EAL/stone/rubbings.html
The most extensive of several large projects to preserve authoritative texts was the carving of the Buddhist canon on 7,137 stone tablets or steles—over 4 million characters—in an undertaking that continued from 605 to 1096.
There's a scene in one of Barry Hughart's books (Bridge of Birds, maybe?) where a character surreptitiously takes a rubbing of part of Confucius's writings.
posted by sebastienbailard at 3:58 AM on September 28, 2022


So interesting, thanks for posting. The idea of information transmission over long periods of time has always fascinated me.

It reminds me a little of (but is different from) this temple in Japan that is rebuilt every 20 years.
posted by Gorgik at 5:34 AM on September 28, 2022


I'm curious about the writing. It's in Hanja, which are "Chinese characters" as the Twitter thread says but are a logographic Korean writing system dating back 2400 years. (Modern Korean is written in Hangul, a completely different phonetic system that was invented well after the Tripitaka Koreana was first carved.)

I don't know any Korean or any Chinese language so I'm having a hard time understanding Hanja but it sounds a little like Kanji. Using Chinese character glyphs to represent Korean words, with a lot of semantic overlap but with some divergence. This site has some example words.

The Twitter thread says that "Tripiṭaka Koreana was one of the most coveted items among Japanese Buddhists in the Edo period". Would the text have been readable to a Japanese scholar who did not have specific knowledge of Korean and Hanja? And could a Chinese scholar puzzle out the words well enough because of the similarities between Hanja and Hanzi?

(Also I'm going to quibble with the "2GB of data". Their estimate says each character is 256 bits, based on a raster rendering of the glyph. That's not at all how people think of writing systems though. The Tripiṭaka Koreana is not a bitmap, it is a sequence of discrete Hanja characters. Hanja has about 54,000 characters so ~16 bits a character. 52M characters = 100MB of data, more or less. If you really want a graphical encoding, measuring by strokes would make more sense than pixels. I don't mean to diminish the monumental achievement of the Tripitaka Koreana! Just can't help myself since the Twitter author started invoking some technical measurements I know something about.)
posted by Nelson at 7:32 AM on September 28, 2022 [5 favorites]
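
For anyone who wants to check the arithmetic in the comment above, here is a minimal sketch in Python. It only reuses the rough figures quoted in the thread (52 million characters, 256 bits per rasterized glyph, about 54,000 distinct characters); the function and constant names are made up for illustration, and none of this is a measurement of the actual blocks.

```python
import math

# Figures quoted in the thread; all are rough estimates, not measurements.
TOTAL_CHARACTERS = 52_000_000   # characters carved across the woodblocks
RASTER_BITS_PER_GLYPH = 256     # the Twitter thread's bitmap-per-character figure
REPERTOIRE_SIZE = 54_000        # approximate count of distinct characters


def bits_for_repertoire(size):
    """Smallest whole number of bits that can index `size` distinct symbols."""
    return math.ceil(math.log2(size))


def total_bytes(characters, bits_per_character):
    """Storage in bytes for a fixed-width encoding."""
    return characters * bits_per_character // 8


symbol_bits = bits_for_repertoire(REPERTOIRE_SIZE)
raster_bytes = total_bytes(TOTAL_CHARACTERS, RASTER_BITS_PER_GLYPH)
symbol_bytes = total_bytes(TOTAL_CHARACTERS, symbol_bits)

print(f"bits per character as a symbol: {symbol_bits}")    # 16
print(f"raster estimate: {raster_bytes / 1e9:.2f} GB")     # 1.66 GB
print(f"symbol estimate: {symbol_bytes / 1e6:.0f} MB")     # 104 MB
```

So the bitmap view lands near the tweeted "2GB" and the discrete-character view lands near the "100MB, more or less" figure; the gap is simply 256 bits versus 16 bits per character.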


in some ways it's a miracle it's still around.

it survived the suppression of buddhism in favor of state confucianism during the joseon dynasty

it survived the imjin and chongyu wars, when the japanese invaded the peninsula and looted countless historical treasures and took artisans. monk armies can only go so far

it survived the japanese occupation, as haeinsa temple wasn't looted, ransacked, and demolished unlike other palaces and temples

it survived being outside the busan perimeter during the korean war, in a region that likely saw indiscriminate bombings and artillery barrages
posted by i used to be someone else at 7:33 AM on September 28, 2022 [7 favorites]


I am pleasantly surprised every time I see another tweet in that thread.
posted by gentlyepigrams at 8:03 AM on September 28, 2022


Now I have a perverse desire to go up province and start engraving on the Canadian Shield. Giant QR codes, component pixels 10cm across.
posted by seanmpuckett at 8:23 AM on September 28, 2022 [3 favorites]


> insects and rodents seem apparently never to enter the buildings

Buildings full of wood that was coated with poisonous lacquer, on floors of earth that sits upon layers of salt.

Why would an insect or rodent go there? There's nothing to eat, and the environment is physically repellant. The insect that spent too much time on those floors would dry right out, and the way the wood was treated, even a rat would not want to gnaw it.
posted by Aardvark Cheeselog at 8:31 AM on September 28, 2022 [4 favorites]


> I'm curious about the writing. It's in Hanja, which are "Chinese characters" as the Twitter thread says but are a logographic Korean writing system dating back 2400 years. (Modern Korean is written in Hangul, a completely different phonetic system that was invented well after the Tripitaka Koreana was first carved.)

> I don't know any Korean or any Chinese language so I'm having a hard time understanding Hanja but it sounds a little like Kanji. Using Chinese character glyphs to represent Korean words, with a lot of semantic overlap but with some divergence. This site has some example words.

so, hanja are chinese characters, which, in mandarin, are known as hanzi. just from the romanizations, you can see that hanzi (c), hanja (k), kanji (j), and hán tự (v) are all cognates. while their common ancestry was used as the basis of unicode han unification[1], hanja and kanji broke off much earlier and so there are minor differences from modern hanzi.
  • 1a. unlike mainland hanzi, hanja and kanji never went through the dramatic simplification, so they're closer in appearance to traditional hanzi, which can still be seen in taiwan and hong kong
  • 1b. however, both hanja and kanji underwent their own forms of simplification, with kanji having more reforms known as "shinjitai", but these focus primarily on more common terms
  • 2. stroke orders are a little different between hanzi, hanja, and kanji, largely because writing styles evolved differently
  • 3. the comparison to kanji is more or less accurate, though it's much less common these days for hanja to be used. some of the last holdouts are newspapers, where they use it for specific terms that could be ambiguous if written as hangeul, but you will still see it in academic/legal writing or in marketing that wants to give an air of tradition (see: 辛 ramen). they're also used for names when people want to be formal about it (like, you'll see 김 on a name tag or most signatures; 金 if you get a fancy award or on some vital records)
  • 4. hanja are often used for places--which is how you can get some unexpected abbreviations, with seoul being the cause for a lot of it. seoul has no real hanja representation, which is why the word for capital (京, pronounced 'gyeong') is used instead; this means the seoul-busan expressway isn't seobu, but gyeongbu
  • 5. pronunciations can vary. like how kanji have a native reading and a sinic reading, hanja do too. which reading gets used tends to vary, especially because there are lots of synonyms that have either native korean roots or sinic roots, though they often have connotational differences: 人 is "사람 / saram" in native korean, which is more "person", but "인간 / ingan" using sinic roots, which is more, like, "human"
  • 6. part of the reason hanja came to korea (and japan and vietnam) is because of sinic cultural influence and the spread of buddhism/confucianism/daoism
all of that is to preface: the tripitaka koreana is written in (i believe) classical chinese style, which means even though it's written with the korean variant of chinese characters, it is still in some ways chinese literature. one comparison could be the difference between classical arabic and pashto--i use this example because the quran and pashto are both written using arabic script, but arabic and pashto are unrelated languages and yet pashtun muslims are still able to read the quran and understand it.

before hangeul gained widespread acceptance, writing and understanding classical written chinese was the mark of erudition; for commoners who were able to read, which was rare, hanja were often used to record sounds, not meaning, using systems called idu, gugyeol, or hyangchal. little formal writing was ever done in this format, and the difficulty and idiosyncrasies of those systems are among the reasons for creating phonetic-based hangeul. it's also why hangeul didn't get widespread acceptance until the 19th century. class and patriarchy are, and were, such an impediment to progress
> The Twitter thread says that "Tripiṭaka Koreana was one of the most coveted items among Japanese Buddhists in the Edo period". Would the text have been readable to a Japanese scholar who did not have specific knowledge of Korean and Hanja? And could a Chinese scholar puzzle out the words well enough because of the similarities between Hanja and Hanzi?

this means that yes--the text is also readable to japanese and chinese scholars, provided that they're familiar with some of the regional and temporal variances and also understand that this would have been written in the 1200s--imagine reading beowulf in old english using runic characters. for japanese scholars, like korean scholars, they'd be reading it as chinese much in the same way a pashtun imam would read the quran as arabic. transmission of meaning, not sound.

finally, the tripitaka koreana was used as the basis of the taishō tripitaka, with printings of it given to the japanese before the invasions. given that the edo period was after the invasions and centuries after the printings of tripitaka koreana were given to the japanese as gifts...

---

[1] han unification was undertaken largely because there are well over 100k characters and variants between korean, japanese, chinese, and vietnamese; there were enormous difficulties with this, because people had to decide which things were variants and which were distinct; which things would be regional and which would not. naturally, this was controversial and still is.
posted by i used to be someone else at 8:31 AM on September 28, 2022 [33 favorites]


> Buildings full of wood that was coated with poisonous lacquer, on floors of earth that sits upon layers of salt.

fun fact. i'm assuming that the lacquer being used is based off toxicodendron trees given the fact that it's korea, meaning that it's made with urushiol

urushiol, if you're familiar, is the same thing that gives poison ivy/sumac/oak their characteristic itchiness and contact dermatitis

that said, once it's turned into lacquer and treated, it's perfectly fine to eat out of because it's polymerized.
posted by i used to be someone else at 8:38 AM on September 28, 2022 [4 favorites]


> I don't know any Korean or any Chinese language so I'm having a hard time understanding Hanja but it sounds a little like Kanji. Using Chinese character glyphs to represent Korean words, with a lot of semantic overlap but with some divergence. This site has some example words.

i realize i forgot to answer this part: basically, in modern written korean and japanese, hanja and kanji are used as highly compressed conveyances of meaning in a larger sentence written in korean and japanese grammar.

think of how we use emojis sometimes in english sentences, like in a lighter example:

"lol, trump just got his 🍑 im🍑ed"

the first one is like how hanja/kanji are used: the first peach is a stand in for "ass", conveying meaning.

the second one is idu/hyangchal/gugyeol (or the japanese antecedents of hiragana/katakana, things like manyogana), and used as a sound.

neither of those really matters in this case, because the way the tripitaka koreana (or, the korean name for it, 팔만 대장경 in hangeul, 八萬 大藏經 in hanja, "palman daejanggyeong" romanized) is written is as classical chinese writing.
posted by i used to be someone else at 9:16 AM on September 28, 2022 [9 favorites]


Thank you for explaining how the language works! Maybe I had it backwards? For a modern Korean without specialized academic knowledge, would the Tripitaka Koreana be basically unreadable? And in, say, 1400 a Korean scholar would be able to read it but it would be more like reading a Chinese text than one in Korean as it would be spoken colloquially?

(I appreciate the Arabic analogy too. Also wondering if it's a little like a pre-Reformation European situation where religious texts were in Latin and almost anyone who could read at all could read that Latin, no matter what their first language was.)
posted by Nelson at 9:22 AM on September 28, 2022 [1 favorite]


i don't know if specialized-specialized academic knowledge would be required. i'd say that a modern korean with a working knowledge of common hanja and equipped with a dictionary would be able to piece a lot of it together, though they'd likely miss a lot of the nuance and depth someone familiar with reading the classics untranslated would.

i'd put the comparison like a modern american being given a text in latin. if you know enough roots you can sorta maybe piece it together, kinda. you get a dictionary and more of it becomes apparent. but if you haven't studied latin, it's easy to make mistakes and miss things.

and yes--any scholar reading it, korean or not, would be reading it more like a chinese text with chinese grammar than anything really resembling spoken korean. (my understanding is that classical written chinese doesn't resemble modern spoken chinese languages either, kinda like how church latin or quranic arabic don't necessarily resemble vulgar latin or, say, jordanian arabic.)
posted by i used to be someone else at 9:48 AM on September 28, 2022 [3 favorites]


I just tried to read sample pages and maybe understood 70 percent of the first page (a biography of Buddha?) and 30 percent of the second page (some sutra probably?) Chinese kids have to learn some classical Chinese literature in high school, but reading 古文 is usually hard for a lay person and the lack of punctuation really really doesn't help.
posted by of strange foe at 10:19 AM on September 28, 2022 [4 favorites]


> That's not at all how people think of writing systems though

The Twitter person explains why they didn't calculate using Unicode: because it's a specific protocol you can't count on people knowing, thousands of years in the future. Something graphical is far safer.

But they're still kinda wrong, because a text can include a key to its own encoding. If you were doing this entirely with bits, you could include the entire Unicode CJK dictionary— e.g. you list each of the 80,000 codes, together with a bitmap. That's under 3MB, which you can then add to your 100MB Unicode text.

Of course, a snarky person could suggest that maybe in a few millennia people won't be able to read Hanja. But it's OK: when the Buddhist canon is forgotten, Maitreya will appear.
posted by zompist at 2:40 PM on September 28, 2022 [5 favorites]
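
A small sketch of the "text carries its own key" idea from the comment above, in Python. The 80,000 codes and 256-bit bitmaps are the figures from the comment; the 4-byte code width is an assumption added here, and the helper names are hypothetical. In the toy encoder the key is just the list of distinct characters, standing in for the code-plus-bitmap table.

```python
# Figures from the comment above, plus one labeled assumption.
CODE_COUNT = 80_000   # CJK codes to list, per the comment
BITMAP_BITS = 256     # one small raster glyph per code
CODE_BITS = 32        # assumption: each code stored as a 4-byte integer


def key_table_bytes(codes=CODE_COUNT, bitmap_bits=BITMAP_BITS, code_bits=CODE_BITS):
    """Size in bytes of a table pairing every code with its bitmap."""
    return codes * (bitmap_bits + code_bits) // 8


def encode_with_key(text):
    """Return (key, codes): the key lists each distinct character once,
    and codes are indexes into that key."""
    key = sorted(set(text))
    index = {ch: i for i, ch in enumerate(key)}
    return key, [index[ch] for ch in text]


def decode_with_key(key, codes):
    """Rebuild the text using only the key shipped alongside the codes."""
    return "".join(key[i] for i in codes)


print(f"key table: {key_table_bytes() / 1e6:.2f} MB")   # 2.88 MB

sample = "大藏經大藏"
key, codes = encode_with_key(sample)
print(decode_with_key(key, codes) == sample)            # True
```

Under these assumptions the table comes to 2.88 MB, consistent with the "under 3MB" estimate, and a reader who has the key never needs to know what Unicode was.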


> The insect that spent too much time on those floors would dry right out

The many hundreds of desiccated pill bugs in my dry outbuilding didn't seem hip to their fate. I keep telling them only death awaits if they get in, but do they listen?
posted by maxwelton at 4:07 PM on September 28, 2022


I was completely unaware of this, so thank you.
posted by aramaic at 8:46 PM on September 28, 2022 [1 favorite]


My academic field is Buddhist print culture in China, so I can try to answer some of the questions raised above:

The Sinic Buddhist canon is written in what we call Classical Chinese, but it was such a lingua franca for the entire region up to the modern era that this is a little misleading. It's similar to Latin during the middle ages and early modern periods in Europe - every educated person learned it, could read and write in it, but they wouldn't speak it; instead they would speak vernacular languages that were mostly based on the roots of their regional Classical language.

An extra complication is that the Buddhist scriptural texts were translated or compiled during different historical periods, and thus reflect different styles of CC. Additionally, they use a ton of terms and ideas that are unique to Buddhism, and which would be difficult for even an educated reader to understand without additional training.

The blocks have the characters carved on them in reverse, to be inked and then have paper pressed on their surface to make a print. I find reversed Chinese characters much easier to read than reversed English, maybe because the greater complexity of the characters makes it easier to identify them?

There were tons of enormous printing projects done with woodblocks over the centuries. This may be the largest single set, but it's just a small number compared to the whole of what was printed. It's sad that most people think of Gutenberg first when they think of the history of printing, whereas for centuries, Classical Chinese printed with woodblocks was the bulk of what was produced on earth.
posted by sudasana at 11:53 PM on September 28, 2022 [12 favorites]


> There were tons of enormous printing projects done with woodblocks over the centuries. This may be the largest single set, but it's just a small number compared to the whole of what was printed. It's sad that most people think of Gutenberg first when they think of the history of printing, whereas for centuries, Classical Chinese printed with woodblocks was the bulk of what was produced on earth.

speaking of this, there's jikji movable metal type from korea in the late 1300s—and before gutenberg (previously). it is also a buddhist text.
posted by i used to be someone else at 8:22 AM on September 29, 2022 [2 favorites]


Japanese writing is based on Chinese, but evolved in a very different direction. Initially, however, to be a literate Japanese person meant reading Chinese. Because of the differences between the languages, Chinese needed to be seriously contorted to express Japanese, leading to the writing system Japan has now.

But Japan developed another system called kanbun (漢文: written the same in Korean but pronounced hanmun), in which you write Chinese and add diacritic marks to give clues as to how to read it as Japanese. This isn’t really actively used anymore, but some people do still learn it. From what I understand, kanbun eventually developed its own idiosyncrasies distinct from Chinese.
posted by adamrice at 10:03 AM on September 29, 2022



