How many languages will survive in a digital world?
October 28, 2013 7:33 AM

Digital language death: Of the approximately 7,000 languages spoken today, some 2,500 are generally considered endangered. Here we argue that this consensus figure vastly underestimates the danger of digital language death, in that less than 5% of all languages can still ascend to the digital realm. We present evidence of a massive die-off caused by the digital divide. — See also:

"If the web doesn't have it, it doesn't exist." Video of an earlier presentation by the author, Andras Kornai, on this topic.

Author's home page
posted by beagle (14 comments total) 16 users marked this as a favorite
 
5% * 7000 = 350 languages

That actually seems like a high estimate to me.

I wonder how many of those languages are being represented using their preferred / native scripts and charsets vs how many are generally encoded using an ASCII or Latin-1 transliteration? I saw a lot of the latter in Bangladesh and Thailand (esp. for mobile use). Desktop OSes may have finally gotten around encoding and input issues but on mobile it's still rough, particularly on low-end featurephones.

I'm curious if you were to look at the web as a whole, how many different scripts are widely used and in what percentage. My guess is that you'd see Latin (of course), Japanese, traditional and simplified Chinese, Korean, Cyrillic, and (maybe?) Devanagari, but then the usage frequency would fall off spectacularly. Unfortunately I can't find any actual data, just encoding usage which doesn't tell you much since the clear winner is UTF-8. But my gut feeling is that you'd probably get to well over 99% of all written digital content using only 10 scripts, indicating that many languages are getting transliterated. (Admittedly, many languages share scripts / character sets so it's not like 350 languages implies that there should be 350 different scripts. But 350:10 is a pretty skewed ratio.)
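
For what it's worth, here is a rough sketch of how one might measure script share on a pile of text. It assumes the first word of each character's Unicode name (LATIN, CYRILLIC, DEVANAGARI, and so on) is a good-enough stand-in for its script, which is only an approximation since Python's standard library does not expose the real Unicode script property. The function name `script_shares` and the sample string are made up for illustration.

```python
import unicodedata
from collections import Counter


def script_shares(text):
    """Return {script_label: fraction of letters} for the given text.

    Assumption: the first word of a character's Unicode name is used as a
    rough proxy for its script. This is not the official script property.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {script: n / total for script, n in counts.items()}


# Hypothetical sample: three Latin letters and three Cyrillic letters.
sample = "abc где"
shares = script_shares(sample)
```

Note that this cannot detect transliteration by itself: Bengali typed in ASCII would simply count as Latin, so you would still need language identification on top to see how many languages are hiding behind each script.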

For all the languages that are digitized but not in their native scripts/charsets, I wonder what that will mean for the evolution of those languages over time?
posted by Kadin2048 at 8:28 AM on October 28, 2013 [2 favorites]


Not to mention that a lot of the endangered languages don't even have writing systems/scripts.
posted by iamkimiam at 8:48 AM on October 28, 2013


And I no longer have a tail. We call that evolution.
posted by judson at 9:02 AM on October 28, 2013 [2 favorites]


And I no longer have a tail. We call that evolution.

Man, there are an awful lot of people that would jump at the chance to have a tail.
posted by curious nu at 9:04 AM on October 28, 2013


"If the web doesn't have it, it doesn't exist."

I imagine this will come as quite a shock to the large number of living humans who apparently don't exist.
posted by threeants at 9:08 AM on October 28, 2013 [1 favorite]


Language death is a very important topic, and its implications re digital technology are important as well, but the framing on this kind of bothers me. The term "digital ascent", for one, is almost laughably historicistic.
posted by threeants at 9:09 AM on October 28, 2013 [2 favorites]


It's not just the digital divide. It's worse. It's depressing, the number of student essays I have to read that have Estonian words but basically a garbled English syntax, with a bunch of literal translations of English expressions thrown in that nobody had heard of 10-15 years ago. This is accompanied by a noticeable drop in the size and variety of vocabulary. Estonian - the language of the celebrated IT republic - is an endangered species.

We all speak perfect English though.
posted by Pyrogenesis at 9:10 AM on October 28, 2013 [1 favorite]


the framing on this kind of bothers me

Me too: the methodology is very weird here, particularly the online component. It's strange, weakly justified, and result-skewing to do things like using Wikipedia as a substitute for doing the research to find actual online language communities, or relying on Apple's corporate decisions about what language support will be most profitable as a proxy for which languages are thriving in "the digital ecosystem." The methodological weirdness leads to some bizarre ideas, like the contention that whoever's machine-generating Volapük Wikipedia articles is actually illicitly "gaming" the author's arbitrary metrics. As though the world had conspired to arrange itself around the researcher!
posted by RogerB at 10:00 AM on October 28, 2013 [2 favorites]


For more on the Volapük Wikipedia Crisis (I think Jimmy Wales even gets involved), see here and here.
posted by gubo at 10:20 AM on October 28, 2013


I dunno about this. Dialects persist in spoken language despite the lack of scripts, character sets, or any written clues at all. So long as enough people speak with a dialect, the absence of dialects on the web should make no difference. Why should language be any different?
posted by three blind mice at 11:35 AM on October 28, 2013 [1 favorite]


One would think the digital divide is what's holding back the extinction of languages. Nothing preserves linguistic diversity better than ethnic segregation and hostility. Keep the $ETHNICs away from the net and they'll hold on to their language.
posted by ocschwar at 1:19 PM on October 28, 2013


Pyrogenesis wrote: Estonian - the language of the celebrated IT republic - is an endangered species. We all speak perfect English though.

In eighty years' time, the only languages spoken in Europe will be English and Hungarian.
posted by Joe in Australia at 7:25 PM on October 28, 2013 [1 favorite]


Many languages will die in the next 50 years. On the plus side, fewer languages mean people across the world will find it easier to communicate. On the minus side, fewer languages mean... what exactly? I'm genuinely curious.
posted by Triplanetary at 10:31 AM on October 29, 2013


On the minus side, fewer languages mean... what exactly? I'm genuinely curious.

Sounds like you might enjoy David Crystal's book Language Death.
posted by RogerB at 10:45 AM on October 29, 2013




This thread has been archived and is closed to new comments