Here’s how to fit 1,000 terabytes on a DVD
June 20, 2013 6:09 PM

"We live in a world where digital information is exploding. Some 90% of the world’s data was generated in the past two years. The obvious question is: how can we store it all? In Nature Communications today, we, along with Richard Evans from CSIRO, show how we developed a new technique to enable the data capacity of a single DVD to increase from 4.7 gigabytes up to one petabyte (1,000 terabytes). This is equivalent of 10.6 years of compressed high-definition video or 50,000 full high-definition movies."
posted by Blazecock Pileon (74 comments total) 19 users marked this as a favorite
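Here's a quick sanity check of the press-release equivalences as a minimal Python sketch. It assumes decimal units (1 PB = 10^15 bytes) and a 365.25-day year, and the release states neither. Dividing a petabyte across 50,000 movies gives about 20 GB each. Spreading it over 10.6 years of playback implies an average of roughly 24 Mbit/s, which is plausible for compressed HD video.

```python
# Sanity-check the press-release equivalences for a 1 PB disc.
# Assumes decimal units (1 PB = 10**15 bytes) and a 365.25-day year;
# the release does not say which conventions it used.
PETABYTE = 10**15
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def per_movie_gb(capacity_bytes: int, movies: int) -> float:
    """Average size per movie, in decimal gigabytes."""
    return capacity_bytes / movies / 10**9

def implied_bitrate_mbps(capacity_bytes: int, years: float) -> float:
    """Average video bitrate (megabits/s) that would fill the disc in `years`."""
    return capacity_bytes * 8 / (years * SECONDS_PER_YEAR) / 10**6

print(f"{per_movie_gb(PETABYTE, 50_000):.1f} GB per movie")
print(f"{implied_bitrate_mbps(PETABYTE, 10.6):.1f} Mbit/s over 10.6 years")
```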
 
That's a lotta pr0n.
posted by Greg_Ace at 6:11 PM on June 20, 2013 [4 favorites]


Let's be clear: this is not about changing the way information is recorded on standard DVDs. Rather, it's about changing the composition of an object so that it meets the dimensional standards of DVD but has far different materials comprising it.

You are not going to be able to buy a Nature Communications burner and use your old Memorex DVD-Rs to store a petabyte. You're going to have to buy a Nature Comm burner AND Nature Comm discs.
posted by infinitewindow at 6:15 PM on June 20, 2013 [11 favorites]


Some 90% of the world’s data was generated in the past two years.

Bullshit. Check out the fossil record. It's analog, yes, but it's still data.
posted by quonsar II: smock fishpants and the temple of foon at 6:19 PM on June 20, 2013 [41 favorites]


Yay! Now I can lose 200,000 times as much data when the disc gets scratched!
posted by Sys Rq at 6:28 PM on June 20, 2013 [21 favorites]
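For the record, "200,000 times" is a fair round-down. This minimal sketch assumes decimal units and a single-layer 4.7 GB DVD, and the ratio it computes comes out a little over 212,000.

```python
# How many times larger is a 1 PB disc than a single-layer 4.7 GB DVD?
# Decimal units assumed (1 GB = 10**9 bytes, 1 PB = 10**15 bytes).
DVD_BYTES = 4.7e9
PETABYTE_DISC_BYTES = 1e15

ratio = PETABYTE_DISC_BYTES / DVD_BYTES
print(f"{ratio:,.0f}x")
```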


The logistics of reading and writing a DVD-sized disc with such high densities are non-trivial. How small can you make the heads? How costly are they going to be?
posted by Monday, stony Monday at 6:34 PM on June 20, 2013


quonsar II: smock fishpants and the temple of foon: "Some 90% of the world’s data was generated in the past two years.

Bullshit. Check out the fossil record. It's analog, yes, but it's still data.
"

If you ain't digital, you ain't shit.

It's more like ten years, but BP is right. The amount of information has expanded exponentially in just my lifetime, not geometrically. (Shit, I hope I have those right.)

Even if we could get all of the DNA information from every single fossil on the planet, it would pale in comparison storage-wise to what we've shat out over the last decade.

The NSA didn't spend billions to store 300GB of biological fossil data. They spent billions to store every single email sent to or from an American-based server for like seven years.
posted by Sphinx at 6:34 PM on June 20, 2013 [1 favorite]
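What growth rate would the "90% in two years" claim actually imply? Geometric growth is exponential growth sampled at discrete steps, so the two words describe the same model here. Here is a minimal sketch assuming yearly data output has grown by a constant factor g for a long time. Under that assumption, the share of all data produced in the last n years is 1 - g^(-n). Two years gives g of about 3.16x per year. Sphinx's "more like ten years" gives about 1.26x per year.

```python
def implied_annual_growth(fraction_recent: float, years: float) -> float:
    """Annual growth factor g such that a share `fraction_recent` of all data
    ever produced came from the last `years` years.

    Assumes yearly output has grown by a constant factor g for a long time
    (a geometric series), so that share equals 1 - g**(-years).
    """
    return (1.0 / (1.0 - fraction_recent)) ** (1.0 / years)

for window in (2, 10):
    g = implied_annual_growth(0.9, window)
    print(f"90% in {window} years -> output grows ~{g:.2f}x per year")
```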


Woot! A disc to store all the contents of my download directory, with all that little stuff I download and then am afraid to delete in case I might need it again.
posted by Samizdata at 6:40 PM on June 20, 2013 [3 favorites]


You can store a petabyte but can you store a trilobite?
posted by benzenedream at 6:42 PM on June 20, 2013 [39 favorites]


Some 90% of the world’s data was generated in the past two years.

Where did this figure come from? Nature is citing Science Daily, but that's a news outlet, not a research publication, and they seem to have gotten it from a SINTEF press release. Are press releases peer reviewed?
posted by Toekneesan at 6:43 PM on June 20, 2013 [1 favorite]


Let's be clear: this is not about changing the way information is recorded on standard DVDs. Rather, it's about changing the composition of an object so that it meets the dimensional standards of DVD but has far different materials comprising it.

Yeah, their use of "DVD" is very misleading. By their definition Blu-rays and CD-ROMs are also DVDs.
posted by Holy Zarquon's Singing Fish at 6:46 PM on June 20, 2013 [2 favorites]


What is really fun about this, if it isn't bullshit, isn't what you could do with a DVD-sized disc (which nobody would need for practical purposes unless holographic 3D TV is invented) but what you could do with much smaller, say 3/4 inch diameter, and much more easily concealable discs.

However, the article doesn't explain how you're supposed to read the resulting high-density tracks with normal equipment.
posted by localroger at 6:47 PM on June 20, 2013


Sorry, the Nature article didn't cite Science Daily, the blog about the Nature article did. Still, I'm wondering how it was calculated.
posted by Toekneesan at 6:48 PM on June 20, 2013


Practically, this means that long term backups from existing TB hard drives/SSDs will get a hell of a lot less precarious for consumers[1], even if this ends up storing 10x less because of error correction or other mass production considerations.

That kind of density would also seem to indicate that the discs will have rather insane transfer rates.

I wonder if they will call it PurpleRay? :-)

[1] right now, most people just back up... to another hard drive.
posted by smidgen at 6:51 PM on June 20, 2013


The NSA will find this useful!
posted by Foosnark at 6:52 PM on June 20, 2013 [1 favorite]


For the disbelievers: My senior thesis was a historical paper on living conditions in Philadelphia during the British occupation (1777-78).

For my research, I collected every easily-to-moderately available source of data still in existence that even vaguely concerns Philadelphia and its surroundings during that time period. Censuses, letters, diaries, maps, official reports, foreign newspaper reports, military reports, parliamentary and Congressional proceedings, even contemporary broadsheets and advertisements. There are probably obscure documents sitting in the bottom sub-basement of a Pennsylvania courthouse or library that I couldn't reach, but I can confidently say that my personal collection of these documents is probably approaching something like a comprehensive collection of surviving documents relating to Philadelphia in 1777-78.

All of these documents, almost all in PDF format, take up 572.4 megabytes on my laptop HD.

Literally the entire history of an entire region of the United States for an entire year takes up a fraction of a $12 thumb drive.

And we're talking about the late 1700s here. The farther back in history you go, the fewer and farther between the documents become, until you get to the Classical era, where you are lucky if the event in question gets mentioned by one source and repeated by another.

It's very easy for me to believe that 90% of all the data the human race has ever produced was generated in the past two years. People simply have no concept of how sparse human data is before the advent of the digital revolution.
posted by Avenger at 6:53 PM on June 20, 2013 [27 favorites]
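To put Avenger's 572.4 MB in perspective against the disc in the post, here is a minimal sketch. It assumes decimal units (1 MB = 10^6 bytes) and the full 1 PB claimed in the article. On those assumptions, about 1.75 million copies of that whole collection would fit on one disc.

```python
# How many copies of a 572.4 MB document collection would fit on the
# hypothetical 1 PB disc? Decimal units assumed (1 MB = 10**6 bytes).
COLLECTION_BYTES = 572.4e6
DISC_BYTES = 1e15

copies = int(DISC_BYTES // COLLECTION_BYTES)
print(f"{copies:,} copies")
```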


I wonder if they will call it PurpleRay?

Blu-Ray is called that because the laser is blue, which with its shorter wavelength gives it a smaller dot than DVD red or CD infrared. It was a great idea until they crapped up the new standard with all the weird DRM and took out the error correction.

This technique is actually wavelength-independent, but it sharpens the results at whatever base wavelength you use. I'd expect it to be called something "overlap" related.
posted by localroger at 6:55 PM on June 20, 2013
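The wavelength point can be made concrete with a rough sketch. For conventional optics, the focused spot size scales roughly as wavelength over numerical aperture (λ/NA), and areal density scales as the inverse square of that. The sketch uses the nominal laser wavelengths and objective NAs for CD, DVD and Blu-ray, and λ/NA serves only as a proportional figure of merit, not an exact spot diameter. The estimated DVD to Blu-ray gain of about 5.2x lands close to the real single-layer jump from 4.7 GB to 25 GB. The technique under discussion is described above as sharpening the spot at any base wavelength, so this scaling covers only conventional drives.

```python
# Diffraction-limited spot size scales roughly as wavelength / numerical
# aperture (lambda / NA); areal density scales as the inverse square of that.
# Nominal laser wavelengths (metres) and objective NAs for each format.
FORMATS = {
    "CD": (780e-9, 0.45),
    "DVD": (650e-9, 0.60),
    "Blu-ray": (405e-9, 0.85),
}

def spot_scale_nm(wavelength_m: float, na: float) -> float:
    """Spot-size figure of merit lambda/NA, in nanometres (not an exact diameter)."""
    return wavelength_m / na * 1e9

def density_gain(a: str, b: str) -> float:
    """Approximate areal-density gain of format b over format a."""
    return (spot_scale_nm(*FORMATS[a]) / spot_scale_nm(*FORMATS[b])) ** 2

for name, (wl, na) in FORMATS.items():
    print(f"{name:8s} lambda/NA = {spot_scale_nm(wl, na):6.0f} nm")
print(f"DVD -> Blu-ray density gain ~{density_gain('DVD', 'Blu-ray'):.1f}x")
```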


Here's to hoping CSIRO doesn't asshole this one up like they did with WiFi. (Which directly affected me as Buffalo was injuncted from doing anything with my old router.)
posted by Samizdata at 6:57 PM on June 20, 2013


I'll believe it when I see it. You need only google holographic disc storage to see numerous press releases and articles over the past 12 years about how "real soon now!" we'll be able to fit 500GB on a DVD-sized disc, but nothing ever seems to come to market.
posted by thewalrus at 6:58 PM on June 20, 2013 [1 favorite]


All of these documents, almost all in PDF format, take up 572.4 megabytes on my laptop HD.

Avenger, that may be true, but what if your goal is the preservation of old printed materials that are degrading? Scanning a single 150-page book and storing it as a high-quality 600 dpi raster is going to take up way, way more than 572MB. Yes, you can OCR the book, but if data storage is cheap, isn't it useful to keep the original scans as well? The original scans can always be used to create new formats.
posted by thewalrus at 7:01 PM on June 20, 2013
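A rough estimate backs up the "way, way more" claim. This minimal sketch assumes uncompressed 24-bit colour (3 bytes per pixel). The 6 x 9 inch page size is purely illustrative and does not come from the thread. Each page comes to about 58 MB, and the 150-page book to about 8.7 GB, roughly fifteen times Avenger's whole collection.

```python
# Rough uncompressed size of a raster scan, assuming 24-bit colour
# (3 bytes per pixel) and no compression. The 6 x 9 inch page size is an
# illustrative assumption, not taken from the thread.
def scan_bytes(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int = 3) -> int:
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bytes_per_pixel

page = scan_bytes(6, 9, 600)
book = 150 * page
print(f"{page / 1e6:.0f} MB per page, {book / 1e9:.1f} GB for 150 pages")
```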


If you ain't digital, you ain't shit.

DNA is digital.
posted by DU at 7:01 PM on June 20, 2013 [7 favorites]


Shocking statistics are the crack cocaine of the web. Tolerance develops fast. "90% of the world's data has been generated in the last two years" is the first stat that's gotten me high in a while.
posted by Halogenhat at 7:04 PM on June 20, 2013 [6 favorites]


Might be true if you were to somehow be able to count the cumulative total of the uncompressed bitrate of all the digital video recorded in the last two years.
posted by thewalrus at 7:06 PM on June 20, 2013
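For scale, the uncompressed bitrate of video is just width x height x bits per pixel x frame rate. This minimal sketch assumes 1080p at 24 bits per pixel, which is illustrative only, since real cameras usually record chroma-subsampled or compressed video. That works out to about 1.49 Gbit/s at 30 fps and about 2.99 Gbit/s at 60 fps.

```python
# Uncompressed bitrate of raw video: width x height x bits per pixel x frame rate.
# 1080p at 24 bits per pixel is an illustrative assumption; real cameras often
# store chroma-subsampled or compressed video instead.
def raw_bitrate_bps(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    return width * height * bits_per_pixel * fps

for fps in (30, 60):
    gbps = raw_bitrate_bps(1920, 1080, 24, fps) / 1e9
    print(f"1080p{fps}: {gbps:.2f} Gbit/s uncompressed")
```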


We are truly in an age where quantity trumps quality.
posted by not_on_display at 7:07 PM on June 20, 2013 [4 favorites]


Samizdata: "Here's to hoping CSIRO doesn't asshole this one up like they did with WiFi."

Despite the story told in that Ars Technica hatchet job, the real arseholes in that case were Broadcom, Lenovo, Acer, Sony, AT&T, etc.
posted by Pinback at 7:12 PM on June 20, 2013 [1 favorite]


And graphene batteries will last a week and charge in a minute! Get back to me when these things ship. Otherwise they're Duke Nukem Forever.
posted by cjorgensen at 7:15 PM on June 20, 2013


DNA is digital.

as are fossilized phalanges.
posted by rube goldberg at 7:18 PM on June 20, 2013 [6 favorites]


Pinback: "Samizdata: "Here's to hoping CSIRO doesn't asshole this one up like they did with WiFi."

Despite the story told in that Ars Technica hatchet job, the real arseholes in that case were Broadcom, Lenovo, Acer, Sony, AT&T, etc.
"

I will agree to disagree. From what I saw at the time it was patent trollery on an international scale. That is the only reason I can see for the Buffalo test case. You attack the little company to see what you can get away with.
posted by Samizdata at 7:22 PM on June 20, 2013 [1 favorite]


Oh, and circle takes the square for the win.
posted by Samizdata at 7:23 PM on June 20, 2013


Some 90% of the world’s data was generated in the past two years.

Unfortunately, this figure equally weights "cat video" and "We discovered there's an entire new continent in the ocean between Europe and Asia."

Quantity ain't quality.
posted by Thorzdad at 7:23 PM on June 20, 2013 [6 favorites]


All fossils are digital.

How do you think paleontologists get all those fossils? They dig it all!

(I'm not even slightly sorry for that joke.)
posted by etc. at 7:24 PM on June 20, 2013 [16 favorites]


The laser focusing technique is nothing new. The real innovation came from the fact that they found a polymer that could actually withstand being etched by such a tiny beam.

No word on how durable the material is, which will be a big hurdle to overcome if we're going to commercialize this.
posted by schmod at 7:24 PM on June 20, 2013


(Also, FWIW, we've had the technology to read data at crazy densities for quite some time -- you could theoretically manipulate and read data at the sub-nanometer level with an Atomic Force Microscope. However, doing so would be unbelievably tedious.)
posted by schmod at 7:26 PM on June 20, 2013


Bullshit. Check out the fossil record. It's analog, yes, but it's still data.

The wild thing is, it's all information! Check out the radiation pervading the universe! That's been going on for a while.
posted by kenko at 7:27 PM on June 20, 2013 [1 favorite]


I remember reading "Microcosmos" where it is claimed that bacteria have horizontally sexed up more information through DNA transfer than all of human history combined. I suspect it may still hold true.
posted by lordaych at 7:30 PM on June 20, 2013 [1 favorite]


Bacteria can suck it.
posted by Halogenhat at 7:31 PM on June 20, 2013 [1 favorite]


We need them more than they could ever need us. Bacteria! The top and bottom of the food chain.
posted by lordaych at 7:33 PM on June 20, 2013


In before Blasdelb invokes phages. Derail over.
posted by lordaych at 7:34 PM on June 20, 2013 [3 favorites]


My phage can beat your bacteria's ass.
posted by Samizdata at 7:36 PM on June 20, 2013 [2 favorites]


SELECT DISTINCT DNA FROM FageYogurt WHERE FLAVOR="honey" AND SIZE="8" AND UNIT="o to the zip"
posted by lordaych at 7:39 PM on June 20, 2013 [1 favorite]


So, does this mean we are becoming digital hoarders? Being able to store more and more stuff on smaller and smaller media is one thing; being able to retrieve, categorize and make sense of all the digital hoard is quite another.
posted by BozoBurgerBonanza at 7:55 PM on June 20, 2013 [1 favorite]


Yeah but you can't google your basement!
posted by lordaych at 7:56 PM on June 20, 2013


Also the intelligence efforts to gather ALL THE DATAZ remind me of a friend who regularly buys external hard drives to fill with music, pron, and various videos etc., knowing he will never go back and peruse them, except it's terrifying and there's no "never."
posted by lordaych at 7:59 PM on June 20, 2013 [1 favorite]


Samizdata: "I will agree to disagree. From what I saw at the time it was patent trollery on an international scale."

Fair enough; it's tangential to this post & I'm definitely not here to convince anybody of the correctness or otherwise of my opinion.

But I'll point out that the original CSIRO patent pre-dated 802.11a & related/derived standards (being from 1993, it actually pre-dates 802.11 altogether), described a non-obvious solution to a technical problem that the industry had been trying to overcome for years, was accepted into the 802.11 patent pool under licencing conditions agreed to by both the CSIRO and the IEEE, that those conditions were agreed to by licencees as part of their licence to use 802.11a and derived standards, and that the CSIRO had been attempting to work with chipset manufacturers & other 802.11 licencees to meet those licencing conditions since their method was first adopted as part of 802.11a.

Strangely, the Ars Technica article neglects to mention most of those facts, which makes it hard for people to determine the rights and wrongs of the case for themselves…
posted by Pinback at 8:00 PM on June 20, 2013 [4 favorites]


Still, pretty cool technology, eh?
posted by Blazecock Pileon at 8:28 PM on June 20, 2013 [1 favorite]


I am disappointed. I first read the title as, "Here's how to fit 1,000 vertebrates on a DVD."

This is really interesting, though - thanks for the link!
posted by daisyk at 9:11 PM on June 20, 2013


"...we developed a new technique to enable the data capacity of a single DVD to increase from 4.7 gigabytes up to one petabyte (1,000 terabytes). This is equivalent of 10.6 years of compressed high-definition video or 50,000 full high-definition movies."

Out of 50,000 movies, only 500 are really worth watching. Using this observation, I was able to compress one petabyte's worth of data into 10 terabytes.
posted by twoleftfeet at 9:13 PM on June 20, 2013 [6 favorites]



Shocking statistics are the crack cocaine of the web. Tolerance develops fast. "90% of the world's data has been generated in the last two years" is the first stat that's gotten me high in a while.


Yeah, except 90% of that data is just people misspelling their racism on facebook and gifs of Oprah shooting bees at a teen wolf.

Still, pretty cool technology, eh?

Totally. I long for the day I have a copy of everything and an AI reference librarian/indexer on my phone.
posted by Divine_Wino at 9:14 PM on June 20, 2013


If they're taking requests, I'd like a DVD with a one-bit capacity. Because a four-and-a-half-inch dot would be pretty exciting. And if you give me enough of them I could rearrange them on my shelf to make a copy of any movie I want.
posted by storybored at 9:26 PM on June 20, 2013 [3 favorites]


Rearrangement sold separately.
posted by twoleftfeet at 9:52 PM on June 20, 2013 [1 favorite]


What we need is better compression technology. Something that will cross-reference multiple YouTube uploads of the same video (and the inevitable remixes) where you only have to store the same frame once. That should cut down the size of stored data over the past 2 years by 99%.
posted by pashdown at 10:01 PM on June 20, 2013
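pashdown's idea is essentially deduplication: store each distinct chunk once and describe every upload as a list of references to chunks. Below is a toy content-addressed store that keys chunks by their SHA-256 digest. The `DedupStore` class and its methods are hypothetical names for illustration only. One caveat: re-encoded or remixed uploads rarely match byte for byte, so exact hashing like this would catch little on a real video site, and a practical system would need something closer to perceptual matching. The sketch needs Python 3.9 or later for the built-in generic type hints.

```python
import hashlib

class DedupStore:
    """Hypothetical content-addressed store: each distinct chunk (here, a frame)
    is kept once, keyed by its SHA-256 digest, and every upload is recorded as
    a list of digests. Only byte-identical frames are deduplicated."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.uploads: dict[str, list[str]] = {}

    def put(self, name: str, frames: list[bytes]) -> None:
        digests = []
        for frame in frames:
            digest = hashlib.sha256(frame).hexdigest()
            self.chunks.setdefault(digest, frame)  # keep only the first copy
            digests.append(digest)
        self.uploads[name] = digests

    def get(self, name: str) -> list[bytes]:
        """Rebuild an upload from the shared chunk pool."""
        return [self.chunks[d] for d in self.uploads[name]]

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

# Usage: an original, an exact re-upload, and a remix sharing two frames.
store = DedupStore()
clip = [b"frame-a", b"frame-b", b"frame-c"]
store.put("original", clip)
store.put("reupload", clip)
store.put("remix", clip[:2] + [b"frame-d"])
print(store.stored_bytes())  # 28: four distinct 7-byte frames
```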


We live in a world where digital information is exploding.

You know, that's not just hyperbole. Many consumer products that use digital information are not up to International Safety Standards and can fail catastrophically under heavy digital information loads, leading to explosions.

It's another reason to keep your cellphone a few more inches away from your brain.
posted by twoleftfeet at 10:02 PM on June 20, 2013 [1 favorite]


90% of the world's data has been generated in the last two years.

Whoah.
posted by chemoboy at 10:07 PM on June 20, 2013


90% of the world's data has been generated in the last two years.

Yeah. I know that guy. His name is Al and he lives with his mom. He doesn't have anything else to do so you know...

I'll have a word with him. Sorry about the inconvenience.
posted by twoleftfeet at 10:13 PM on June 20, 2013 [2 favorites]


there's more bandwidth in a life, maybe even a subsentient one, than our entire digital record.
posted by legospaceman at 10:13 PM on June 20, 2013


there's more bandwidth in a life, maybe even a subsentient one, than our entire digital record.

Maybe when we simulate life in a computer, we'll have better compression algorithms.
posted by Blazecock Pileon at 10:40 PM on June 20, 2013


Some 90% of the world’s data was generated in the past two years.

Yeah, but 46% of that is just redundant copies of Game of Thrones.
posted by George_Spiggott at 10:57 PM on June 20, 2013 [3 favorites]


BP, it is bad-ass technology indeed and I felt bad for going off the rails, shooting from the iPhone hip as it were earlier.

I've been pretty jaded in the past about different absurdly high-density optical / crystalline / magnetic storage miracles over the years, but I think they've been the real deal for the most part, just hard to implement with practical read/write speeds or in an affordable consumer-level fashion, as others have said. I wonder how many have been put to use in less-visible sectors; a friend of mine worked at Los Alamos and spoke of pretty amazing storage technologies that have yet to be commercially available. OR SO HE SAYS!

I distinctly remember reading an article in Wired magazine* on a flight well over a decade ago that described a recent breakthrough in blue LED and subsequently blue laser technology involving gallium compounds. This would eventually lead to higher-capacity disks due to the higher frequencies involved! And it was pretty sweet to see it come to fruition many years later with Blu-Ray. My first CD-Burner was a 4X SCSI beast and hell it felt powerful to burn 650MB and eventually 700MB at a time onto a CD-R. Up until this point I knew one person with a CD-Burner, a 1X he'd gotten years before, and if he burned you a disc containing whatever, it meant you were in his circle of trust, as wear and tear are nothing to sneeze at and CD-Rs weren't cheap. I've probably already covered this ground before.

I wonder how quickly they'll be able to spin one of these petabyte-sized mofos and need to read and grok more, but even if it was extremely slow, I could see there being a market in all "spaces," in the cloud / datacenter as a new gigantic static-backup and archival media akin to tape and Blu-Ray today. My environment at work is all spinning-disk backup (not my doing, but let me have it anyway so I can pass it on) so I'm not certain what the kids are using these days for huge static backups, though IBM says USE TAPE TOO BRO. In the consumer space, these would be great for data hoarding, backing up bulk data without relying on the cloud, and even with slow speeds and caching technologies you could probably do some sort of slow-trickle-time-machine streaming backups to them and perhaps have an absurdly long continuous snapshot of VM or physical computer activity if you want.

* It wasn't this 1997 article, which contains the less-than-optimistic and then, hey, optimistic! passage:

All this interest won't necessarily translate into commercial products, however. Military applications, or those with niche markets too tiny to mine, might never make it out of the lab. And even Nichia has yet to learn how to manufacture salable lasers in large volumes at reasonable cost. Researchers expect that these materials will keep them busy for years to come. "We haven't seen the full potential of these materials," says Moustakas. "The potential is much more than we can envision."

** I actually think it was this even older article...1995!
posted by lordaych at 12:05 AM on June 21, 2013 [1 favorite]


The whole universe is information. We just need to figure out the codec.
posted by stavrosthewonderchicken at 12:44 AM on June 21, 2013 [1 favorite]


This technique is actually wavelength-independent, but it sharpens the results at whatever base wavelength you use. I'd expect it to be called something "overlap" related.

Ray-Ray™

The whole universe is information. We just need to figure out the codec.

$e^{i\pi} + 1 = 0$
posted by lordaych at 1:14 AM on June 21, 2013 [2 favorites]
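The "codec" answer is Euler's identity, $e^{i\pi} + 1 = 0$. A two-line numeric check with Python's standard `cmath` module confirms it up to floating-point rounding. The residue lands in the imaginary part and is on the order of 1e-16.

```python
import cmath

# Euler's identity e^(i*pi) + 1 = 0, checked numerically. Floating point
# leaves a tiny residue (order 1e-16) in the imaginary part.
residue = cmath.exp(1j * cmath.pi) + 1
print(residue)
print(abs(residue) < 1e-15)
```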


there's more bandwidth in a life, maybe even a subsentient one, than our entire digital record

Aside from the fact that you're comparing two different things (data rate vs. total data), that's… actually not true, AIUI. I don't have the numbers to hand, but you can estimate the data rate of the human sensorium, and it's large but not very large by current standards; a total lifetime's worth of sensory input would actually be easy to store. I find this faintly terrifying.
posted by hattifattener at 1:29 AM on June 21, 2013


The bandwidth of the human optic nerve bundle is of the order of ten megabits per second, and that's the highest bandwidth sensory bus in the body by some significant degree. Yes, there are two of them but they deliver substantially identical signals.

So no, no matter how you cut it, the bandwidth of a life is not only a tiny fraction of the total data there is, it's going to be comparable to the bandwidth in a television.

The world in your head is substantially synthetic.
posted by Devonian at 2:02 AM on June 21, 2013 [4 favorites]
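Taking the ~10 Mbit/s optic-nerve figure at face value, here's what a lifetime of it adds up to. This minimal sketch assumes an 80-year lifespan and decimal units, and both are illustrative choices. Run continuously, one optic nerve delivers about 3.2 PB. At 16 waking hours a day, the total drops to about 2.1 PB. Either way, that is only a few of the hypothetical 1 PB discs, which fits hattifattener's point that a lifetime of sensory input would be easy to store.

```python
# Total data through one optic nerve over a lifetime, using the ~10 Mbit/s
# figure from the comment above. The 80-year lifespan and the hours-per-day
# duty cycle are illustrative assumptions.
OPTIC_NERVE_BPS = 10e6
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def lifetime_bytes(bits_per_second: float, years: float, hours_per_day: float = 24) -> float:
    seconds = years * SECONDS_PER_YEAR * hours_per_day / 24
    return bits_per_second * seconds / 8

print(f"{lifetime_bytes(OPTIC_NERVE_BPS, 80) / 1e15:.1f} PB (24 h/day)")
print(f"{lifetime_bytes(OPTIC_NERVE_BPS, 80, 16) / 1e15:.1f} PB (16 waking h/day)")
```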


You wouldn't download a universe, would you?
posted by blue_beetle at 5:05 AM on June 21, 2013 [4 favorites]


localroger: "What is really fun about this, if it isn't bullshit, isn't what you could do with a DVD-sized disc (which nobody would need for practical purposes unless holographic 3D TV is invented). "

Speaking of holograms...

What the fuck ever happened to that cool "holographic storage" that we were supposed to have, you know the shit I was reading about in the late 90s in Wired? Seems like that and Hydrogen vehicles have never hit it big time. :(
posted by symbioid at 6:47 AM on June 21, 2013 [1 favorite]


The Complete, Updated, and *definitive* Star Wars Saga, including all fan generated content ever created, now on Ray-Ray™.


I like the ring of that.
posted by George Lucas at 7:25 AM on June 21, 2013 [2 favorites]


The world in your head is substantially synthetic.

And yet--it's the only world any of us really know or can know. Even these various stories about the profusion of data come to us only via the (relatively) small synthetic worlds in our heads.

Analog data is still where it's at. Digital information can only ever approach near-absolute fidelity to the analog information it models... Sure, the loss of information is so negligible now that you'd never know the original analog audio or video contained additional info that's missing from the digital version, but there's still loss of information. Apropos of nothing. Just musing here.

Neat to think we might have such powerful storage media available soon. I could put all the simulated things on one disc!
posted by saulgoodman at 7:34 AM on June 21, 2013


Yeah but you can't google your basement!

Yet.
posted by Aizkolari at 7:43 AM on June 21, 2013


The bandwidth of the human optic nerve bundle is of the order of ten megabits per second, and that's the highest bandwidth sensory bus in the body by some significant degree. Yes, there are two of them but they deliver substantially identical signals.

Yeah, the bandwidth is not remotely sufficient to convey the field that the eye actually detects in full resolution; instead there's some control traffic going the other way as the brain dynamically directs the nerve to prioritize some information over others. Most of what you think you see is your brain filling in what it thinks it already knows around the stuff you're actually paying attention to. Video compression algorithms attempt to do very roughly the same thing in principle; one difference is that in the latter the complete full-resolution frame must be displayed at all times, even if some parts of it are older than others, because the TV doesn't know what part of the screen you're looking at. By contrast your brain doesn't need to 'render' anything that you're not paying attention to.
posted by George_Spiggott at 8:08 AM on June 21, 2013 [1 favorite]
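The "some parts of the frame are older than others" behaviour is easiest to see in the simplest scheme of this kind, conditional replenishment. The frame is split into blocks, only blocks that changed enough are resent, and the receiver patches them into its last full frame. The toy sketch below assumes grayscale frames stored as lists of int rows, and its block size and threshold are arbitrary illustrative parameters. Real codecs layer motion compensation and transform coding on top of this idea, so treat it as a cartoon of the principle. It needs Python 3.9 or later for the `list[...]` type hints.

```python
# Toy conditional replenishment: resend only blocks that changed; the receiver
# keeps a full frame in which unchanged blocks are simply older.
Frame = list[list[int]]  # grayscale pixels, rows of ints

def changed_blocks(prev: Frame, cur: Frame, block: int, threshold: int) -> dict:
    """Map (row, col) block origin -> block pixels, for blocks whose total
    absolute difference from the previous frame exceeds `threshold`."""
    updates = {}
    for r in range(0, len(cur), block):
        for c in range(0, len(cur[0]), block):
            diff = sum(
                abs(cur[y][x] - prev[y][x])
                for y in range(r, min(r + block, len(cur)))
                for x in range(c, min(c + block, len(cur[0])))
            )
            if diff > threshold:
                updates[(r, c)] = [row[c:c + block] for row in cur[r:r + block]]
    return updates

def apply_updates(frame: Frame, updates: dict) -> Frame:
    """Receiver side: patch changed blocks into a copy of the last full frame."""
    out = [row[:] for row in frame]
    for (r, c), pixels in updates.items():
        for dy, row in enumerate(pixels):
            out[r + dy][c:c + len(row)] = row
    return out

# Usage: a 4x4 frame where only the top-left 2x2 block changes.
prev = [[0] * 4 for _ in range(4)]
cur = [row[:] for row in prev]
cur[0][0] = 255
updates = changed_blocks(prev, cur, block=2, threshold=0)
print(list(updates))                        # [(0, 0)]
print(apply_updates(prev, updates) == cur)  # True
```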


So is anybody even working on how much can be stored in a mind?
posted by jfuller at 10:55 AM on June 21, 2013


"In before Blasdelb invokes phages. Derail over."

LYSIS FOR THE PHAGE GOD

AND HALF THE EARTH'S BACTERIA DAILY FOR HIS THRONE
posted by Blasdelb at 11:21 AM on June 21, 2013 [1 favorite]


All of these documents, almost all in PDF format, take up 572.4 megabytes on my laptop HD.

I have what, by current library imaging-studio standards, are pretty average-resolution images (24-bit TIFFs at 600 dpi) of printed documents of a similar period to these, each approximately the size of a double-page spread of a book of the period. They weigh in at between 200 and 300 MB each. These images are used for, among other things, content-based image recognition and analogue (via my eyeballs) comparison of typographical and other features. I could definitely use higher-quality images for either task, not just in the visible light band, but multi-spectral, too, and raking light, which is good for detecting pen-corrections, bearing type and other esoteric features of early-ish printing.

Then there are those features that current imaging technologies can rarely capture, such as watermarks, paper quality and other features that are important for forensic purposes (are you sure all your documents are authentic?). These features count as data for certain purposes - just as a legible text sounds as if it counts as all the data you need for your purposes – and fair enough. But for these more specialised purposes... well, at a certain level of granularity there isn't enough data-storage capacity in the world to store a complete digital copy of a single born-analogue document, and there won't be until we have matter-replicators figured out.
posted by GeorgeBickham at 12:23 PM on June 21, 2013


90% of data is noise anyhow!
posted by elpapacito at 2:51 PM on June 21, 2013 [1 favorite]


The bandwidth of the human optic nerve bundle is of the order of ten megabits per second, and that's the highest bandwidth sensory bus in the body by some significant degree.

Counting sensory bandwidth at the optic nerve is pretty arbitrary, like estimating the bandwidth of a camera by looking at the speed of writing JPEGs to the memory card.

One could back up and look at the first synapse in the retina, which joins the cone photoreceptors with their downstream cells. There are roughly 5 million cones in one human retina releasing quanta of neurotransmitter at ~300 vesicles/sec. If each quantum represented a bit, that would equate to 1.5 Gbps. And that's just one sensory modality.
posted by euphorb at 8:11 PM on June 21, 2013
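euphorb's arithmetic checks out. Five million cones at ~300 vesicles per second gives 1.5 Gbit/s, if you treat each vesicle as one bit as the comment itself does. That is about 150 times the ~10 Mbit/s optic-nerve figure quoted earlier. The gap only shows how much the answer depends on where along the pathway you choose to measure.

```python
# Back-of-envelope from the comment: cones x vesicle release rate, treating
# each vesicle (quantum) as one bit, which is the commenter's own simplification.
CONES_PER_RETINA = 5e6
VESICLES_PER_SECOND = 300
OPTIC_NERVE_BPS = 10e6  # figure quoted earlier in the thread

retina_bps = CONES_PER_RETINA * VESICLES_PER_SECOND
print(f"{retina_bps / 1e9:.1f} Gbit/s at the cone synapse")
print(f"~{retina_bps / OPTIC_NERVE_BPS:.0f}x the optic-nerve estimate")
```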


Information processing among animals aligns with other factors, all modeled by a power law.

Crudely put, the bigger you are, the slower you move. Small animals live quicker, but live shorter lives.

If you focus in on the optic nerve, and how quickly signals reach a processing cortex, you miss the fact that these signals are at best ready-for-action, which in smaller animals is a smaller threshold.

You've got humans doing I/O at a rate of 10 gig per day right now. The next closest species is below one.
posted by twoleftfeet at 9:00 PM on June 24, 2013





