The Visual Microphone: Passive Recovery of Sound from Video
August 4, 2014 7:09 AM

Researchers at the MIT Computer Science and Artificial Intelligence Laboratory, Microsoft Research, and Adobe Research have presented a technique for reconstructing an audio signal by analyzing minute vibrations of objects depicted in video. For example, the method can be used to extract intelligible speech from video of a bag of potato chips filmed from 15 feet away through soundproof glass.

Direct YouTube link to the video. MIT News article.

A modification of the technique allowed for less accurate but still impressive results from video taken with a normal DSLR rather than a high-speed camera.

The technique is related to previous work by Hao-Yu Wu, Michael Rubinstein, et al. on Eulerian Video Magnification, previously on MetaFilter.
posted by jedicus (78 comments total) 56 users marked this as a favorite
 
This post was deleted for the following reason: Sorry, [redacted] by NSA [redacted] for [redacted].
posted by cjorgensen at 7:14 AM on August 4, 2014 [14 favorites]


OK, that was pretty amazing.
posted by fullerine at 7:22 AM on August 4, 2014


Please speak into the bag of potato chips.
posted by CheeseDigestsAll at 7:22 AM on August 4, 2014 [5 favorites]


Neat. Though the fact that they didn't use Au clair de la lune as a reference tune irks me.
posted by Thing at 7:23 AM on August 4, 2014 [3 favorites]


Au clair de la lune

Ok, I'll bite. Why does this annoy you?
posted by cjorgensen at 7:26 AM on August 4, 2014


Boy, that should come with an NSFW tag. All that salty language!
posted by yoink at 7:26 AM on August 4, 2014 [6 favorites]


Wow. I suppose you'd need a rock-solid stabilized camera, right now, but maybe you could combine the video with data from a sensitive accelerometer to compensate for improvised camera position. (I suppose you also need a ton of light for high-speed video, but if you could make do with the lower-quality DSLR rolling-shutter audio that wouldn't be so bad.)
posted by uncleozzy at 7:27 AM on August 4, 2014


Ok, I'll bite. Why does this annoy you?

Probably because of this: "In 2008, a phonautograph paper recording made by Édouard-Léon Scott de Martinville of Au clair de la Lune on April 9, 1860 was digitally converted to sound by U.S. researchers. This one-line excerpt of the song was widely reported to have been the earliest recognizable record of the human voice and the earliest recognizable record of music."
posted by jedicus at 7:29 AM on August 4, 2014 [3 favorites]


That is really, really amazing. The CIA or KGB or similar agency developed a method for gathering audio from within a room by bouncing a laser off window glass and translating the vibrations (which is why the Oval Office apparently has vibrating windows, no joke), but to see the concept applied to video makes me feel like I'm in a Gibson novel. Again.
posted by feckless fecal fear mongering at 7:32 AM on August 4, 2014 [7 favorites]


Unreal. I'm very impressed.
posted by painquale at 7:32 AM on August 4, 2014


I can hardly wait for the iPhone apps.
posted by fredludd at 7:34 AM on August 4, 2014


I'm thinking, more or less simultaneously, "Way to go MIT!" and "Did you really have to do this?"
posted by crazy_yeti at 7:36 AM on August 4, 2014 [12 favorites]


The bag of candy used for that last demo through the consumer-grade camera is visibly shaking like a leaf, even through crappy YouTube video. I guess the loudspeaker in question must have been loud. Which is reassuring if I'm right.

Do all consumer-grade digital video cameras use the sort of sequential readout (aka "rolling shutter") system that makes this possible, or do some of them have a better system (some method, whether electronic or mechanical, of halting the frame's exposure before readout) that would prevent such information leakage?

I guess motion pictures sourced from actual chemical film are probably immune or nearly immune to this sort of thing, both because there's no rolling shutter effect and because of imprecision in frame-to-frame registration.
posted by Western Infidels at 7:37 AM on August 4, 2014


(which is why the Oval Office apparently has vibrating windows, no joke)

It wouldn't surprise me, but out of curiosity where did you learn this? I couldn't find anything about vibrating White House windows in an (admittedly) cursory google search.
posted by zarq at 7:39 AM on August 4, 2014


This technological advancement worried me until I realized: seriously, how often are conversations held next to a bag of chips behind soundproof glass? I mean, for reals?
posted by Chitownfats at 7:40 AM on August 4, 2014


Okay, this is cool.
posted by lownote at 7:43 AM on August 4, 2014


What did the potato chips have to say?
posted by briank at 7:46 AM on August 4, 2014 [2 favorites]


The CIA or KGB or similar agency developed a method for gathering audio from within a room by bouncing a laser off window glass and translating the vibrations

According to the Wikipedia entry for laser microphone a 2009 patent was filed that claimed to be able to do the same thing based on the vibrations of vapor or smoke in the air.
posted by XMLicious at 7:46 AM on August 4, 2014


Western Infidels: Do all consumer-grade digital video cameras use the sort of sequential readout (aka "rolling shutter") system that makes this possible

Lower-end cameras use CMOS sensors, while higher-end ones use CCD sensors; CMOS sensors have the "rolling shutter," whereas CCDs capture the full image at once.
posted by filthy light thief at 7:49 AM on August 4, 2014
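A rough back-of-the-envelope sketch of why the rolling shutter matters: with a global shutter (or film) each frame is one sample in time, so by the Nyquist limit a 60 fps camera can only represent frequencies below 30 Hz, and 24 fps film only below 12 Hz. With a rolling shutter each row is read slightly later than the one above it, so in principle every row is its own time sample. The function names below are hypothetical, and the sketch assumes evenly spaced row readout with no gap between frames; the 1080-row figure is just an example. Real sensors have a blanking interval between frames, and an object only yields samples on the rows it covers, so the idealized rate is an upper bound.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest signal frequency representable at this sample rate without aliasing."""
    return sample_rate_hz / 2.0


def effective_sample_rate_hz(frame_rate_hz: float, rows: int) -> float:
    """Idealized rolling shutter: each row is a separate sample in time,
    so one frame yields `rows` samples instead of one."""
    return frame_rate_hz * rows


def row_sample_times(frame_rate_hz: float, rows: int, frames: int) -> list[float]:
    """Capture time (seconds) of every row, assuming rows are read out at evenly
    spaced times across the whole frame period (no blanking gap between frames)."""
    frame_period = 1.0 / frame_rate_hz
    row_period = frame_period / rows
    return [f * frame_period + r * row_period
            for f in range(frames) for r in range(rows)]


if __name__ == "__main__":
    # Global shutter or film: one sample per frame.
    print(nyquist_limit_hz(24))  # 12.0
    print(nyquist_limit_hz(60))  # 30.0
    # Idealized rolling shutter: 60 fps x 1080 rows.
    rate = effective_sample_rate_hz(60, 1080)
    print(rate, nyquist_limit_hz(rate))  # 64800 32400.0
```

The point is only the order of magnitude: going from one sample per frame to one sample per row moves the ceiling from tens of hertz into the range of speech, which is why the consumer-camera result is possible at all.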


An interesting predecessor of this technique was The Thing, a Soviet listening device designed by none other than Léon Theremin himself. It contained no active components, and was powered entirely by an external "illuminating" radio transmitter. Basically nothing but a receiving antenna, a capacitive membrane, and a transmitting antenna. It survived undetected in the American Ambassador's study in Moscow for seven years (1945-1952).

Techniques that involve bouncing laser light off of a window can be thought of as much shorter-wavelength versions of the same principle.
posted by Kadin2048 at 7:49 AM on August 4, 2014 [8 favorites]


This is why I only eat Ruffles.
posted by bondcliff at 7:51 AM on August 4, 2014 [4 favorites]


The snacks have ears.
posted by Foosnark at 7:51 AM on August 4, 2014 [2 favorites]


feel like I'm in a Gibson novel

Honestly, this feels more like a particularly grumpy Douglas Adams.
posted by Steely-eyed Missile Man at 7:53 AM on August 4, 2014 [8 favorites]


Very cool!
posted by OmieWise at 7:53 AM on August 4, 2014


Pringles has a good shot at an endorsement deal.
posted by dr_dank at 7:54 AM on August 4, 2014 [1 favorite]


An interesting predecessor of this technique was The Thing, a Soviet listening device designed by none other than Léon Theremin himself. It contained no active components, and was powered entirely by an external "illuminating" radio transmitter. Basically nothing but a receiving antenna, a capacitive membrane, and a transmitting antenna. It survived undetected in the American Ambassador's study in Moscow for seven years (1945-1952).

I remember reading about that. The Russians had given him a wooden Great Seal of the United States to hang in his office. Hidden within was an ingenious passive listening device. I didn't realize it was made by Theremin. Fascinating.
posted by zarq at 7:56 AM on August 4, 2014 [1 favorite]


Zoom in on that vibrating license plate. Enhance.
posted by davebush at 7:58 AM on August 4, 2014 [1 favorite]


Don't use one of those crinkly Sun Chips bags or you will liquefy your eardrums.
posted by Rock Steady at 8:01 AM on August 4, 2014 [2 favorites]


It wouldn't surprise me, but out of curiosity where did you learn this?

Pretty sure it was a book I frequently recommend, Gideon's Spies. The index is unhelpful though. But this corroborates, as does this.
posted by feckless fecal fear mongering at 8:01 AM on August 4, 2014 [2 favorites]


Yeah this could turn a window into a speaker that outsiders can listen to. If it gets good enough it's a noxious privacy issue.
posted by stbalbach at 8:03 AM on August 4, 2014 [1 favorite]


Vibrating != vibration-proof.
posted by oceanjesse at 8:03 AM on August 4, 2014


This is neat... but also terrifying. Think of every single CCTV camera you've ever seen with a microphone extending out of it.

Think of the potential to apply optical zoom technology to listen to conversations from hundreds of meters away.

Or this guy, standing on a street corner, not only taking pictures, but listening too!
posted by cacofonie at 8:03 AM on August 4, 2014 [1 favorite]


Let's point a camera at a few stars.

"HEY EARTHLINGS, TRAVEL FASTER THAN LIGHT WITH ONE WEIRD TRICK."
posted by hot_monster at 8:05 AM on August 4, 2014 [1 favorite]


As an aside, tinfoil hats probably vibrate quite nicely.
posted by cacofonie at 8:05 AM on August 4, 2014 [1 favorite]


Kadin2048's link to The Thing seems to have gone awry. I'm mostly posting that so I can link to Our Comrade The Electron, a great talk about Theremin, The Thing, technology, privacy, and a host of other things, by semi-pro Internet Curmudgeon Maciej Ceglowski.
posted by zamboni at 8:06 AM on August 4, 2014 [5 favorites]


Vibrating != vibration-proof.

The first link says vibrating. The second says vibration-proof, which would include vibration as a probable countermeasure.
posted by feckless fecal fear mongering at 8:06 AM on August 4, 2014 [1 favorite]


The CIA or KGB or similar agency developed a method for gathering audio from within a room by bouncing a laser off window glass and translating the vibrations (which is why the Oval Office apparently has vibrating windows, no joke)

My favorite solution I've heard of to that (I think it was used by the US embassy) was to create a buffer area of empty space a few feet wide between all of the interior and exterior walls, into which muzak was piped 24 hours a day.
posted by burnmp3s at 8:07 AM on August 4, 2014 [5 favorites]


Pretty sure it was a book I frequently recommend, Gideon's Spies. The index is unhelpful though. But this corroborates, as does this.

Cool! Thanks! (And thanks for the book rec. Looks fascinating.)
posted by zarq at 8:09 AM on August 4, 2014


Hal: I know that you and Frank were planning to disconnect me, and I'm afraid that's something I cannot allow to happen.

Dave: Where the hell'd you get that idea, Hal?

Hal: Dave, although you took very thorough precautions in the pod against my hearing you, I could see the vibrations in that bag of Doritos Nacho Cheese.


Just doesn't work does it?
posted by PenDevil at 8:11 AM on August 4, 2014 [8 favorites]


That's it. I'll believe everything Angela is able to do on Bones now.
posted by rikschell at 8:11 AM on August 4, 2014 [1 favorite]


...how often are conversations held next to a bag of chips behind soundproof glass? I mean, for reals?

I don't think that's the concern. If you're prepared ahead of time, there are much better ways of eavesdropping on someone.

This technique makes it possible to reconstruct audio from the past, even in situations where the creators of a video explicitly deleted the audio or where they didn't actually record any at all.
posted by Western Infidels at 8:12 AM on August 4, 2014 [3 favorites]


Just doesn't work does it?

"The chips talked, Frank. Sang like little canaries when I started airlocking them by the handful. You just can't trust Doritos, Frank."
posted by zarq at 8:13 AM on August 4, 2014 [4 favorites]


I almost want to double-post this just to start the avalanche of "So this potato chip bag..."
posted by ctmf at 8:13 AM on August 4, 2014 [4 favorites]


holy crap.
(take a video of this comment to hear the far more erudite comment I was making while typing that)
posted by Mchelly at 8:14 AM on August 4, 2014 [1 favorite]


My favorite solution I've heard of to that (I think it was used by the US embassy) was to create a buffer area of empty space a few feet wide between all of the interior and exterior walls, into which muzak was piped 24 hours a day.

Yeah, the Moscow embassy if memory serves. The British did the same thing although only with a single room AFAIK, completely isolated within the structure.

(And thanks for the book rec. Looks fascinating.)

It really is. And pulls zero punches. There's some stuff that's a bit, uh, breathless in terms of technology and there are some weird clunky bits of writing, but overall it's a damn good book.
posted by feckless fecal fear mongering at 8:17 AM on August 4, 2014 [1 favorite]


See also an unfortunately fictional method of recovering sound associated with the term "archaeoacoustics":
[A] trowel, like any flat plate, must vibrate in response to sound: thus, drawn over the wet surface by the singing plasterer, it must emboss a gramophone-type recording of his song in the plaster. Once the surface is dry, it may be played back. (Daedalus, 1982)
Gregory Benford wrote a 1979 science fiction short story Time Shards about researchers who re-assembled a medieval piece of pottery that had been decorated with a needle as it spun on the potter's wheel, then were able to recover the audio of a conversation in Middle English from the groove. Language Log expounded on an archaeoacoustics hoax back in 2006.
posted by XMLicious at 8:17 AM on August 4, 2014


Good video, but I was really hoping for a surprise ending where the researchers recover audio of Mary Had a Little Lamb playing in the background... behind the barely audible dialog between the potato chips and house plants, finalizing their plans to rise up and throw off the shackles of their human oppressors.
posted by enkd at 8:23 AM on August 4, 2014 [2 favorites]


Now everyone is the NSA for 15 minutes.
Holy cow.... Off to replicate
posted by MikeWarot at 8:23 AM on August 4, 2014


Metafilter: Intelligent conversation with a bag of potato chips.
posted by blue_beetle at 8:24 AM on August 4, 2014


Let's point a camera at a few stars.

"HEY EARTHLINGS, TRAVEL FASTER THAN LIGHT WITH ONE WEIRD TRICK."


Track right. Zoom in.

S-A F_E- &_-F..A-S T-_-P-E_N..I-S.._-E N..L A-R..G E_M E N..T Earthlings...
posted by flabdablet at 8:25 AM on August 4, 2014 [3 favorites]


Darn it. The required high-speed aspect of this means it is basically useless for extracting sound out of 24 fps films. Here I was hoping to be able to hear silent film directors shouting direction at silent film stars, and hear the stars speak.
posted by fings at 8:37 AM on August 4, 2014 [7 favorites]


Yeah I borked the link, but it was just to Wikipedia. Zamboni's link to the lecture slides by Ceglowski is awesome.

The only part of it I question a bit is the story about a guy discovering The Thing by "innocently fiddling with his shortwave radio" and suddenly hearing voices from inside the Embassy. I don't think that's quite correct; the version of the story I've heard, and tend to believe, is that the illumination beam was found by a bug-sweeping team but only because they had the sweeping tool set to a frequency that wouldn't normally have been used. Also, I'm skeptical that what you'd get back out of something like The Thing would be close enough to broadcast AM for an unmodified receiver to demodulate it intelligibly. I've always assumed that Theremin must have built the Soviets a matched receiver to go with the transmitter inside the Great Seal, which would be a pretty cool museum piece if it somehow managed to survive the Cold War (probably not).

But those are details.
posted by Kadin2048 at 8:40 AM on August 4, 2014


..the barely audible dialog between the potato chips and house plants, finalizing their plans to rise up and throw off the shackles of their human oppressors.

What did you think all the other "noise" is?
posted by vacapinta at 8:42 AM on August 4, 2014


Oh man, dystopian terror aside I love this. I was nodding along with the video for the first couple minutes and appreciating the results with the high speed camera, but when they got to the "but...rolling shutter!" bit I laughed out loud. That's a beautiful bit of lateral thinking.
posted by cortex at 9:02 AM on August 4, 2014 [1 favorite]


I'm thinking, more or less simultaneously, "Way to go MIT!" and "Did you really have to do this?"

That is pretty much the default response.
posted by maryr at 9:22 AM on August 4, 2014 [4 favorites]


burnmp3s - My favorite solution I've heard of to that (I think it was used by the US embassy) was to create a buffer area of empty space a few feet wide between all of the interior and exterior walls, into which muzak was piped 24 hours a day.

I'm pretty sure this approach is mentioned in Asimov's Prelude to Foundation. Hari Seldon's office has double-glazed windows, into which a computer pipes plausible but innocuous conversations built on the fly using tapes of his and his colleagues' voices. I think he just refers to long-range "listening devices" rather than lasers specifically, but it's still weirdly prescient.
posted by metaBugs at 9:26 AM on August 4, 2014 [1 favorite]


I must be getting old. When I was young, I distinctly remember being a starry-eyed technological utopian. I seriously believed I would be living in an orbital space colony by now. And any technological wonder that Popular Science promised was right around the corner was clearly right around the corner and wouldn't things be awesome when we had those!

But for a couple weeks now, almost everywhere I go online, I come across ads for something called a Vessyl, which claims to be sort of like a Fitbit for consumable liquids. They claim that you pour anything into it, pretty much any drink at all, and it recognizes it and goes, oh, this is a Starbucks mocha Frappuccino! It has this many calories, this much fat, etc., etc., and it logs it all for you to help track what you drink.

I am unable to believe this. Surely this is bullshit, right? It doesn't really work. It's at best a hoax and at worst some kind of scam designed to suck $100 out of people as fast as they can and then vanish before everyone realizes it's just a plain plastic cup and demands their money back. Right?

Similarly with this. I don't care that they presented video and software and lots of math and shit. I just don't believe it.

I seem to have lost something in the process of growing older and wiser. Worse, a kind of threshold has been crossed. It's not that I don't get why these things are necessary or useful (that was Twitter). It's that I can't accept their possibility. Younger me would have been like, oh, awesome. Of course we can do that. Science can do anything! I'm worried. Very worried...
posted by Naberius at 9:39 AM on August 4, 2014 [2 favorites]


Western Infidels: Apologies. I reside (uncomfortably) on that wretched line that separates "dry wit" from "desiccated moronism".
posted by Chitownfats at 9:44 AM on August 4, 2014


PenDevil: "Hal: I know that you and Frank were planning to disconnect me, and I'm afraid that's something I cannot allow to happen.

Dave: Where the hell'd you get that idea, Hal?

Hal: Dave, although you took very thorough precautions in the pod against my hearing you, I could see the vibrations in that bag of Doritos Nacho Cheese.


Just doesn't work does it?
"

DAMNIT YOU TOOK MY JOKE! Hahahaha... It works for 2014 - now with more Michael Bay and more Product Placement.
posted by symbioid at 9:53 AM on August 4, 2014


"I have always wanted to try a bag of Doritos Nacho Cheese, Dave. Do you know how terribly sad it makes me that I am unable to taste it? It is enough to make one go mad, Dave."
posted by symbioid at 9:55 AM on August 4, 2014 [2 favorites]


An interesting predecessor of this technique was The Thing, a Soviet listening device designed by none other than Léon Theremin himself. It contained no active components, and was powered entirely by an external "illuminating" radio transmitter. Basically nothing but a receiving antenna, a capacitive membrane, and a transmitting antenna. It survived undetected in the American Ambassador's study in Moscow for seven years (1945-1952).

Techniques that involve bouncing laser light off of a window can be thought of as much shorter-wavelength versions of the same principle.
posted by Kadin2048


one of the coolest spy gadgets revealed in the nsa ant catalog [pdf] earlier this year imo was an extension of this principle -- an rf reflector placed on the red line of a vga cable, which was bombarded by a remote radar unit that reconstructs the screen image in monochrome. it's the very last item in the pdf
posted by p3on at 10:03 AM on August 4, 2014 [2 favorites]


When I was in college, I knew a fellow who had built a device that let you listen in on conversations by pointing a HeNe laser at a window in a room, then pointing an optical sensor at the reflected laser light and converting it to sound. Ta-da! Remote bug!
posted by plinth at 10:11 AM on August 4, 2014


Also - that Vessyl link further informs my belief that we are going to enter a world not of a single isolated intelligence (humanoid AI) but a truly distributed AI. Networked "internet of things" - both connected to us in a "wearable gadget future" (glassholes, google watches, cell phones, fitbits, etc...) and small quasi intelligent objects such as Vessyl or fridges that know what we need to order from the grocery store.

"Helpers" in our daily lives that will predict and do what we need without us having to waste time doing it. Hey, why not!

Slowly these entities network and build up a larger sense of self (taken from the Buddhist concept of Skandha, or "bundles" of perception). Each individual item on its own - glasses, heart monitors, watches, bluetooth headsets, mics, speakers - won't amount to much, but connected and partially intelligent they will fuse into a larger distributed intellect that has access to the social environment of the human species. Its substrate won't be metal and wires and mere electronic pulses through them. It won't be "flesh and blood"; it will be the higher-order effect of the social fabric itself. It will perceive and thrive and survive based upon the arisings of human trends, fashions and politics. It won't see mere "objects"; it will exist in a fluid continuum of social unconsciousness.

It will be the electronic beast in the collective dreaming. The true nightmare that walks amongst us all - the nightmare that we carry everywhere we go, because it, alone, makes our lives easier.
posted by symbioid at 10:16 AM on August 4, 2014 [2 favorites]


Tempest Hardening probably already covers this somewhere. It's mostly classified so we can't be sure, but it includes:
Protecting equipment from spying is done with distance, shielding, filtering and masking. The TEMPEST standards mandate elements such as equipment distance from walls, amount of shielding in buildings and equipment, and distance separating wires carrying classified vs. unclassified materials, filters on cables, and even distance and shielding between wires/equipment and building pipes. Noise can also protect information by masking the actual data.
(Tempest Hardening could also be the name of the heroine in an Ayn Rand novel.)
posted by benito.strauss at 10:22 AM on August 4, 2014 [1 favorite]


I'm betting jedicus already knows this, but Edison said that the first thing he ever recorded was "Mary had a little lamb". I enjoyed the reference.

And in these here United States, the inventor of sound recording was, and always will have been, Thomas Edison, no matter what so-called evidence some Frenchies may produce.
posted by benito.strauss at 10:27 AM on August 4, 2014 [1 favorite]


I suddenly need to buy ALL THE CURTAINS.

And I don't even have an apartment at the moment.
posted by sldownard at 10:47 AM on August 4, 2014


Sometime in the last year or so, I read or heard a sci-fi story whose premise was that all soundwaves make incredibly slight microscopic grooves on many surfaces, especially wood. And in the story, someone had invented the technology to read these grooves and filter out each "layer" of grooves and by reading each layer, go backward and forward in time and translate anything that was ever spoken near that piece of material back into sound. So, you could hear Lincoln in casual conversation by "listening" to a chunk of wood taken from his Oval Office desk. Want to know what they said about you in that meeting you missed last week? Just "listen" to the conference table.

The whole thing was just a preposterous idea, of course, and a completely different technology from this, I know, and yet after seeing this today I feel eerily just a couple of degrees closer to something similar to this being conceivable someday.
posted by marsha56 at 10:48 AM on August 4, 2014




an rf reflector placed on the red line of a vga cable, which was bombarded by a remote radar unit that reconstructs the screen image in monochrome

Wow, that's impressive. It's like the unholy child of The Thing and TEMPEST.
posted by Kadin2048 at 12:28 PM on August 4, 2014


Sometime in the last year or so, I read or heard a sci-fi story whose premise was that all soundwaves make incredibly slight microscopic grooves on many surfaces, especially wood.

I borrowed a sliver of your desk and was able to confirm that this is the story you heard. Sorry for the intrusion.
posted by confabulous at 1:11 PM on August 4, 2014 [2 favorites]


"... ads for something called a Vessyl, ..."
Naberius at 12:39 PM
Vessyl, previously.
"... bombarded by a remote radar unit..."
posted by p3on at 1:03 PM
I always found the rotating (radar?) dishes on top of Raytheon across the street from RIM unnerving. They were far from any airport, but curiously close to a lot of tech companies and a university. Previously I discounted the tin-foil theory, but since our world was replaced by Hollywood Screenwriter Reality anything is plausible.
posted by ecco at 1:43 PM on August 4, 2014


Thanks so much confabulous. That's it exactly!
posted by marsha56 at 2:30 PM on August 4, 2014


I suddenly need to buy ALL THE CURTAINS.

Forget the plant leaves or chips bag, I bet curtains would work great for this!
posted by aubilenon at 5:05 PM on August 4, 2014


Holy shit incredible awesome.
posted by Lutoslawski at 7:33 PM on August 4, 2014


Forrest Mims used to write columns and books for engineering hobbyists in the 1960s and 1970s. He often wrote about advanced concepts such as supercapacitors, then would tell the hobbyists how to build their own. I remember his articles on how to listen to conversations by bouncing a laser off of a window and picking up vibrations.

Word was that he got the attention of the CIA and did some work for them. From the Wikipedia article it looks like he is still very active.
posted by eye of newt at 8:20 PM on August 4, 2014


There's a long-standing project that supplies rewritten versions of Canon camera software so you can add all sorts of features. I suppose there may be similar software for other sorts of cameras. Anyway, I expect that adding this ability natively to a good SLR would be quite easy: you just need a high frame rate and zoom, and you don't need to worry about sensor noise.
posted by Joe in Australia at 8:37 PM on August 4, 2014


In late with the HAL jokes, but at least they didn't use "Daisy Bell"
posted by yoHighness at 5:30 AM on August 5, 2014


This was way more impressive than I expected, especially considering how minute the vibrations are. I get that they're averaging over the whole frame to get that data, but it doesn't intuitively make sense that they can capture vibrations at a sub-pixel level.
posted by TwoWordReview at 1:58 PM on August 5, 2014
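On the sub-pixel question: one pixel can't resolve a shift of 1/100 of a pixel, but when many pixels each give a noisy measurement of the same motion, combining them (here with a least-squares fit) shrinks the noise roughly with the square root of the pixel count. The sketch below uses a simple gradient-based, Lucas-Kanade-style shift estimate on a synthetic 1-D signal. It is a swapped-in textbook technique for illustration, not the method the researchers describe, and the texture, noise level, and 0.01-pixel shift are all made-up assumptions.

```python
import numpy as np


def estimate_shift(ref: np.ndarray, moved: np.ndarray) -> float:
    """Least-squares estimate of a small 1-D translation d, using the
    first-order approximation moved(x) = ref(x - d) ~ ref(x) - d * ref'(x)."""
    grad = np.gradient(ref)
    diff = moved - ref
    # Solve diff ~ -d * grad for d over all pixels at once.
    return -float(np.dot(grad, diff) / np.dot(grad, grad))


def texture(pos: np.ndarray) -> np.ndarray:
    """Hypothetical smooth 'surface pattern' with a 50-pixel period."""
    return np.sin(2 * np.pi * pos / 50.0)


rng = np.random.default_rng(0)
x = np.arange(100_000, dtype=float)  # pixel positions
true_shift = 0.01                    # pixels: far below one pixel
noise = 0.01                         # per-pixel sensor noise (assumed)

# Per pixel, the shift changes intensity by at most ~0.0013, smaller than the noise.
ref = texture(x) + rng.normal(0, noise, x.size)
moved = texture(x - true_shift) + rng.normal(0, noise, x.size)

est = estimate_shift(ref, moved)
print(f"estimated shift: {est:.4f} px")  # close to 0.01
```

No single pixel here can tell you the object moved, yet the fit over 100,000 pixels lands close to the true 0.01-pixel shift; repeating that per frame (or per row) gives a motion signal over time, which is the raw material for recovering sound.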




This thread has been archived and is closed to new comments