Real time AI noise reduction
May 2, 2020 6:43 PM

Nvidia have released a beta of RTX Voice, an app which removes unwanted noise like traffic, rain or fans from your microphone input using hardware-accelerated machine learning. While it is branded RTX, it is possible to run it with GTX cards.
posted by adept256 (35 comments total) 12 users marked this as a favorite
 
This is obviously great for streamers, but does it work on headphones, too? I'm probably misunderstanding how it works, but it would be cool if you could use this tech to turn any well-insulated headphones into noise-cancelling ones.
posted by Rhaomi at 6:48 PM on May 2, 2020


This is invaluable for telecommuting. It works like magic. I even tried playing the Doom Eternal soundtrack on my phone and it just cut it all out.

It's Sunday morning and there's very little traffic, so I searched YouTube for 'traffic noise'. The video I found had 1.6 million views, which I found surprising all on its own.
posted by adept256 at 6:49 PM on May 2, 2020 [2 favorites]


Is there a good solution for filtering out the PC's own sound, so I can talk to someone over my computer speakers and not worry about my microphone picking it up and creating a 500ms delayed version of everything? I feel like some apps do this in-app but not reliably. I always end up switching to headphones.
posted by Nelson at 6:53 PM on May 2, 2020


How about getting stupid people from not having push-to-talk on, so I hear whatever nonsense they have playing in the background...
posted by Windopaene at 7:20 PM on May 2, 2020 [1 favorite]


How about getting stupid people from not having push-to-talk on, so I hear whatever nonsense they have playing in the background...

Most corporate users are stuck with big old crappy enterprise systems like Webex, which.... don't have push to talk. Even on Microsoft Teams it's relatively new, and a lot of companies' IT folks might not have the latest version, or have even enabled the feature.
posted by tclark at 7:25 PM on May 2, 2020 [2 favorites]


I've set this up on my PC with a GTX 1060.

It won't let you use speakers to teleconference. (please don't do that. Try a bluetooth headset if your headphones are too big)

It will filter fan noise, AC, keyboard/mouse clicks and most noise from outside of the room out of your mic feed.

It will also filter all that out of your speakers/headphones if you're sick of hearing it from other people.

It doesn't have any push-to-talk stuff. Think of it as a filter module in a synth patch.

(If using it results in audio/video sync issues, I can recommend Manycam to add appropriate delays to video/audio.)
posted by krisjohn at 7:55 PM on May 2, 2020 [2 favorites]


There's a similar service called Krisp, which I assume will soon be out of business. Ouch.
posted by gwint at 8:47 PM on May 2, 2020


I was going to say, Krisp did the same thing, but they fit this into a Chrome extension, which is amazing.
posted by metasunday at 9:00 PM on May 2, 2020


So I have recently been having this nightmare where I fart loudly in a Zoom meeting and I get the yellow box around my video. Is there a flatulence setting on this bad boy?
posted by Literaryhero at 9:05 PM on May 2, 2020 [7 favorites]


When I say "I", I obviously mean "a friend". Obviously.
posted by Literaryhero at 9:06 PM on May 2, 2020 [3 favorites]


Will it reduce dog barking on Zoom? That’s my use case need.
posted by inflatablekiwi at 9:26 PM on May 2, 2020 [2 favorites]


But how will they know I have blue switches??? :\
posted by symbioid at 9:49 PM on May 2, 2020 [10 favorites]


Jennifer Herrema could not be reached for comment.
posted by Conrad-Casserole at 9:58 PM on May 2, 2020 [4 favorites]


It will probably remove all dog-related noise. I can't test as I have a cat.
posted by krisjohn at 11:22 PM on May 2, 2020


Mute if you need to toot.

For that matter, mute always until it's time to say something.
posted by Karaage at 11:49 PM on May 2, 2020 [3 favorites]


So will this run on my GTX 950? Should I even bother trying?
posted by Fizz at 12:16 AM on May 3, 2020


For that matter, mute always until it's time to say something.

A lot of people should stay muted even then.
posted by Cardinal Fang at 12:41 AM on May 3, 2020 [4 favorites]


A lot of people should stay muted even then.

I...
posted by Max Power at 12:55 AM on May 3, 2020 [2 favorites]


I have a decent, but not insanely expensive cardioid microphone to solve this. Have had the mic for years for interviews outside. Had to do some voice-over tracks while the neighbor decided to fire up the lathing machine to do some metal work. The sound in my studio was so loud it was comical, but just for the hell of it I finished the tracks. On reviewing everything, the voice sounded good enough and almost all the sound was filtered out. Easy fix in post. (Sennheiser MD46). The mic is half the price of the cheapest RTX card I could find and I've had it for almost 10 years.
posted by ouke at 12:57 AM on May 3, 2020 [4 favorites]


If you have just taken a large, satisfying mouthful of toast - while of course being on mute - just as somebody is calling for your specific opinion for the benefit of everybody listening ...then can it fill in for you for a few seconds?
posted by rongorongo at 1:08 AM on May 3, 2020 [3 favorites]


This is obviously great for streamers, but does it work on headphones, too? I'm probably misunderstanding how it works, but it would be cool if you could use this tech to turn any well-insulated headphones into noise-cancelling ones.

In the way you're thinking, no. It can be set up to filter background noise out of any digital input and output; so say, for example, you're watching a video of a concert with background noise, it would filter that output, as well as filtering out noise from your own mic recording. It's real time signal processing to remove noise, and it can process out a wide range of noise pretty effectively when I tested it on my GTX 1070 (including my screaming 4 year olds in the other room - with mum - when in a remote meeting!).

Shame my desktop camera also shows the lockdown disaster of our living room (also the twins, usually), so until Google Meet introduces background blurring, I'm stuck hiding in the bedroom with a chromebook pointed artfully at a blank wall. Thankfully, I only have a couple a week.

Your headphones could be producing a 100% noise free signal, but that doesn't stop your ears hearing the additional ambient noise in your own environment that makes it through the headphones.

Active noise cancellation headsets don't remove noise; they add it. They have a mic in the earcup that picks up low-frequency sounds in your environment, generate the same signal but 180 deg out of phase, and add it to the headphone speakers, in addition to any normal input (or silence). Sound is a wave, so a perfect 'anti' signal cancels out the original, from the point of view of your ears. RTX Voice can't do this.
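The phase-inversion idea above can be checked numerically. What follows is only a minimal sketch, not how any real headset is implemented: it assumes a pure sine tone as the "noise", an arbitrary 8 kHz sample rate, and a hypothetical `delay_s` parameter standing in for however late the anti-signal arrives. The names `sine`, `rms` and `residual_rms` are made up for the illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch


def sine(freq_hz, n_samples, phase=0.0):
    """Unit-amplitude sine wave sampled at SAMPLE_RATE."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE + phase)
            for i in range(n_samples)]


def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def residual_rms(freq_hz, delay_s=0.0, n_samples=SAMPLE_RATE):
    """Level left at the ear after adding an inverted copy of the noise.

    delay_s is a hypothetical latency of the anti-signal; 0.0 means a
    perfect 180-degree inversion.
    """
    noise = sine(freq_hz, n_samples)
    # A fixed time delay is a bigger phase lag at higher frequencies.
    lag = 2 * math.pi * freq_hz * delay_s
    anti = sine(freq_hz, n_samples, phase=math.pi - lag)
    return rms([a + b for a, b in zip(noise, anti)])


if __name__ == "__main__":
    for freq in (100, 500, 2000):
        print(freq, "Hz:",
              "perfect =", residual_rms(freq),
              "0.1 ms late =", residual_rms(freq, delay_s=1e-4))
```

With a perfect inversion the residual is zero apart from floating-point rounding. Under these assumptions, an anti-signal that is just 0.1 ms late still leaves only roughly 6% of a 100 Hz tone, but at 2 kHz the sum is louder than the original noise. That is one plausible reason, though not necessarily the only one, that this approach suits low frequencies best.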

ANC headsets generally only work well for repetitive low frequencies (below about 500Hz from memory); fan hum, engine noise, tyres on roads, distant background chatter etc. They don't work well on high, fast changing frequencies like loud or nearby voices. They will usually sound a bit flatter as the ANC can cancel out the bass frequencies, but are otherwise untouched. This is where the close fitting cushions help, as they do isolate you from that sort of noise somewhat; but any good pair of headphones will do that, no electronics required!
posted by Absolutely No You-Know-What at 2:17 AM on May 3, 2020 [3 favorites]


Active noise cancellation headsets don't remove noise; they add it.

They do. The practical effect is reducing noise. Which is what we care about.
posted by Dumsnill at 2:25 AM on May 3, 2020 [1 favorite]


ANC headsets generally only work well for repetitive low frequencies (below about 500Hz from memory); fan hum, engine noise, tyres on roads, distant background chatter etc. They don't work well on high, fast changing frequencies like loud or nearby voices.

So no fart cancellation is what you're saying...
posted by jeremias at 4:29 AM on May 3, 2020 [1 favorite]


Shame my desktop camera also shows the lockdown disaster of our living room (also the twins, usually), so until Google Meet introduces background blurring

Both Manycam and Xsplit's VCam will blur backgrounds and provide virtual camera devices for whatever app you want to use.
posted by krisjohn at 5:07 AM on May 3, 2020


Zoom disabled virtual cameras, unfortunately.
posted by Songdog at 7:09 AM on May 3, 2020


A friend of mine set this up on his computer while we were gaming together. He has a microphone, rather than an integrated headset, and we can no longer hear his mechanical keyboard or the soundtrack to the game. I wonder if I can get it running on my 1060M.
posted by Hactar at 7:16 AM on May 3, 2020 [2 favorites]


While audio streaming and processing benefits from specialized hardware, it would be surprising if it really needed an advanced GPU. I don't mean to downplay how advanced the hardware is in modern phones, but almost real time voice-to-text and real time translation apps work without even a high end phone. The nn architecture is probably similar (audio that's speech vs not speech instead of speech representing x) with somewhat different goals (favor being a little over-inclusive of non-speech). A tiny delay for conference apps isn't something most would even notice, since there is often substantial network lag.
posted by a robot made out of meat at 7:32 AM on May 3, 2020


While audio streaming and processing benefits from specialized hardware, it would be surprising if it really needed an advanced GPU. I don't mean to downplay how advanced the hardware is in modern phones, but almost real time voice-to-text and real time translation apps work without even a high end phone.

Specialized hardware in low spec is often fixed function or something with DSP instructions that run in a highly optimized form or through special hardware execution paths. Most DSP work can be emulated by brute forcing it through massive FMAC performance which is what we're seeing with GPUs being purposed for doing heavy lifting of DSP style operations like image recognition and machine learning.
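To make the "DSP is mostly multiply-accumulate" point concrete, here is a minimal sketch of a direct-form FIR filter in plain Python. It is purely illustrative and says nothing about what RTX Voice actually runs; `fir_filter` and `mac_count` are hypothetical names, and the two-tap averaging filter is just an example.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: each output is a sum of coeff * sample products."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]  # one multiply-accumulate (MAC)
        out.append(acc)
    return out


def mac_count(n_samples, n_taps):
    """Upper bound on MACs for the loop above: one per tap per output sample."""
    return n_samples * n_taps


if __name__ == "__main__":
    # Two-tap moving average applied to a short ramp.
    print(fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
    # Rough MAC budget for one second of 48 kHz audio through a 256-tap filter.
    print(mac_count(48000, 256))
```

The inner loop is nothing but multiply-adds, which is why a DSP with a dedicated MAC instruction and a GPU with a lot of raw fused multiply-add throughput can both chew through this kind of work.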

It's not that RTX voice requires the full grunt of an advanced GPU but that the RTX cards have a huge reservoir of typically untapped performance that is fairly power efficient. If I'm in a regular DX11 game my tensor cores are pretty much going to be idle. There will probably be a small hit to graphical performance at high GPU loads because the tensor cores will be in contention for register file space but it'll probably be in the region of <10%.

If I throw that onto a consumer Pascal GPU (like a GP104) which not only doesn't have additional tensor cores but can only execute FP16 at a 1:128 rate natively, RTX voice will quickly have to commandeer FP32 units to do emulated FP16 calculations. This takes a bigger hit out of the compute being used to assemble a frame while in game since RTX voice will not only be contending for registers, it'll also be fighting for execution resources.
posted by Your Childhood Pet Rock at 7:55 AM on May 3, 2020 [2 favorites]


Zoom disabled virtual cameras, unfortunately.

Works fine on Win10 with OBS and VirtualCam on the latest version of Zoom.
posted by CaseyB at 8:52 AM on May 3, 2020 [2 favorites]


Uh, you can totally do this on lesser hardware. I work on a similar but different and more computationally intense thing, which we can run on a good phone CPU. Model pruning is your friend.
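For anyone unfamiliar with the term, one common form of model pruning is magnitude pruning: zero out the weights with the smallest absolute values so there is less arithmetic to do. The comment above doesn't say which pruning method is meant, so treat this as an assumed, minimal sketch on a flat list of weights; `magnitude_prune` is a hypothetical helper, not any library's API.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of weights with the smallest-magnitude fraction zeroed.

    sparsity is the fraction of weights to drop, between 0.0 and 1.0.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0.0 and 1.0")
    n_drop = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_drop])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    layer = [0.9, -0.05, 0.3, 0.01]
    print(magnitude_prune(layer, 0.5))
```

In practice a pruned model is usually fine-tuned afterwards to recover accuracy, and the speedup only shows up if the runtime can actually skip the zeroed weights.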
posted by kaibutsu at 9:21 AM on May 3, 2020 [2 favorites]


So is having gotten past the alpha release stage before releasing it to the public. In its current state, RTX Voice is pretty inefficient, if Digital Foundry's benchmarking is to be believed. Supposedly, Nvidia wasn't planning on releasing this for a while yet, but the current situation compelled them to release what they had since it works well enough in many cases already.
posted by wierdo at 9:30 AM on May 3, 2020


Dammit, Apple more-or-less completely dropped NVIDIA support with Mojave, so this hackintosh enthusiast finally broke down and ordered a card from *competitor* on Thursday. Now wondering if I have enough slots for yet another GPU.
posted by aspersioncast at 9:50 AM on May 3, 2020


then can it fill in for you for a few seconds?

Just retrain the Jukebox model on your own Zoom calls. They'll be entertained by your "Open JIRA Ticket" spoken word jams, and as you climb the corporate ladder people will stop noticing
posted by RobotVoodooPower at 10:05 AM on May 3, 2020


I've been running this on my 1070-TI, and it's not just helpful, it's an absolute game-changer. The whole family got in on testing it, banging stuff in the kitchen, dog going nuts, and so on -- all of it was completely erased (by contrast, the day before setting it up, my wife started to rinse a dish while I was on a conference call and everyone said AAAaagh, are you at Waffle House?)
posted by hypersloth at 3:47 PM on May 5, 2020


I tried it on a 1060 for giggles. It did work, but people seemed to have more trouble hearing me (they said it sounded static-y). Background talking went right through (as I guess it ought). Not a scientific test.
posted by a robot made out of meat at 3:52 PM on May 7, 2020



