RNNoise: Learning Noise Suppression
Sep. 26th, 2017 10:24 pm

This demo presents the RNNoise project, showing how deep learning can be applied to noise suppression. The main idea is to combine classic signal processing with deep learning to create a real-time noise suppression algorithm that's small and fast. No expensive GPUs required — it runs easily on a Raspberry Pi. The result is much simpler (easier to tune) and sounds better than traditional noise suppression systems (been there!).
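If you want to try it on your own recordings, the processing loop is only a few lines of C. The sketch below is not the exact demo code from the repository, just a minimal example of how the API is meant to be driven; check rnnoise.h for the precise signatures in your version.

```c
/* Minimal sketch of running RNNoise over raw 48 kHz, 16-bit mono audio.
 * Check rnnoise.h for the exact API in your version (newer versions of
 * rnnoise_create() take an RNNModel pointer, where NULL selects the
 * built-in model). */
#include <stdio.h>
#include "rnnoise.h"

#define FRAME_SIZE 480  /* 10 ms at 48 kHz */

int main(int argc, char **argv) {
  short pcm[FRAME_SIZE];
  float x[FRAME_SIZE];
  FILE *fin, *fout;
  DenoiseState *st;
  if (argc != 3) {
    fprintf(stderr, "usage: %s <in.raw> <out.raw>\n", argv[0]);
    return 1;
  }
  fin = fopen(argv[1], "rb");
  fout = fopen(argv[2], "wb");
  if (!fin || !fout) return 1;
  st = rnnoise_create();
  while (fread(pcm, sizeof(short), FRAME_SIZE, fin) == FRAME_SIZE) {
    int i;
    /* The library works on float samples kept in the 16-bit range. */
    for (i = 0; i < FRAME_SIZE; i++) x[i] = pcm[i];
    rnnoise_process_frame(st, x, x);  /* denoise in place */
    for (i = 0; i < FRAME_SIZE; i++) pcm[i] = (short)x[i];
    fwrite(pcm, sizeof(short), FRAME_SIZE, fout);
  }
  rnnoise_destroy(st);
  fclose(fin);
  fclose(fout);
  return 0;
}
```

Compile it against librnnoise and feed it headerless 48 kHz, 16-bit mono PCM.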
Possible alternative uses for this algorithm?
Date: 2017-09-29 07:49 am (UTC)
Your demo got me thinking: if I want to remove something very specific from one track instead of learning a generalized filter, can I train this model with a smaller dataset, like a few seconds from that track?
Re: Possible alternative uses for this algorithm?
Date: 2017-09-29 02:07 pm (UTC)
There was a $1.6 million Indiegogo project for a snoring noise suppression device that went bust.
https://www.indiegogo.com/projects/
Do you think you could help them?
Fab!
Date: 2017-09-29 03:34 pm (UTC)
Re: Possible alternative uses for this algorithm?
Date: 2017-09-29 03:40 pm (UTC)
Re: Fab!
Date: 2017-10-02 12:27 am (UTC)
One thing that strikes my ear in the samples, most obviously in the street noise one, is that the algorithm is acting more like a gate than noise removal, since the horns and traffic are still clearly audible in the speech sections.
I would love to see this adapted to guitar noise suppression!
Thanks for this work.
Re: Fab!
Date: 2017-10-02 12:30 am (UTC)
What about reducing noise in music?
Date: 2017-10-02 03:56 pm (UTC)
Re: What about reducing noise in music?
Date: 2017-10-02 03:59 pm (UTC)
Re: Fab!
Date: 2017-10-02 04:03 pm (UTC)
About your "noise gate" comment, I understand why you say it sounds like that, but I can assure you it also cancels during active speech. The only reason it cannot easily cancel the horns during active speech is that it cannot use lookahead to tell the horns apart from speech. That's easy to do when you have a second of audio, but not if all you have is 10 ms.
Re: Possible alternative uses for this algorithm?
Date: 2017-10-02 04:04 pm (UTC)
no subject
Date: 2017-10-03 01:36 pm (UTC)
Impressive work! Really nice results too. I've been working on a similar project, but meant to be used exclusively with Ardour, so it's an LV2 plugin (https://github.com/lucianodato/noise-
Also, did you evaluate using a discrete wavelet transform instead of FFT + Bark scale? I've read a few works in the past that get near-zero latency with that architecture.
Thank you very much for this. It already taught me a few things I wasn't aware of.
no subject
Date: 2017-10-04 04:13 pm (UTC)
Re: What about reducing noise in music?
Date: 2017-10-06 03:47 pm (UTC)
ladspa-plugin
Date: 2017-10-07 11:44 am (UTC)
Does anyone have any ideas how hard it would be to create a LADSPA plugin based on it?
I've never done anything like this (but I have basic C/C++ knowledge)…
RESIDUAL NOISE PROBLEM
Date: 2017-10-09 08:21 am (UTC)
According to the audio samples you provided, it seems that RNNoise leaves more residual noise than Speex. Do you think it would perform better with more noise samples for training?
no subject
Date: 2017-10-30 03:54 am (UTC)
Possible Use Cases
Date: 2017-12-03 09:02 pm (UTC)
I was wondering if this could be applicable to active noise cancellation as well, with a real-time algorithm, say on a Raspberry Pi, or maybe a hardware implementation on an FPGA. Of course, this would have to be trained on a different training set; I was thinking the Aurora II? Could it be paired with another algorithm such as beamforming to also attenuate noise not coming from a specific relative location?
Sorry for the plethora of questions.
Thanks.
Re: Possible Use Cases
Date: 2017-12-04 02:07 am (UTC)
Real-time Algorithm
Date: 2017-12-05 02:48 pm (UTC)
But how can we modify it to get a real-time spectrogram, since you purposely delayed it by a few seconds?
Thank you sir.
Using RNNoise as VAD source / Way to improve VAD
Date: 2017-12-08 08:28 am (UTC)
First of all, thanks for the great work.
Recently, I needed VAD in my application and found that RNNoise has a VAD output, so I tried using it.
I commented out the code not related to VAD and made a prototype of the app. It works quite nicely despite its small computational cost, but on some audio samples the VAD fails.
I'm trying to improve the VAD by increasing the feature size and the neural network size.
Can you give me some hints on improving VAD quality?
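In case it helps to see what I'm doing: I read the return value of rnnoise_process_frame() as the per-frame speech probability and threshold it, roughly as in the sketch below. The 0.5 threshold and the hangover count are just values I picked for illustration, not anything from the library.

```c
/* Sketch of using RNNoise's per-frame speech probability as a simple VAD. */
#include "rnnoise.h"

#define FRAME_SIZE 480  /* 10 ms at 48 kHz */

typedef struct {
  DenoiseState *st;  /* created with rnnoise_create() */
  int hangover;      /* frames to keep reporting speech after activity */
} SimpleVad;

/* Returns 1 if the frame is judged to contain speech, 0 otherwise. */
int simple_vad_process(SimpleVad *v, const float frame[FRAME_SIZE]) {
  float denoised[FRAME_SIZE];
  float prob = rnnoise_process_frame(v->st, denoised, frame);
  if (prob > 0.5f) {
    v->hangover = 5;   /* active speech: reset the hangover counter */
    return 1;
  }
  if (v->hangover > 0) {
    v->hangover--;     /* bridge short pauses between words */
    return 1;
  }
  return 0;
}
```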
Amateur Radio
Date: 2017-12-16 08:08 am (UTC)
In amateur radio, noise suppression is always a big topic. I've added your project to an SDR receiver - if you're not into SDR, see www.sdr-radio.com. Anyway, it's working well when the audio is above the noise; when the noise and audio are both at the same level I can lose the audio.
This isn't a complaint - just an observation; your code isn't really designed for this situation. Now if only I could get you interested in amateur radio, your skills would have a big impact on noise reduction.
Thanks for this project, I'll be following it closely.
Re: Amateur Radio
Date: 2017-12-16 08:09 am (UTC)
Simon G4ELI
Cutting out environmental noise
Date: 2018-01-11 08:47 am (UTC)
We have quite poor-quality recordings, so I thought your algorithm might be useful for removing background noise so we can more easily detect voice (which we otherwise cannot detect at all using a webrtcVAD algorithm). I'm just setting it up now. I thought I'd put this here in case you had any idea whether your algorithm would work the way I hope.
Thanks a lot,
Also, did you evaluate using a discrete wavelet transform instead of FFT + Bark scale?
Date: 2018-01-17 06:49 pm (UTC)
Do you have a unique transform algorithm that uses the Keras, Theano, or CNTK binary libraries? Say, something that could be like a hybrid of the FFT and GFT, or DTMF? I understand that using a GFT/Bark-scale transform algorithm alone could make a massive difference in data/frequency bands compared to using an FFT, as they can potentially use unlimited/infinite granular wavetable band resolution and virtual bin data size/processing power per noise scale or frequency band!
I understand that there aren't many people who acknowledge/understand the difference between those two transform algorithms to begin with.
The Audio Player is awfully nice!
Date: 2018-02-16 04:34 pm (UTC)
Slightly off topic, but how did you do that?
Re: The Audio Player is awfully nice!
Date: 2018-02-16 04:36 pm (UTC)
Training Data
Date: 2018-03-12 02:14 pm (UTC)
Re: Cutting out environmental noise
Date: 2018-06-07 10:37 am (UTC)
Thanks
Spectral non-stationarity metric
Date: 2018-06-20 11:22 am (UTC)
Re: Spectral non-stationarity metric
Date: 2018-06-24 07:53 am (UTC)
Asking permission to use Javascript version of RNNoise
Date: 2018-08-02 10:18 am (UTC)
Re: Asking permission to use Javascript version of RNNoise
Date: 2018-08-02 06:37 pm (UTC)
Re: Asking permission to use Javascript version of RNNoise
Date: 2018-08-03 03:12 am (UTC)
Re: Spectral non-stationarity metric
Date: 2018-08-27 06:39 am (UTC)
Input and output data dimensions
Date: 2018-11-08 01:20 pm (UTC)
Many thanks for publishing your exciting work and sharing your code.
I have two points which are not 100% clear to me after reading your documentation and code:
(1) Network training input and output data samples are finite sequences of 42- and 23-element vectors, respectively. But in operation, is the trained network fed one input vector at a time, outputting a single vector per frame?
(2) Is the training data extracted from overlapping spectrogram segments?
Kind regards
Re: Input and output data dimensions
Date: 2018-11-08 02:18 pm (UTC)
2) Yes, we use a frame size of 20 ms, with 10 ms overlap.
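In samples, at the 48 kHz rate, that is a 960-sample window advanced by 480 samples, so consecutive windows share half their samples. A tiny sketch of that arithmetic (the constants below are derived from those figures, not copied from the code):

```c
/* Framing arithmetic for the analysis described above (constants derived
 * from 48 kHz operation; not copied from the library's source). */
#include <stdio.h>

#define SAMPLE_RATE 48000
#define WINDOW_SIZE (SAMPLE_RATE / 50)   /* 20 ms window = 960 samples */
#define HOP_SIZE    (SAMPLE_RATE / 100)  /* 10 ms hop    = 480 samples */

int main(void) {
  int start;
  /* Consecutive windows advance by one hop, so each shares
   * WINDOW_SIZE - HOP_SIZE = 480 samples with its predecessor. */
  for (start = 0; start < 5 * HOP_SIZE; start += HOP_SIZE) {
    printf("window [%d, %d)\n", start, start + WINDOW_SIZE);
  }
  return 0;
}
```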
Re: Input and output data dimensions
Date: 2018-11-08 04:33 pm (UTC)Regarding 2), I think I have to specify my question:
Looking at your training code (rnnoise/training/rnn_train.py), you feed the network with sequences of 2000 42-element vectors/frames (= 1 training sample). Now I wonder if two distinct training samples might share a certain number of frames?
Re: Possible Use Cases
Date: 2019-01-02 04:43 pm (UTC)
For active noise cancellation, can you suggest an approach? Also, to your knowledge, is there any project around that?
Re: Possible Use Cases
Date: 2019-01-02 04:47 pm (UTC)
Negative SNR cases
Date: 2019-01-24 05:44 am (UTC)
This application is working very well, but if I give it -6 dB/-3 dB SNR inputs, some parts of the speech are corrupted.