This new demo presents LPCNet, an architecture that combines signal processing and deep learning to improve the efficiency of neural speech synthesis. Neural speech synthesis models like WaveNet have recently demonstrated impressive speech synthesis quality. Unfortunately, their computational complexity has made them hard to use in real-time, especially on phones. As was the case in the RNNoise project, one solution is to use a combination of deep learning and digital signal processing (DSP) techniques. This demo explains the motivations for LPCNet, shows what it can achieve, and explores its possible applications.
This demo presents the RNNoise project, showing how deep learning can be applied to noise suppression. The main idea is to combine classic signal processing with deep learning to create a real-time noise suppression algorithm that's small and fast. No expensive GPUs required — it runs easily on a Raspberry Pi. The result is much simpler (easier to tune) and sounds better than traditional noise suppression systems (been there!).
Opus gets another major upgrade with the release of version 1.2. This release brings quality improvements to both speech and music, while remaining fully compatible with RFC 6716. There are also optimizations, new options, as well as many bug fixes. This Opus 1.2 demo describes a few of the upgrades that users and implementers will care about the most. You can download the code from the Opus website.
Over the last three years, we have published a number of Daala technology demos. With pieces of Daala being contributed to the Alliance for Open Media's AV1 video codec, now seems like a good time to go back over the demos and see what worked, what didn't, and what changed compared to what we described in them.
Here's my new contribution to the Daala demo effort. Perceptual Vector Quantization has been one of the core ideas in Daala, so it was time for me to explain how it works. The details involve lots of maths, but hopefully this demo will make the general idea clear enough. I promise that the equations in the top banner are the only ones you will see!
After more than two years of development, we have released Opus 1.1. This includes:
- new analysis code and tuning that significantly improves encoding quality, especially for variable-bitrate (VBR),
- automatic detection of speech or music to decide which encoding mode to use,
- surround with good quality at 128 kbps for 5.1 and usable down to 48 kbps, and
- speed improvements on all architectures, especially ARM, where decoding uses around 40% less CPU and encoding uses around 30% less CPU.
With the changes, stereo encoding now produces usable audio (of course, not high fidelity) down to about 40 kb/s, with surround 5.1 sounding usable down to 48-64 kb/s. Please give this release a try and report any issues on the mailing list or by joining the #opus channel on irc.freenode.net. The more testing we get, the faster we'll be able to release 1.1-final.
As usual, the code can be downloaded from: http://opus-codec.org/downloads/
We're also releasing both version 1.0.0, which is the same code as the RFC, and version 1.0.1, which is a minor update on that code (mainly to the build system). As usual, you can get those from http://opus-codec.org/
Thanks to everyone who contributed by fixing bugs, reporting issues, implementing Opus support, testing, advocating, ... It was a lot of work, but it was worth it.
Those who have been following the Opus git repository in the past few weeks probably haven't noticed much work going on. The reason is pretty simple: most of the work has been going on elsewhere, in an experimental branch (named exp_wip3 for now) of my private repository. The reason it's in an experimental branch is that it's not fully converted to fixed-point and hasn't been tested on any frame size other than 20 ms. Here's an (incomplete) list of changes for now:
- Really unconstrained VBR (not trying to keep the same average rate)
- Tonality detection to give highly tonal audio a boost in bit-rate
- (yet another) rewrite of the transient detection code
- New dynamic allocation code that boosts the rate of bands that have significant spectral leakage caused by short blocks
Thanks to these changes, the quality has (as far as we can tell) gone up compared to the current master branch. I invite you to judge for yourself by comparing the audio coded with the current master branch with the audio coded with the new exp_wip3 experimental branch. This is 64 kb/s, so fairly low rate for stereo music. The original is here. Let me know what you think.
My week at the IETF meeting was also my first week at my new job working for Mozilla. I've been hired specifically to work on Opus and other codec/multimedia development, so I should have a lot more time for that than I used to. First thing on my list: finishing the Ogg mapping for Opus and releasing an Ogg encoder and decoder.