We just released Opus 1.1-alpha, which includes more than one year of
development compared to the 1.0.x branch. There are quality
improvements, optimizations, bug fixes, as well as an experimental
speech/music detector for mode decisions. That being said, it's still
an alpha release, which means it can also do stupid things sometimes.
If you come across any of those, please let us know so we can fix them.
You can send an email to the mailing
list, or join us on IRC in #opus on irc.freenode.net.
The main reason for releasing this alpha is to get feedback about what
works and what does not.
Quality improvements
Most of the quality improvements come from the unconstrained
variable bitrate (VBR). In the 1.0.x encoder, VBR always attempts to
meet its target bitrate. The new VBR code is free to deviate from
its target depending on how difficult the file is to encode. In addition
to boosting the rate of transients like 1.0.x does, the new encoder also
boosts the rate of tonal signals, which are harder for Opus to code. On
the other hand, for signals with a narrow stereo image, Opus can reduce
the bitrate. What this means in the end is that some files may
significantly deviate from the target. For example, someone encoding
his music collection at 64 kb/s (nominal) may find that some files end
up using as low as 48 kb/s, while others may use up to about 96 kb/s.
However, for a large enough collection, the average should be fairly
close to the target.
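If you want to play with this yourself, unconstrained VBR is exposed through
the normal libopus encoder ctls. Here's a minimal sketch; the 48 kHz, stereo
and AUDIO-application settings are just example values:

    #include <stdio.h>
    #include <opus/opus.h>

    int main(void)
    {
        int err;
        /* 48 kHz stereo and the AUDIO application are example settings. */
        OpusEncoder *enc = opus_encoder_create(48000, 2, OPUS_APPLICATION_AUDIO, &err);
        if (err != OPUS_OK) {
            fprintf(stderr, "encoder init failed: %s\n", opus_strerror(err));
            return 1;
        }
        opus_encoder_ctl(enc, OPUS_SET_BITRATE(64000));    /* 64 kb/s nominal target */
        opus_encoder_ctl(enc, OPUS_SET_VBR(1));            /* variable bitrate */
        opus_encoder_ctl(enc, OPUS_SET_VBR_CONSTRAINT(0)); /* unconstrained: the rate may drift from the target */

        /* ... feed frames to opus_encode() as usual ... */

        opus_encoder_destroy(enc);
        return 0;
    }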
There are a few more ways in which the alpha improves quality. The
dynamic allocation code was improved and made more aggressive, the
transient detector was once again rewritten, and so was the tf
analysis code. A simple thing that improves the quality of some files
is the new DC rejection (3-Hz high-pass) filter, sketched below. DC is
not supposed to be present in audio signals, but it sometimes is and
harms quality.
Finally, there are many minor improvements for speech quality (both on
the SILK side and on the CELT side), including changes to the pitch
estimator.
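For the curious, the idea behind DC rejection is just a first-order high-pass
with its pole very close to 1. The sketch below illustrates the concept; it is
not the encoder's actual filter:

    /* First-order DC-blocking high-pass (a sketch of the idea, not the
     * encoder's actual filter). With fc = 3 Hz and fs = 48000 Hz the pole
     * sits very close to 1, so only content near DC is attenuated. */
    static void dc_reject(const float *in, float *out, int n, float fc, float fs)
    {
        float c = 1.0f - 6.2831853f * fc / fs;  /* ~0.9996 at 3 Hz / 48 kHz */
        float prev_x = 0.0f, prev_y = 0.0f;
        for (int i = 0; i < n; i++) {
            float y = in[i] - prev_x + c * prev_y;
            prev_x = in[i];
            prev_y = y;
            out[i] = y;
        }
    }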
Speech/music detector
Another big feature is automatic detection of speech and music. This is
useful for selecting the optimal encoding mode between SILK-only/hybrid
and CELT-only. Contrary to what some people think, it's not as simple as
encoding all music with CELT and all speech with SILK. It also depends
on the bitrate (at very low rates, we'll use SILK for music and at high
rates, we'll use CELT for speech). Automatic detection isn't easy, and
doing it in real time (with no look-ahead) is even harder. Because of
that, the detector tends to take 1-2 seconds to react to transitions
and will sometimes make bad decisions. We'd be interested in knowing about
any screw-ups of the algorithm.
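If the detector gets a particular file wrong, you don't have to live with its
decision: the existing OPUS_SET_SIGNAL ctl lets you hint the encoder yourself.
Continuing the sketch above (enc is the same OpusEncoder*):

    /* By default the encoder decides for itself (OPUS_AUTO). If the
     * detector keeps misclassifying a particular input, hint it instead: */
    opus_encoder_ctl(enc, OPUS_SET_SIGNAL(OPUS_SIGNAL_MUSIC));  /* or OPUS_SIGNAL_VOICE, or OPUS_AUTO */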
Bandwidth detection
The new encoder can also detect the bandwidth of the input signal. This
is useful to avoid wasting bits encoding frequencies that aren't present
in the signal. While simpler than speech/music detection, bandwidth detection
isn't as easy as it sounds because of aliasing, quantization, and dithering.
The current algorithm should do a reasonable job, but again we'd be
interested in hearing about any failures.
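As with the signal type, you can override the automatic behaviour through the
existing API if you already know your input is band-limited. Again assuming
the enc from the earlier sketch:

    /* Bandwidth is chosen automatically by default. If you already know
     * the input is band-limited, you can cap it yourself: */
    opus_encoder_ctl(enc, OPUS_SET_BANDWIDTH(OPUS_AUTO));                   /* default: detect from the signal */
    opus_encoder_ctl(enc, OPUS_SET_MAX_BANDWIDTH(OPUS_BANDWIDTH_WIDEBAND)); /* never code above 8 kHz of audio bandwidth */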