CELT part 3: Pitch prediction
Dec. 24th, 2007 08:40 pm
Before reading this, I recommend reading part 1 and part 2. As I explained in part 1, CELT achieves really low latency by using very short MDCT windows. In the current setup, we have two 256-sample overlapping (input) MDCT windows per frame. The reason for not using a single 512-sample MDCT instead is latency (the look-ahead of the MDCT is shorter). With that setup, we get 256 output samples per frame to encode (128 per MDCT window). Now, at 44.1 kHz, it means a resolution of 172 Hz, not to mention the leakage. That's far from enough to separate female pitch harmonics, much less male ones. To the MDCT, a periodic voice signal thus looks pretty much like noise, with no clear structure that can be used to our advantage.
To work around the poor MDCT resolution, we introduce a pitch predictor. Instead of trying to extract the structure from a single (small) frame, the pitch predictor looks outside the current frame (in the past of course) for similar patterns. Pitch prediction itself is not new. Most speech codecs (and all CELP codecs, including Speex) use a pitch predictor. It usually works in the excitation domain, where we find a time offset in the past (we use the decoded signal because the original isn't available to the decoder) that looks similar to the current frame. The time offset (pitch period) is encoded, along with a gain (the prediction gain). When the signal is highly periodic (as is often the case with voice), the gain is close to 1 and the error after the prediction is small.
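To make the classic time-domain idea concrete, here is a minimal sketch of a single-tap pitch search on a synthetic signal. It is not taken from Speex or CELT: the function name `find_pitch`, the lag range and the test signal are all assumptions made for illustration, and lags shorter than the frame are simply not handled.

```python
import math

def find_pitch(history, frame, min_lag, max_lag):
    """Return (lag, gain) of the best single-tap predictor for `frame`.

    `history` holds past decoded samples, most recent last.
    Hypothetical helper: requires min_lag >= len(frame).
    """
    n = len(frame)
    best_lag, best_gain, best_score = min_lag, 0.0, -1.0
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        past = history[start:start + n]
        corr = sum(x * p for x, p in zip(frame, past))
        energy = sum(p * p for p in past)
        if energy <= 0.0 or corr <= 0.0:
            continue
        # corr^2 / energy is the reduction in squared error for this lag
        score = corr * corr / energy
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain

# Synthetic "voiced" signal with a period of 50 samples (made-up test data)
period = 50
signal = [math.sin(2 * math.pi * t / period) + 0.5 * math.sin(4 * math.pi * t / period)
          for t in range(400)]
history, frame = signal[:360], signal[360:]
lag, gain = find_pitch(history, frame, min_lag=40, max_lag=90)
print(lag, round(gain, 3))
```

On a perfectly periodic input like this one, the gain comes out close to 1 and the prediction error is close to zero, which is the situation described above for voiced speech.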
Unlike CELP, CELT doesn't operate in the time domain, so doing pitch prediction is a bit trickier. What we need to do is find the offset in the time domain, then apply the MDCTs (remember we have two MDCT windows per frame) and do the rest in the frequency domain. Another complication is that periodicity is generally only present at lower frequencies. For speech, the pitch harmonics tend to weaken (compared to the noisy part) after about 3 kHz, with very little present past 8 kHz. Most CELP codecs only have a single gain that is applied throughout the entire frame (across all frequencies). While Speex has a 3-tap predictor that allows a small amount of control over the gain as a function of frequency, it's still very basic. Working in the frequency domain, on the other hand, allows a great deal of flexibility. What we do is apply the pitch prediction only up to a certain frequency (e.g. 6 kHz) and divide that range into several (e.g. 5) bands. For the example from part 2 (corresponding to mode1 of the 0.0.1 release), we use the following bands for the pitch (different from the bands on which we normalise energy):
{0, 4, 8, 12, 20, 36}
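As a quick sanity check, the band edges can be converted to frequencies. This sketch assumes the edges are MDCT bin indices within one 256-sample window (128 bins covering 0 to 22.05 kHz); the constant and function names are made up for the example.

```python
SAMPLE_RATE = 44100                  # Hz, as in the example above
WINDOW = 256                         # input samples per MDCT window
BINS = WINDOW // 2                   # 128 coefficients per window
BIN_HZ = (SAMPLE_RATE / 2) / BINS    # about 172 Hz per bin

PITCH_BANDS = [0, 4, 8, 12, 20, 36]  # assumed to be bin indices

def band_ranges_hz(edges, bin_hz):
    """Turn consecutive band edges into (low, high) frequency pairs in Hz."""
    return [(lo * bin_hz, hi * bin_hz) for lo, hi in zip(edges, edges[1:])]

for i, (lo, hi) in enumerate(band_ranges_hz(PITCH_BANDS, BIN_HZ)):
    print(f"band {i}: {lo:7.1f} Hz to {hi:7.1f} Hz")
```

Under that assumption the five bands stop at bin 36, a little above 6 kHz, which matches the "e.g. 6 kHz" limit mentioned above.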
Another particularity of the pitch predictor in CELT (unlike any other algorithm I know of) is that the pitch prediction is computed on the normalised bands. That is, we apply the energy normalisation to both the current signal (X) and the delayed signal (P), i.e. the pitch prediction from the past. Because of that, the pitch gain can never exceed unity, which is a nice property when it comes to keeping things stable despite transmission losses. Despite a maximum value of one in the normalised domain, the "effective value" (not normalised) can be greater than one when the energy is increasing, which is the desired effect. The pitch gain for band i is computed simply as g_i = <X_i, P_i>, where <,> is the inner product and X_i is the sub-vector of X that corresponds to band i (same for P_i).
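Here is a rough sketch of the per-band gain computation. It makes a simplifying assumption that is not in the text: each pitch band sub-vector is normalised to unit norm on its own, whereas the actual normalisation uses a different band layout. The helper names and the test data are invented for the example. With unit-norm vectors, the Cauchy-Schwarz inequality guarantees |g_i| <= 1.

```python
import math
import random

PITCH_BANDS = [0, 4, 8, 12, 20, 36]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def band_gains(x, p, edges):
    """Return one (normalised_gain, effective_gain) pair per band.

    Assumption: X_i and P_i are each scaled to unit norm per pitch band,
    so the normalised gain is the inner product of two unit vectors.
    """
    result = []
    for lo, hi in zip(edges, edges[1:]):
        xi, pi = x[lo:hi], p[lo:hi]
        nx, npast = norm(xi), norm(pi)
        if nx == 0.0 or npast == 0.0:
            result.append((0.0, 0.0))
            continue
        g = sum(a * b for a, b in zip(xi, pi)) / (nx * npast)
        # Gain seen by the un-normalised past signal: larger than g
        # whenever the current band has more energy than the past one.
        result.append((g, g * nx / npast))
    return result

# Made-up data: the past spectrum is quieter than the current one and
# gets noisier towards the higher bins.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(36)]
p = [0.5 * a + 0.1 * (k / 36) * random.gauss(0, 1) for k, a in enumerate(x)]
for i, (g, g_eff) in enumerate(band_gains(x, p, PITCH_BANDS)):
    print(f"band {i}: g = {g:.3f}, effective = {g_eff:.3f}")
```

The second number in each pair illustrates the "effective value": when the energy roughly doubles in amplitude from the past frame to the current one, a normalised gain near 1 corresponds to an effective gain near 2.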
Here's what the distribution of the gains looks like for each band:

It's clear from the figure above that the lower bands (lower frequencies) tend to have a much higher pitch gain. Because of that, a single gain for all the bands wouldn't work very well. Once the gains are computed, they need to be encoded efficiently. Again, using naive scalar quantisation and encoding each gain separately (using 3 or 4 bits each) would be a bit wasteful. So far, I've been using a trained (non-algebraic) vector quantiser (VQ) with 32 entries, which means a total of 5 bits for all gains. The advantage of VQ for that kind of data is that it exploits the redundancy between the gains, so it tends to be more efficient. There are a few disadvantages as well. Trained VQ codebooks are not as flexible and can end up taking too much space when there are many entries (I don't think 32 entries is enough for 5 gains).
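For illustration, encoding with such a VQ boils down to a nearest-neighbour search. The sketch below uses a placeholder codebook filled with random, decreasing gains. A real codebook would be trained on actual gain vectors (for instance with k-means), and none of the names or values here come from CELT.

```python
import random

def nearest_entry(codebook, gains):
    """Index of the codebook entry closest to `gains` (squared error)."""
    best_index, best_dist = 0, float("inf")
    for index, entry in enumerate(codebook):
        dist = sum((g - e) ** 2 for g, e in zip(gains, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

# Placeholder codebook: 32 entries of 5 gains that decrease with frequency.
random.seed(0)
codebook = [sorted((random.random() for _ in range(5)), reverse=True)
            for _ in range(32)]

gains = [0.9, 0.8, 0.6, 0.4, 0.2]          # made-up gains for one frame
index = nearest_entry(codebook, gains)      # this is what gets transmitted
bits = (len(codebook) - 1).bit_length()     # 5 bits for 32 entries
decoded = codebook[index]                   # what the decoder reconstructs
print(index, bits, [round(g, 2) for g in decoded])
```

Compared to 5 gains at 3 or 4 bits each (15 to 20 bits), the single 5-bit index is much cheaper, at the cost of storing the codebook and searching it.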
The last point to address about the pitch predictor is calculating the pitch period. We could try all delays, apply the MDCTs, compute the gains for each and at the end decide which is best. Unfortunately, the computational cost would be huge. Instead, it's easier to do it in "open loop" just like in Speex (and many other CELP codecs). We compute the generalised cross-correlation (GCC) in the frequency domain (cheaper than computing in the time domain). The cross-spectrum (before computing the IFFT) is weighted by an approximation of the psychoacoustic masking curve just so each band contributes to the result (instead of having the lower frequencies dominate everything else).
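The general shape of a frequency-domain GCC can be sketched as follows. Two things differ from what is described above and are assumptions of this example: the weighting is a simple partial whitening of the cross-spectrum (a PHAT-like stand-in, not the masking-curve approximation), and a naive DFT is used instead of an FFT to keep the code dependency-free. The correlation is circular, and the function names are made up.

```python
import cmath
import math
import random

def dft(x, inverse=False):
    """Naive O(N^2) DFT, fine for a short demo (a codec would use an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def gcc_lag(current, past, max_lag, alpha=0.5, eps=1e-9):
    """Lag (1..max_lag) by which `current` is delayed relative to `past`."""
    cur_f, past_f = dft(current), dft(past)
    cross = [c * p.conjugate() for c, p in zip(cur_f, past_f)]
    # Stand-in weighting: flatten the cross-spectrum magnitude so that
    # strong bins do not dominate. alpha=1 would be full PHAT whitening.
    weighted = [c / (abs(c) + eps) ** alpha for c in cross]
    corr = [v.real for v in dft(weighted, inverse=True)]
    return max(range(1, max_lag + 1), key=lambda lag: corr[lag])

# Made-up test: `current` is `past` circularly delayed by 10 samples.
random.seed(2)
n, delay = 64, 10
past = [random.gauss(0, 1) for _ in range(n)]
current = [past[(t - delay) % n] for t in range(n)]
lag = gcc_lag(current, past, max_lag=32)
print(lag)
```

A noise-like test sequence is used here so that there is a single unambiguous peak. On a truly periodic signal, peaks also appear at multiples of the pitch period, which is one reason the search range matters.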
Now the results: how much benefit does pitch prediction give? Quite a bit actually, hear for yourself. Here's the same speech sample encoded with and without pitch prediction. Even on music, which is not always periodic, pitch prediction can help a bit, though not as much. I think there's potential to do better on music. There are a few leads I'd like to investigate (and again, I'm open to ideas):
- Using two pitch periods
- Frequency-domain prediction