Back to the RECOVC home page
Speech Reconstruction Quality Demo
IBM Research, Haifa Lab.
The following is a demo of the reconstruction quality at different sampling
rates and for different Mel Frequency Cepstral Coefficient (MFCC) vector
dimensions. The MFCC is the feature vector of the IBM
ViaVoice speech recognition engine.
The reconstruction uses the following inputs, each updated every
10 msec:
- Voicing decision (a binary flag defining voiced or unvoiced speech).
- For voiced speech, the pitch period.
- The MFCC vector (of dimension 13 or 24).
The MFCCs are compressed at 4.0 kb/s when the dimension is 13, and at 6.4
kb/s when the dimension is 24. It was shown that at these rates the
recognition rate of a large vocabulary continuous speech recognition (LVCSR)
task is not affected by the MFCC compression.
The pitch and voicing are always compressed at 500 b/s, giving a total of
4.5 kb/s or 6.9 kb/s.
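The bit rates above can be restated as a per-frame bit budget, since every parameter is updated once per 10 msec. The following is a minimal sketch of that arithmetic only. The function and constant names are illustrative assumptions, and it says nothing about how the bits are actually allocated inside the coder.

```python
# Per-frame bit budget implied by the rates quoted in the text.
# All names here are hypothetical; only the numbers come from the page.

FRAME_PERIOD_MS = 10            # update rate stated in the text
PITCH_VOICING_RATE_BPS = 500    # pitch + voicing, stated in the text
MFCC_RATE_BPS = {13: 4000, 24: 6400}  # MFCC dimension -> bit rate


def bits_per_frame(rate_bps, frame_period_ms=FRAME_PERIOD_MS):
    """Bits available in one frame at the given bit rate."""
    return rate_bps * frame_period_ms // 1000


def total_rate_bps(mfcc_dim):
    """MFCC rate plus the fixed pitch/voicing rate."""
    return MFCC_RATE_BPS[mfcc_dim] + PITCH_VOICING_RATE_BPS


if __name__ == "__main__":
    for dim in sorted(MFCC_RATE_BPS):
        print(
            f"{dim} MFCCs: {bits_per_frame(MFCC_RATE_BPS[dim])} bits/frame for MFCCs, "
            f"{bits_per_frame(PITCH_VOICING_RATE_BPS)} bits/frame for pitch and voicing, "
            f"{total_rate_bps(dim)} b/s in total"
        )
```

Under these numbers, each 10 msec frame carries 40 or 64 bits for the MFCC vector and 5 bits for pitch and voicing.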
Click on the speaker icon to hear the speech files (all in 16 bit PCM
mono WAV format). Note the file sizes (in Kbyte).
| Sampling Rate [kHz] | File size [Kbyte] | Original signal | 6.9 kb/s reconstruction (24 MFCCs) | 4.5 kb/s reconstruction (13 MFCCs) |
| 8 | 139 | | | |
| 8 | 104 | | | |
| 8 | 112 | | | |
| 8 | 80 | | | |
| 8 | 57 | | | |
| 8 | 128 | | | |
| 16 | 170 | | |
| 16 | 313 | | |
| 16 | 281 | | |
| 16 | 395 | | |
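To put the file sizes in context, the sketch below compares the bit rate of the 16 bit mono PCM files with the coded rates, and estimates the duration of a file from its size. It is a rough illustration under stated assumptions: 1 Kbyte is taken as 1024 bytes, the WAV header is ignored, and the listed size is assumed to be that of the PCM file. The helper names are hypothetical.

```python
# Rough comparison of PCM bit rate against the coded bit rates.
# Assumptions (not stated on the page): 1 Kbyte = 1024 bytes, WAV header ignored.

BITS_PER_SAMPLE = 16  # 16 bit PCM, from the text
CHANNELS = 1          # mono, from the text


def pcm_rate_bps(sampling_rate_hz):
    """Bit rate of uncompressed PCM at the given sampling rate."""
    return sampling_rate_hz * BITS_PER_SAMPLE * CHANNELS


def compression_ratio(sampling_rate_hz, coded_rate_bps):
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_rate_bps(sampling_rate_hz) / coded_rate_bps


def approx_duration_s(file_size_kbyte, sampling_rate_hz, bytes_per_kbyte=1024):
    """Approximate duration of a PCM file from its size (header ignored)."""
    return file_size_kbyte * bytes_per_kbyte * 8 / pcm_rate_bps(sampling_rate_hz)


if __name__ == "__main__":
    for rate_hz, size_kbyte in [(8000, 139), (16000, 170)]:
        print(
            f"{rate_hz} Hz, {size_kbyte} Kbyte: "
            f"PCM at {pcm_rate_bps(rate_hz)} b/s, "
            f"about {approx_duration_s(size_kbyte, rate_hz):.1f} s, "
            f"{compression_ratio(rate_hz, 6900):.1f}x smaller at 6.9 kb/s"
        )
```

The ratios describe the coded parameter stream only. The reconstructed files offered for listening are themselves PCM WAV files.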
For More Information Contact:
Gilad Cohen
IBM Haifa Research Lab.
Multimedia and Signal Processing Dept.
Tel: +972-4-8296-428
e-mail: giladc@il.ibm.com