
Back to the RECOVC home page


Speech Reconstruction Quality Demo

IBM Research, Haifa Lab.


The following is a demo of speech reconstruction quality at different sampling rates and for different Mel Frequency Cepstral Coefficient (MFCC) vector dimensions. The MFCC vector is the feature vector used by the IBM ViaVoice speech recognition engine.
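
For readers unfamiliar with MFCCs, here is a minimal sketch of how one MFCC vector can be computed from a single speech frame, using the textbook pipeline: Hamming window, power spectrum, triangular mel filterbank, log, and DCT-II. It is not the ViaVoice front end. The filter count (24), FFT size (256), HTK-style mel formula, and 25 ms frame length are assumptions chosen for illustration, and a naive DFT stands in for an FFT so the sketch needs only the standard library.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style mel scale (an assumed convention, not necessarily ViaVoice's).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    # Naive DFT, O(n_fft^2): slow but dependency-free; use an FFT in practice.
    x = list(frame) + [0.0] * (n_fft - len(frame))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2.0 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        im = -sum(v * math.sin(2.0 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    top = hz_to_mel(sample_rate / 2.0)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    filters = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):
            weights[k] = (right - k) / (right - center)
        filters.append(weights)
    return filters


def mfcc(frame, sample_rate=8000, n_coeffs=13, n_filters=24, n_fft=256):
    if len(frame) > n_fft:
        raise ValueError("frame longer than FFT size")
    if n_coeffs > n_filters:
        raise ValueError("need at least as many filters as coefficients")
    n = len(frame)
    # Hamming window to reduce spectral leakage.
    windowed = [v * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, v in enumerate(frame)]
    spectrum = power_spectrum(windowed, n_fft)
    energies = [sum(w * p for w, p in zip(filt, spectrum))
                for filt in mel_filterbank(n_filters, n_fft, sample_rate)]
    # Floor the energies so an empty filter band does not produce log(0).
    log_energies = [math.log(max(e, 1e-10)) for e in energies]
    # DCT-II decorrelates the log filterbank energies; keep the first n_coeffs.
    return [sum(e * math.cos(math.pi * k * (i + 0.5) / n_filters)
                for i, e in enumerate(log_energies))
            for k in range(n_coeffs)]


# A 25 ms frame at 8 kHz (200 samples) of a 440 Hz tone.
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(200)]
coeffs_13 = mfcc(frame)                 # 13-dimensional vector
coeffs_24 = mfcc(frame, n_coeffs=24)    # 24-dimensional vector
```

Because the DCT output is simply truncated, the first 13 coefficients of the 24-dimensional vector match the 13-dimensional one in this sketch.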

The reconstruction uses the following inputs, updated every 10 ms:

  1. Voicing decision (a binary flag defining voiced or unvoiced speech).
  2. For voiced speech, the pitch period.
  3. The MFCC vector (of dimension 13 or 24).
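
The per-frame decoder input listed above can be modeled as a small record. This is a hypothetical layout: the class name `ReconstructionFrame`, the field names, and the pitch-period units (samples) are assumptions, not the RECOVC format. The record only checks the constraints the list implies: a pitch period for voiced frames only, and an MFCC dimension of 13 or 24.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

FRAME_PERIOD_MS = 10  # update rate stated on this page


@dataclass(frozen=True)
class ReconstructionFrame:
    """Decoder input for one 10 ms frame (hypothetical layout)."""
    voiced: bool                   # 1. voicing decision
    pitch_period: Optional[float]  # 2. pitch period (assumed: in samples), voiced only
    mfcc: Sequence[float]          # 3. MFCC vector, dimension 13 or 24

    def __post_init__(self):
        if len(self.mfcc) not in (13, 24):
            raise ValueError("MFCC dimension must be 13 or 24")
        if self.voiced and (self.pitch_period is None or self.pitch_period <= 0):
            raise ValueError("voiced frames need a positive pitch period")
        if not self.voiced and self.pitch_period is not None:
            raise ValueError("unvoiced frames carry no pitch period")


def frames_per_second():
    # 10 ms frames give 100 decoder updates per second.
    return 1000 // FRAME_PERIOD_MS


voiced = ReconstructionFrame(voiced=True, pitch_period=80.0, mfcc=[0.0] * 13)
unvoiced = ReconstructionFrame(voiced=False, pitch_period=None, mfcc=[0.0] * 24)
```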

The MFCCs are compressed at 4.0 kb/s when the dimension is 13 and at 6.4 kb/s when it is 24. At these rates, MFCC compression has been shown not to affect recognition rates on a large vocabulary continuous speech recognition (LVCSR) task. Pitch and voicing are always compressed at 500 b/s, giving total bit rates of 4.5 kb/s (13 MFCCs) and 6.9 kb/s (24 MFCCs).
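
The stated rates translate into an average bit budget per 10 ms frame (100 frames per second). A minimal sketch of that arithmetic, using a hypothetical helper `bit_budget`, follows; it assumes bits are spread evenly across frames, which the actual quantizer need not do.

```python
FRAME_RATE_HZ = 100  # one frame every 10 ms

# Rates stated on this page, in bits per second.
MFCC_RATE_BPS = {13: 4000, 24: 6400}
PITCH_VOICING_RATE_BPS = 500


def bit_budget(mfcc_dim):
    """Total rate and average bits per frame for a given MFCC dimension."""
    mfcc_bps = MFCC_RATE_BPS[mfcc_dim]
    total_bps = mfcc_bps + PITCH_VOICING_RATE_BPS
    return {
        "total_bps": total_bps,
        "mfcc_bits_per_frame": mfcc_bps // FRAME_RATE_HZ,
        "pitch_voicing_bits_per_frame": PITCH_VOICING_RATE_BPS // FRAME_RATE_HZ,
        "total_bits_per_frame": total_bps // FRAME_RATE_HZ,
    }


for dim in (13, 24):
    print(dim, bit_budget(dim))
```

For 13 MFCCs this gives 4,500 b/s, or 45 bits per frame (40 for the MFCCs plus 5 for pitch and voicing); for 24 MFCCs it gives 6,900 b/s, or 69 bits per frame (64 + 5).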

Click the speaker icon to hear each speech file (all files are 16-bit PCM mono WAV). Note the file sizes (in Kbytes).
 
 

Sampling rate [kHz] | File size [Kbyte] | Original signal | 6.9 kb/s reconstruction (24 MFCCs) | 4.5 kb/s reconstruction (13 MFCCs)
8  | 139 | speech files | speech files | speech files
8  | 104 | speech files | speech files | speech files
8  | 112 | speech files | speech files | speech files
8  | 80  | speech files | speech files | speech files
8  | 57  | speech files | speech files | speech files
8  | 128 | speech files | speech files | speech files
16 | 170 | speech files | speech files |
16 | 313 | speech files | speech files |
16 | 281 | speech files | speech files |
16 | 395 | speech files | speech files |
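
The file sizes in the table can be related to the bit rates with simple arithmetic. The sketch below assumes that the listed size is that of the original 16-bit mono PCM file, that 1 Kbyte is 1024 bytes, and that the WAV header is negligible; the helper names are hypothetical.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM, mono


def duration_seconds(size_kbyte, sample_rate_khz):
    # Assumes 1 Kbyte = 1024 bytes and ignores the small WAV header.
    pcm_bytes = size_kbyte * 1024
    return pcm_bytes / (sample_rate_khz * 1000 * BYTES_PER_SAMPLE)


def compressed_kbyte(duration_s, bitrate_bps):
    # Size of the same utterance at a given coded bit rate, in Kbytes.
    return duration_s * bitrate_bps / 8 / 1024


def compression_ratio(sample_rate_khz, bitrate_bps):
    # Raw PCM bit rate divided by the coded bit rate.
    pcm_bps = sample_rate_khz * 1000 * BYTES_PER_SAMPLE * 8
    return pcm_bps / bitrate_bps


d = duration_seconds(139, 8)  # first 8 kHz row of the table
print(round(d, 2), round(compressed_kbyte(d, 4500), 1), round(compressed_kbyte(d, 6900), 1))
```

Under these assumptions, the 139 Kbyte, 8 kHz file lasts about 8.9 s, and its 4.5 kb/s and 6.9 kb/s encodings would occupy roughly 4.9 and 7.5 Kbytes. Relative to 128 kb/s PCM at 8 kHz, that is a compression ratio of about 28:1 and 19:1.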

 



For More Information Contact:

Gilad Cohen
IBM Haifa Research Lab.
Multimedia and Signal Processing Dept.
Tel: +972-4-8296-428
e-mail: giladc@il.ibm.com
 
