REcognition COmpatible Voice Compression
(RECOVC)
What is RECOVC?

The Recognition Compatible Voice Coder (RECOVC) is a new, proprietary IBM low-bit-rate speech compression algorithm. It is designed for applications in which compressed speech is processed by an Automatic Speech Recognition (ASR) engine that converts the speech into text, or by a Speaker Recognition/Verification engine. Compressing speech with the low-bit-rate compression technologies available today degrades recognition rates, especially for large-vocabulary continuous speech recognition tasks. RECOVC compresses speech while keeping recognition rates intact. Two developments make this possible: a compression algorithm for the speech recognition feature vector, MFCC (Mel-Frequency Cepstral Coefficients), that leaves recognition rates unimpaired, and a method for reconstructing speech from the MFCC and the pitch period.
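To make the term "MFCC feature vector" concrete, the following is a minimal sketch of a generic textbook MFCC computation for one speech frame. It is not the RECOVC front-end or its compression algorithm: the function names, the 8 kHz sampling rate, the 256-sample frame, the 23 mel filters, and the 13 cepstral coefficients are all illustrative assumptions, and pitch extraction is not shown.

```python
import cmath
import math

def hz_to_mel(f):
    # Common mel-scale approximation.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spec.append(abs(acc) ** 2)
    return spec

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising slope
            filt[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling slope
            filt[k] = (hi - k) / (hi - mid)
        bank.append(filt)
    return bank

def mfcc(frame, sample_rate=8000, num_filters=23, num_ceps=13):
    """Return num_ceps cepstral coefficients for one frame (illustrative defaults)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Log filterbank energies, floored to avoid log(0).
    log_e = [math.log(max(sum(w * p for w, p in zip(f, spec)), 1e-10))
             for f in bank]
    # DCT-II of the log energies.
    return [sum(e * math.cos(math.pi * n * (j + 0.5) / num_filters)
                for j, e in enumerate(log_e))
            for n in range(num_ceps)]

# Usage: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(256)]
coeffs = mfcc(frame)
```

A recognizer consumes a sequence of such vectors, one per frame, which is why a coder that transmits the vectors themselves, rather than a waveform approximation, can leave recognition rates unaffected.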

The RECOVC technology can be used in Distributed Speech Recognition (DSR) systems as the preferred voice-carrying format between a client that captures voice and a server that recognizes it. RECOVC can be used in the following applications:

  • Digital voice recorders and other Personal Digital Assistant (PDA) devices, which record voice that is later recognized.
  • Accessing voice portals on the Web for information retrieval and other Interactive Voice Response (IVR) services from Internet phones, cellular phones, or other portable devices.
  • Voice mail services, voice logging, and other speech database services with enhanced features such as word spotting, speaker identification, speech transcription, and other speech recognition-based techniques.
  • Automatic transcription of voice messages for pagers.
  • Speech-recognition-based services in which playback of archived recordings is important for legal reasons.
RECOVC compresses telephony-bandwidth voice as well as wideband voice at bit rates in the range 4.5–7 kb/s. It is a low-complexity coder, suitable for implementation on simple fixed-point or floating-point Digital Signal Processors. RECOVC voice can be streamed in real time over packet-based networks, such as IP, or over wireless networks.
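To put the quoted bit rates in perspective, the short sketch below compares them with uncompressed 16-bit PCM. The 8 kHz sampling rate for telephony-bandwidth speech and the rate of 100 frames per second are assumptions made for illustration; the page does not state RECOVC's sampling rates or frame rate, and the function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample=16):
    """Bit rate of uncompressed mono PCM, in bits per second."""
    return sample_rate_hz * bits_per_sample

def compression_ratio(sample_rate_hz, coded_bit_rate, bits_per_sample=16):
    """How many times smaller the coded stream is than raw PCM."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample) / coded_bit_rate

def bits_per_frame(coded_bit_rate, frames_per_second=100):
    """Bit budget per frame; 100 frames/s is an assumed frame rate."""
    return coded_bit_rate / frames_per_second

# Assumed 8 kHz telephony sampling: raw PCM runs at 128 kb/s.
low = compression_ratio(8000, 4500)    # about 28.4x at 4.5 kb/s
high = compression_ratio(8000, 7000)   # about 18.3x at 7 kb/s
budget = bits_per_frame(4500)          # 45 bits per frame under the assumption
```

Under these assumptions, each frame's feature vector and pitch information would have to fit in a few dozen bits, which indicates how aggressive the quantization of the feature vector must be.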



Applications:

Standardization:
  • ETSI STQ Aurora group develops standards for Distributed Speech Recognition (DSR) applications.
  • RECOVC is a substantial part of the new working item: "Front-end extension for speech reconstruction and tonal language recognition". 


Documents:

Speech Quality Demo:
  • An online demonstration of the speech reconstruction quality is available.
  • Note: the demo files are WAV files (16-bit PCM samples) and are therefore quite large. If you prefer to download the demo once as a single package to your local machine, download the ZIP file (~3.4 MB), extract it, and play the files locally.


Related IBM Publications:

Related Web Sites:

For More Information Contact:

Alex Sorin
IBM Haifa Research Lab.
Multimedia and Signal Processing Dept.
Tel: +972-4-8296-289
e-mail: sorin@il.ibm.com
 
