REcognition COmpatible Voice Compression (RECOVC)
What is RECOVC?
The Recognition Compatible Voice Coder (RECOVC) is a new IBM proprietary
low-bit-rate speech compression algorithm. It is designed for applications in
which compressed speech is later processed by an Automatic Speech Recognition
(ASR) engine that converts the speech into text, or by a speaker recognition or
verification engine. Compressing speech with the low-bit-rate technologies
available today degrades recognition rates, especially for large-vocabulary
continuous speech recognition tasks. RECOVC compresses speech while keeping
recognition rates intact. Two developments make this possible: a compression
algorithm for the speech recognition feature vector, MFCC (Mel-Frequency
Cepstral Coefficients), that leaves recognition rates unimpaired, and a method
for reconstructing speech from the MFCC and the pitch period.
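To make the MFCC feature vector mentioned above more concrete, the following is a minimal sketch of a textbook MFCC computation using NumPy. It is not the RECOVC front end: the frame length, hop size, FFT size, number of mel filters, and number of cepstral coefficients are all assumed values chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fft_freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct_ii_matrix(n_out, n_in):
    """Unnormalised DCT-II basis: the first n_out rows of an n_in-point DCT."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_in))


def mfcc(signal, sample_rate=8000, frame_len=200, hop=80,
         n_fft=256, n_filters=23, n_ceps=13):
    """Return an (n_frames, n_ceps) array of MFCC vectors.

    All default parameters are illustrative assumptions (25 ms frames,
    10 ms hop at 8 kHz), not values taken from RECOVC.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum -> mel band energies -> log -> DCT.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(np.maximum(mel_energy, 1e-10))
    return log_mel @ dct_ii_matrix(n_ceps, n_filters).T
```

Calling `mfcc(x)` on one second of 8 kHz audio yields one 13-element vector per 10 ms hop under these assumed settings. A coder of the kind described above would quantize such vectors, together with pitch information, rather than the waveform itself.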
The RECOVC technology can be used in Distributed Speech Recognition (DSR)
systems as the preferred format for carrying voice between a client that
captures voice and a server that recognizes it. RECOVC can be used in the
following applications:
- Digital voice recorders and other Personal Digital Assistant (PDA) devices, which record voice that is later recognized.
- Accessing voice portals on the Web for information retrieval and other Interactive Voice Response (IVR) services from Internet phones, cellular phones, or other portable devices.
- Voice mail services, voice logging, and other speech database services with enhanced features such as word spotting, speaker identification, speech transcription, and other speech recognition-based techniques.
- Automatic transcription of voice messages for pagers.
- Speech-recognition-based services, where playback of the archived recordings is important for legal reasons.
RECOVC compresses telephony-bandwidth voice as well as wideband voice at bit
rates in the range of 4.5 to 7 kb/s. It is a low-complexity coder, suitable for
implementation on simple fixed-point or floating-point Digital Signal
Processors. RECOVC voice can be deployed for real-time streaming over
packet-based networks, such as IP, or over wireless networks.
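The bit-rate range above can be put in perspective with some simple arithmetic. The sketch below assumes 8 kHz, 16-bit PCM as the uncompressed telephony reference, a front end producing 100 frames per second, and 1 kb/s = 1000 bit/s. None of these values is stated for RECOVC itself; they are common conventions used here only for illustration.

```python
# Assumed reference values (not taken from the RECOVC description).
PCM_SAMPLE_RATE_HZ = 8000      # telephony-band sampling rate
PCM_BITS_PER_SAMPLE = 16       # linear PCM sample width
FRAMES_PER_SECOND = 100        # one feature frame every 10 ms


def bits_per_frame(bit_rate_bps, frames_per_second=FRAMES_PER_SECOND):
    """Average bit budget available to encode one frame."""
    return bit_rate_bps / frames_per_second


def pcm_compression_ratio(bit_rate_bps,
                          sample_rate_hz=PCM_SAMPLE_RATE_HZ,
                          bits_per_sample=PCM_BITS_PER_SAMPLE):
    """Ratio of the uncompressed PCM bit rate to the coded bit rate."""
    return sample_rate_hz * bits_per_sample / bit_rate_bps


for rate_bps in (4500, 7000):  # the two ends of the stated range
    print(f"{rate_bps / 1000:.1f} kb/s: "
          f"{bits_per_frame(rate_bps):.0f} bits per frame, "
          f"{pcm_compression_ratio(rate_bps):.1f}:1 versus PCM")
```

Under these assumptions the coder has 45 to 70 bits per frame for the feature vector and pitch, which corresponds to roughly 28:1 to 18:1 compression relative to 128 kb/s PCM.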
Standardization:
- The ETSI STQ Aurora group develops standards for Distributed Speech Recognition (DSR) applications.
- RECOVC is a substantial part of the new work item "Front-end extension for speech reconstruction and tonal language recognition".
Speech Quality Demo:
- An on-line demonstration of the speech reconstruction quality is available.
- Note: the files in the demo are WAV files (16-bit PCM) and are therefore quite large. If you prefer to download the demo once as a single package to your local machine, download the ZIP file (~3.4 MB), extract it, and play the files locally.
Related IBM Publications:
- D. Chazan, M. Zibulski, R. Hoory, and G. Cohen, "Efficient Periodicity Extraction Based on Sine-wave Representation and its Application to Pitch Determination of Speech Signals", in Proc. 7th European Conference on Speech Communication and Technology (EUROSPEECH-2001), Aalborg, Denmark, Sept. 3-7, 2001.
- S. H. Maes, G. Cohen, R. Hoory, and D. Chazan, "Conversational Networking: Conversational Protocols for Transport, Coding and Control", in Proc. 6th Int. Conf. on Spoken Language Processing (ICSLP-2000), Beijing, China, Oct. 2000.
- D. Chazan, G. Cohen, R. Hoory, and M. Zibulski, "Low Bit Rate Speech Compression for Playback in Speech Recognition Systems", in Proc. European Signal Processing Conference (EUSIPCO 2000). Also presented at the Israeli IEEE Signal Processing Workshop at the Technion (Israel Institute of Technology), June 2000.
- D. Chazan, G. Cohen, R. Hoory, and M. Zibulski, "Speech Reconstruction from Mel-frequency Cepstral Coefficients and Pitch Frequency", in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2000).
- G. N. Ramaswamy and P. S. Gopalakrishnan, "Compression of Acoustic Features for Speech Recognition in Network Environments", in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 1998).
For More Information Contact:
Alex Sorin
IBM Haifa Research Lab.
Multimedia and Signal Processing Dept.
Tel: +972-4-8296-289
e-mail: sorin@il.ibm.com