IBM Research

Hebrew Speech Recognition



IBM Haifa researchers are developing the first technology that can recognize large vocabulary continuous speech in Hebrew. While there are systems that recognize a command such as 'Call Mom' and then respond by dialing the phone number, what IBM Haifa has developed is in another league. Large vocabulary continuous speech recognition means that software recognizes natural speech, spoken in a free-flowing manner, with a very large or practically unlimited vocabulary.

IBM Haifa envisions the Hebrew speech recognition solution going out to many different customers in Israel. They see potential use by organizations that want to offer advanced speech recognition services in sectors such as finance (banks, insurance companies), law (courthouses, lawyers), and medicine (hospitals). Telcos and cellular service providers could also use the technology to offer distributed speech recognition services for telephony applications. These services will help people dictate messages into their cellphones or personal digital assistants, rather than using the keypad to enter them letter by letter.

IBM's ViaVoice is an example of a large vocabulary continuous speech recognition product. It provides features such as direct dictation and command and control, allowing you to use your voice instead of typing. Currently, ViaVoice is available in over ten languages, but not yet in Hebrew. With the perseverance of the multimedia group at the IBM Research Labs in Haifa, Hebrew will join the ranks of the other languages available in ViaVoice. The IBM Hebrew speech recognition efforts started a few years ago and picked up speed this past year. IBM now has a working demo with which you can dictate into a word processing program. While the program still has a limited vocabulary, the potential is clear. Haifa researchers hope the product will reach the marketplace within two to three years.

Today, large vocabulary continuous speech recognition is being used in server applications for dialog systems that you can call to get information by speaking natural queries. For example, you may use one of these systems to call the airport and request flight information. Using large vocabulary recognition, you can interact with a computer and describe the information you need, rather than just giving answers to computer-prompted questions.

Other uses of large vocabulary speech recognition include dictation and audio indexing. Audio indexing allows us to index speech segments such as webcasts, conference calls, TV and radio segments, or lectures. Ideally the user would like to search through these audio streams similar to the way we search through text on the web. Using speech recognition techniques, the speech segments are put through an indexing process so that keywords are identified. Although spontaneous speech is not as easily recognized as dictated speech, even less accurate recognition is usually enough to identify keywords.
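The indexing idea described above can be sketched as a simple inverted index over recognizer output. This is only an illustrative sketch, not IBM's implementation: the `segments` data, the segment names, and the `build_index` and `search` functions are hypothetical, and the recognizer is assumed to emit each word together with its start time in seconds.

```python
from collections import defaultdict


def build_index(segments):
    """Map each recognized word to the (segment_id, start_seconds) places it was heard."""
    index = defaultdict(list)
    for segment_id, words in segments.items():
        for word, start_seconds in words:
            # Lowercase so that searches are case-insensitive.
            index[word.lower()].append((segment_id, start_seconds))
    return index


def search(index, keyword):
    """Return every position where the keyword was recognized (empty list if none)."""
    return index.get(keyword.lower(), [])


# Hypothetical recognizer output: (word, start time in seconds) pairs per recording.
segments = {
    "lecture-01": [("speech", 1.2), ("recognition", 1.6), ("in", 2.3), ("hebrew", 2.5)],
    "webcast-07": [("flight", 0.4), ("information", 0.9), ("recognition", 3.1)],
}

index = build_index(segments)
hits = search(index, "Recognition")
for segment_id, start_seconds in hits:
    print(f"{segment_id} at {start_seconds:.1f}s")
```

Because a search only needs the keyword itself to have been recognized, misrecognized words elsewhere in the recording do not prevent a hit, which is why imperfect recognition can still be good enough for indexing.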

In some cases, such as dictation, you can teach the computer to recognize your own voice and accent. This process, known as enrollment, reduces the number of recognition errors. It usually involves reading 50 to 100 sentences aloud to the software.

"Building a speaker-independent system from which the computer can recognize anyone's speech involves much more work," explains Ron Hoory, project leader at IBM Haifa. "First, an acoustic model is built by recording and processing the speech patterns of many different individuals. Next, a language model is built by processing different texts that are added to the software's knowledge base. These must reflect the type of vocabulary and language that will be used during system operation. Lastly, a vocabulary base is built to include a list of words and their pronunciation."
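To make the language model step more concrete, here is a minimal sketch of one common textbook approach: a bigram model with add-one smoothing, estimated from a handful of sentences. The article does not say which kind of language model IBM uses, so the technique, the function names, and the tiny English `corpus` are all assumptions chosen purely for illustration.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count word pairs in the training text; returns (unigrams, bigrams, vocabulary)."""
    unigrams = Counter()
    bigrams = Counter()
    vocabulary = {"<s>", "</s>"}  # sentence start and end markers
    for sentence in sentences:
        words = sentence.lower().split()
        tokens = ["<s>"] + words + ["</s>"]
        vocabulary.update(words)
        unigrams.update(tokens[:-1])             # how often each word appears as a history
        bigrams.update(zip(tokens, tokens[1:]))  # how often each adjacent pair appears
    return unigrams, bigrams, vocabulary


def bigram_probability(model, previous, word):
    """Add-one smoothed estimate of P(word | previous)."""
    unigrams, bigrams, vocabulary = model
    return (bigrams[(previous, word)] + 1) / (unigrams[previous] + len(vocabulary))


# Hypothetical training text; a real system would process far larger texts
# that reflect the vocabulary expected during operation.
corpus = ["call the airport", "call the bank", "call mom"]
model = train_bigram_model(corpus)

p_the = bigram_probability(model, "call", "the")
p_mom = bigram_probability(model, "call", "mom")
p_unseen = bigram_probability(model, "call", "bank")
print(p_the, p_mom, p_unseen)
```

A recognizer can use such probabilities to prefer word sequences that resemble the training text, which is why the texts need to match the language expected in use.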

Hebrew, unlike European languages, poses its own special challenges. The dots that represent the vowels in Hebrew affect the pronunciation of each word, yet they are usually omitted in everyday writing, so the same written form can often be pronounced in more than one way. Hebrew also has a large number of inflections and attaches prepositions and conjunctions directly to words. For example, "u-keh-she-ba-oo" ("and when they came") is one word. It is almost impossible for a vocabulary base to include all possible variations of a word. The challenge for the Haifa team will be to adapt the general purpose IBM speech recognition engine to face these Hebrew-specific problems.
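The vocabulary problem can be illustrated with a toy enumeration of prefixed forms. This sketch is a deliberate simplification: the transliterated prefix lists are hypothetical and incomplete, real Hebrew morphology has more prefixes and constraints than shown here, and the code says nothing about how the Haifa team actually handles the issue.

```python
from itertools import product

# Simplified, transliterated prefix slots (illustrative only).
# The empty string means "no prefix in this slot".
CONJUNCTIONS = ["", "u-"]                 # "and"
SUBORDINATORS = ["", "keh-she-", "she-"]  # "when", "that"


def surface_forms(stem):
    """List every prefixed written form of a stem under the toy rules above."""
    return [conj + sub + stem for conj, sub in product(CONJUNCTIONS, SUBORDINATORS)]


forms = surface_forms("ba-oo")  # "they came"
for form in forms:
    print(form)

# Every additional prefix slot multiplies the number of entries needed per stem.
forms_per_stem = len(CONJUNCTIONS) * len(SUBORDINATORS)
print(forms_per_stem, "forms per stem from only two prefix slots")
```

Even with only two prefix slots, each stem expands into six distinct entries, which suggests why listing every variation of every word in a vocabulary base quickly becomes impractical.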
