A speech interface provides a natural and comfortable input modality for users with limited mobility. Speech requires little training and is relatively high bandwidth, thus allowing for rich communication between the human and the robot. The performance of speech recognition systems is influenced by many factors, including the vocabulary, the acoustic and language models, the speaking mode, etc. Some of these factors have to be taken into account when designing the speech interface.
Selecting a speech recognizer that performs well for the task at hand is important. To preserve flexibility in the development process, we considered two open-source speech recognition systems: HTK and CMU's Sphinx. Both of these are speaker-independent, continuous speech recognition systems, which typically require less customization than commercial systems. Because customization is minimal, it is important that the system be pre-trained on a large speech corpus so that appropriate acoustic models can be pre-computed. Such corpora usually fall into one of two categories: those developed for acoustic-phonetic research and those developed for very specific tasks. Since SmartWheeler is still at an early stage of development and domain-specific data is not available, we rely on corpora of the former kind.
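To make this concrete, a recognizer such as Sphinx can run with a generic pre-trained acoustic model while only the pronunciation dictionary and language model are tailored to the task. The following is a minimal sketch using the pocketsphinx Python bindings (0.1.x-style API); the model path and the task-specific file names are placeholder assumptions for illustration, not the actual SmartWheeler configuration.

```python
# Sketch: decoding one utterance with CMU PocketSphinx, combining a
# generic pre-trained acoustic model with task-specific resources.
from pocketsphinx import Decoder

config = Decoder.default_config()
# Pre-trained acoustic model (e.g. the generic US English model
# shipped with PocketSphinx) -- no domain-specific training needed.
config.set_string('-hmm', '/usr/share/pocketsphinx/model/en-us/en-us')
# Task-specific dictionary and language model (hypothetical files).
config.set_string('-dict', 'wheelchair.dict')
config.set_string('-lm', 'wheelchair.lm')

decoder = Decoder(config)
decoder.start_utt()
with open('command.raw', 'rb') as f:  # 16 kHz, 16-bit mono PCM audio
    while True:
        buf = f.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()

hyp = decoder.hyp()
if hyp is not None:
    print('Recognized:', hyp.hypstr)
```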
A small vocabulary makes speech recognition more accurate but requires the user to learn which words or phrases are allowed. While our current focus is on building and validating an interaction platform for a specific set of tasks, we also want the user to be able to interact with the system much as they would with any caregiver, and with very little prior training. We therefore consider a fixed set of tasks, but allow several possible commands for each task, all of which map to the same underlying task (as sketched after the list below). For instance, if the user wants to drive forward two meters, possible commands include:
• ROLL TWO METERS FORWARD
• ROLL FORWARD TWO METERS
• DRIVE FORWARD TWO METERS FAST
• DRIVE FAST TWO METERS FORWARD
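Since several phrasings correspond to one task, the interface must map recognized strings onto canonical commands. The following is a minimal sketch of such a normalizer; the token set and the task representation are illustrative assumptions rather than the actual SmartWheeler grammar.

```python
# Hypothetical normalizer: collapses the surface variants above into
# one canonical task tuple (action, distance_m, speed).
NUMBERS = {'ONE': 1, 'TWO': 2, 'THREE': 3}

def parse_command(utterance):
    tokens = utterance.upper().split()
    if not tokens or tokens[0] not in ('ROLL', 'DRIVE'):
        return None
    speed = 'FAST' if 'FAST' in tokens else 'NORMAL'
    # Find "<number> METERS" anywhere in the utterance.
    dist = None
    for i, tok in enumerate(tokens[:-1]):
        if tok in NUMBERS and tokens[i + 1] == 'METERS':
            dist = NUMBERS[tok]
    if dist is None or 'FORWARD' not in tokens:
        return None
    return ('MOVE_FORWARD', dist, speed)

# All four variants collapse to the same forward-motion task:
for cmd in ['ROLL TWO METERS FORWARD',
            'ROLL FORWARD TWO METERS',
            'DRIVE FORWARD TWO METERS FAST',
            'DRIVE FAST TWO METERS FORWARD']:
    print(cmd, '->', parse_command(cmd))
```

Accepting word-order variants in this way keeps the recognition vocabulary small while sparing the user from memorizing a single rigid phrasing per task.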