In a series of preliminary tests using both Sphinx-4 and HTK, three male students and one female student, all in their early twenties, recorded 179–183 commands each. Subjects were presented with a command script corresponding to each task and instructed to read the commands in order.

The results are reported in terms of substitutions (the speech recognizer fails to recognize a word and substitutes an incorrect one), insertions (words that were not spoken but were added by the recognizer), deletions (words that were spoken but missed by the recognizer), word error rate (the proportion of words recognized incorrectly, counting substitutions, insertions, and deletions), and sentence error rate (the proportion of sentences in which one or more words are recognized incorrectly).
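As an illustration of how these measures relate to one another, the following is a minimal Python sketch that aligns a reference command with the recognizer output by minimum edit distance and counts each error type. It assumes the conventional definitions, with word error rate computed as (substitutions + insertions + deletions) divided by the number of reference words; the names `ErrorCounts`, `align`, and `score` are hypothetical and are not part of Sphinx-4 or HTK, whose own scoring tools may break ties in the alignment differently.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    substitutions: int = 0
    insertions: int = 0
    deletions: int = 0

    @property
    def total(self) -> int:
        return self.substitutions + self.insertions + self.deletions


def align(reference: list[str], hypothesis: list[str]) -> ErrorCounts:
    """Count error types along one minimum-edit-distance alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = edit distance between reference[:i] and hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # all reference words deleted
    for j in range(1, m + 1):
        cost[0][j] = j  # all hypothesis words inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + diff,  # match or substitution
                cost[i - 1][j] + 1,         # deletion
                cost[i][j - 1] + 1,         # insertion
            )

    # Walk back from the end of both sequences to recover the error types.
    counts = ErrorCounts()
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diff = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + diff:
                counts.substitutions += diff
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts.deletions += 1
            i -= 1
        else:
            counts.insertions += 1
            j -= 1
    return counts


def score(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (word error rate, sentence error rate) for (reference, hypothesis) pairs."""
    word_errors = reference_words = wrong_sentences = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        counts = align(ref_words, hypothesis.split())
        word_errors += counts.total
        reference_words += len(ref_words)
        wrong_sentences += 1 if counts.total > 0 else 0
    return word_errors / reference_words, wrong_sentences / len(pairs)


if __name__ == "__main__":
    wer, ser = score([
        ("DESCEND THE CURB", "ASCEND THE CURB"),
        ("ROLL BACK ONE METER", "ROLL BACK ONE METER"),
    ])
    print(f"WER = {wer:.3f}, SER = {ser:.3f}")
```

Because a sentence counts as wrong when it contains even a single word error, the sentence error rate is at least as sensitive as the word error rate, and for short commands it can be several times larger.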

Both speech recognition packages showed comparable performance across the board. The mean sentence error rate was 46.7% with HTK and 45.2% with Sphinx-4, while the mean word error rate was 16.6% with HTK and 16.1% with Sphinx-4. This analysis suggests that the two speech recognition systems perform comparably.

While the error rates are quite high, closer analysis shows that the situation is not as discouraging as it seems. Observed errors fall into one of three types: design errors, semantic errors, and syntax errors.

Design errors
Errors of this type are introduced by flaws in the design of the task vocabulary. For instance, a number of substitutions occurred when Sphinx recognized METER as METER. The task grammar within the speech recognizer was modified to avoid such minor errors.

Semantic errors
Errors of this type are introduced when the speech recognition system fails to recognize the correct command. For instance, when subject 3 said DESCEND THE CURB, Sphinx recognized it as ASCEND THE CURB. Handling such errors requires reasoning about the environment and the user’s intent.

Syntax errors
Errors of this type arise because the task grammar contains many ways to say the same thing. For instance, when subject 1 said ROLL BACK ONE METER, HTK recognized it as ROLL BACKWARD ONE METER. This is counted as one substitution error.
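One way to separate such errors from those that change the meaning of a command is to map interchangeable words to a single canonical form before comparing. The Python sketch below illustrates the idea only; the `CANONICAL` table and the function names are hypothetical, are not taken from the task grammar, and do not describe how the reported error rates were computed.

```python
# Hypothetical table of interchangeable words, for illustration only.
# A real table would have to be derived from the task grammar itself.
CANONICAL = {"BACKWARD": "BACK"}


def canonicalize(command: str) -> list[str]:
    """Upper-case and split a command, replacing each word by its canonical form."""
    return [CANONICAL.get(word, word) for word in command.upper().split()]


def same_command(reference: str, hypothesis: str) -> bool:
    """True when two transcripts are identical after canonicalization."""
    return canonicalize(reference) == canonicalize(hypothesis)


if __name__ == "__main__":
    # Differs only in phrasing: treated as the same command.
    print(same_command("ROLL BACK ONE METER", "ROLL BACKWARD ONE METER"))
    # Differs in meaning: still flagged as an error.
    print(same_command("DESCEND THE CURB", "ASCEND THE CURB"))
```

Under this assumed table, the ROLL BACK example compares as equal, while a semantic error such as DESCEND versus ASCEND still registers as a mismatch.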

