Control Architecture of Robota

Speech Module
The speech module extracts sentences and words from the speech stream using the CONVERSARY Automatic Speech Recognition (ASR) engine, which applies pre-programmed syntactic rules. Because the syntax is described as a set of rules rather than a list of sentences, several different sentences can express the same meaning. Only a subset of keywords is kept for further processing by the learning module. For example, when the user says “This is your face”, the ASR detects the use of an indexed grammar. In this example, the grammar, encoded by the programmer, specifies that “This is your” is always followed by a noun, here “face”. Among the nouns that the ASR is programmed to recognize, “face” is the keyword that is extracted and processed for learning. Compared with a list of sentences, a syntax definition is shorter, uses less computational power, and generalizes across sentences. The user can therefore omit unimportant words without perturbing the system.
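The rule-based keyword extraction described above can be illustrated with a minimal Python sketch. It does not reproduce the ASR engine or its grammar format. The noun list `NOUNS`, the single rule `RULE`, and the function `extract_keyword` are hypothetical names introduced here, and treating “this is” as optional is an assumption made only to show how one rule can cover several phrasings.

```python
import re

# Hypothetical noun list; the text does not give the nouns the ASR recognizes.
NOUNS = {"face", "arm", "head", "hand"}

# One assumed rule: an optional carrier phrase, then "your", then a single word.
RULE = re.compile(r"^(?:this is )?your (?P<noun>[a-z]+)$")


def extract_keyword(utterance):
    """Return the noun keyword if the utterance matches the rule, else None."""
    # Normalize: lowercase, drop punctuation and quotes, collapse whitespace.
    text = re.sub(r"[^a-z\s]", "", utterance.lower())
    text = " ".join(text.split())
    match = RULE.match(text)
    if match and match.group("noun") in NOUNS:
        return match.group("noun")
    return None


if __name__ == "__main__":
    for sentence in ["This is your face", "Your arm", "This is your banana"]:
        print(sentence, "->", extract_keyword(sentence))
```

A single rule here accepts both “This is your face” and “Your face”, whereas a sentence list would need one entry per phrasing; this is the generalization the paragraph refers to.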

Vision Module
The vision module of Robota grabs images of the upper part of the user’s body, including the head, arms, and shoulders. It tracks the vertical movements of both arms and the horizontal movements or rotation of the head. Arm tracking is based on luminosity and optical flow detection. The luminosity is computed from the RGB color intensities of the pixels.
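As a rough illustration of the luminosity step, the following Python sketch computes a per-pixel luminosity from RGB values and estimates a vertical shift between two frames. The text does not specify the luminosity formula, so the unweighted mean of the three channels is an assumption. The vertical shift is estimated from the centroid of bright pixels, a deliberately simple stand-in and not the optical flow detection used by the vision module. All function names and the threshold value are hypothetical.

```python
def luminosity(pixel):
    """Assumed luminosity: unweighted mean of the R, G, B intensities (0-255)."""
    r, g, b = pixel
    return (r + g + b) / 3.0


def bright_centroid_row(image, threshold=128.0):
    """Mean row index of pixels brighter than the threshold, or None if there are none.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    rows = [y for y, row in enumerate(image)
            for pixel in row if luminosity(pixel) > threshold]
    return sum(rows) / len(rows) if rows else None


def vertical_shift(prev_image, next_image, threshold=128.0):
    """Change in bright-pixel centroid row between two frames (positive = downward)."""
    before = bright_centroid_row(prev_image, threshold)
    after = bright_centroid_row(next_image, threshold)
    if before is None or after is None:
        return None
    return after - before


if __name__ == "__main__":
    dark, bright = (0, 0, 0), (255, 255, 255)
    frame_a = [[bright, dark], [dark, dark], [dark, dark]]
    frame_b = [[dark, dark], [dark, dark], [bright, dark]]
    print(vertical_shift(frame_a, frame_b))
```

In image coordinates the row index grows downward, so a positive shift in this sketch corresponds to a bright region moving toward the bottom of the frame.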

