1. Introduction to the project

Placing PR and CV within the human-interaction framework requires changes in the way we look at problems in these areas. Classical PR/CV minimum-error performance criteria [3] should be complemented with better estimates of the amount of effort that the interactive process will demand from the user. Fortunately, to estimate this effort we can still rely on the traditional testing-corpus-based assessment approach that has proved so successful in PR and CV. Furthermore, since all existing PR/CV techniques are intrinsically grounded in error-minimisation algorithms, they need to be revised and adapted to the new, minimum human-effort performance criterion. Interestingly, such a paradigm shift entails important research opportunities, which hold promise for a new generation of truly human-friendly PR and CV devices. Three main types of opportunities can be identified:


Feedback: take direct advantage of the feedback information provided by the user in each interaction step to improve raw performance.
Adaptation: use feedback-derived data to adaptively (re-)train the system and tune it to the user behaviour and the specific task considered.
Multimodality: acknowledge the inherent multimodality of interaction to improve overall system behaviour and usability.
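
As a purely illustrative sketch of how the minimum human-effort criterion could still be assessed with a conventional test corpus, one may count the correction steps a simulated user needs before the system hypothesis matches the reference labelling. All names below (`system.hypothesize`, `system.refine`, etc.) are assumptions made for this example only, not part of the project:

```python
# Illustrative sketch: estimating user effort on a labelled test corpus by
# counting simulated correction steps. The `system` object and its methods
# are hypothetical placeholders.

def first_error_feedback(hypothesis, reference):
    """Simulated user feedback: position and correct value of the first error."""
    for i, (h_i, r_i) in enumerate(zip(hypothesis, reference)):
        if h_i != r_i:
            return i, r_i
    return None  # no correctable element-wise difference found

def average_interaction_effort(system, test_corpus, max_steps=100):
    """Mean number of simulated correction steps per sample (lower = less effort)."""
    total_steps = 0
    for x, reference in test_corpus:
        h = system.hypothesize(x)          # fully automatic first hypothesis
        steps = 0
        while h != reference and steps < max_steps:
            f = first_error_feedback(h, reference)
            if f is None:                  # nothing left to correct element-wise
                break
            h = system.refine(x, h, f)     # hypothesis revised using the feedback
            steps += 1
        total_steps += steps
    return total_steps / len(test_corpus)
```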

Figure 1 shows a schematic view of these ideas. Here, `x` is an input stimulus, observation or signal and `h` is a hypothesis or output, which the system derives from `x`. By observing `x` and `h`, the user provides some (perhaps null) feedback signal, `f`, which may iteratively help the system refine or improve its hypothesis until it is finally accepted. `M` is a model which the system uses to derive its hypotheses. In general, the model is initially obtained through a classical "batch" or "off-line" training process (drawn in grey lines) from a previously given training sequence of pairs `(x_i, h_i)` from the task being considered. During interactive operation, the user feedback signals produced in each interaction step are then used in an on-line training process which progressively adapts `M` to the specific task and/or to the way the user makes use of the system in this task.

 

Figure 1: Diagram of a multimodal interactive system.
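
As a purely illustrative reading of Figure 1, the following sketch shows one possible way to organise the interaction loop and the on-line adaptation of `M`. The class and method names are assumptions for this example and do not come from the project:

```python
# Minimal, hypothetical sketch of the interaction loop of Figure 1. The model M
# is assumed to expose predict/refine/adapt operations; real systems will differ.

class InteractiveSystem:
    def __init__(self, model):
        self.model = model                  # M: obtained beforehand by batch (off-line) training

    def interact(self, x, get_feedback):
        h = self.model.predict(x)           # initial hypothesis h derived from x
        while True:
            f = get_feedback(x, h)          # user observes x and h and returns feedback f
            if f is None:                   # null feedback: h is accepted
                return h
            h = self.model.refine(x, h, f)  # feedback used to improve the hypothesis
            self.model.adapt(x, h, f)       # on-line training: M adapted to user and task
```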
 

 

It should be noted that MI may entail two types of multimodality. One corresponds to the input signal, which can be a complex blend of different types of data, ranging from conventional keyboard-and-pointer data to complex images, audio and video streams. The other, subtler but no less important, comes from the generally different nature of the input and feedback signals. It is this second type that makes multimodality an inherent feature of human interaction.

Consider, for instance, an MI application in the context of tele-care for elderly people. Here, `x` is itself a multimodal signal coming from human-monitoring sensors such as microphones, cameras and perhaps other non-invasive medical sensors. This `x` is processed by an MI system, which tracks the human activities in order to detect possibly abnormal or risky situations. Clearly, systems of this kind will never be fully automatic. Instead, the system should raise adequate alarms (`h`) for another human to consider and act upon. It is this other human, the operator, who is depicted in Fig. 1. For a specific system hypothesis `h`, the operator may doubt its correctness and, after examining `x`, may try to improve `h` by providing some feedback `f` to the system. Feedback complexity may range from simple directions for the system to improve "perceptive" parameters such as camera focus or microphone gain, to much more subtle information aimed directly at improving the system's "cognitive" or decision-making performance (cf. section 2.1). In any case, the operator's feedback signals will rarely be of the same nature as those in `x`; typically, they will involve keyboard-and-pointer signals and perhaps hand gestures or voice commands.
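
To make the range of operator feedback described above more concrete, here is a small, hypothetical sketch of the two kinds of feedback messages; all field names are invented for this illustration and are not part of the project:

```python
# Hypothetical data structures for the two kinds of operator feedback discussed
# above; field names are illustrative only.

from dataclasses import dataclass
from typing import Optional

@dataclass
class PerceptiveFeedback:
    """Adjusts how the system senses, e.g. camera focus or microphone gain."""
    sensor: str        # e.g. "camera_1" or "microphone_2"
    parameter: str     # e.g. "focus" or "gain"
    value: float

@dataclass
class CognitiveFeedback:
    """Aims at the system's decision making, e.g. correcting a raised alarm."""
    alarm_id: str
    correct_label: str               # e.g. "normal_activity" instead of "fall"
    comment: Optional[str] = None    # free-text note from the operator
```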

In the fields of PR and CV, other examples abound where the MI framework applies very naturally. A selection of these examples will be discussed in section 3. It should be noted that interaction may also occur between MI systems themselves; this happens mainly in the field of robotics, where a team of robots may cooperate to achieve a common goal.

 
