## 2. General methodological approaches

General approaches to the opportunities and challenges of the MI framework introduced in Section 1 are examined here from a probabilistic point of view. The symbols used below correspond to those introduced in Figure 1.

### 2.1. Directly using the human feedback

Without varying the model $M$, human interaction offers a unique opportunity to improve the quality of the system hypotheses $h$, using information derived directly from the interaction process. In traditional PR [3], for fixed $M$ and input $x$, a best hypothesis, $\hat{h}$, is one which maximises the posterior probability among all possible hypotheses. Formally,

$$\hat{h} = \operatorname*{argmax}_{h} \Pr(h \mid M, x)$$

Interaction now allows feedback-derived conditions to be added, for instance in the form of partial hypotheses or constraints on the input domain. The best system hypothesis is therefore one which maximises the posterior probability given $M$, $x$ and the feedback data. Formally,

$$\hat{h} = \operatorname*{argmax}_{h} \Pr(h \mid M, x, f_1, f_2, \ldots)$$

where $f_1, f_2, \ldots$ are the feedback-derived data.

The new system hypothesis, $\hat{h}$, may prompt the user to provide further feedback, thereby starting a new interaction step. The process continues in this way until the system output is acceptable to the user. This is a first-order approach, in which $\hat{h}$ is derived using only the feedback obtained in the previous interaction step. More complex, higher-order models with longer memory will also be explored in the proposed project.
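
The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions that are not part of the framework itself: the hypothesis set is finite, all hypotheses are token sequences of the same length, the feedback is a validated prefix, and the names `best_hypothesis`, `simulated_feedback` and `interact`, as well as the toy posterior, are hypothetical.

```python
from typing import Dict, Optional, Tuple

Hypothesis = Tuple[str, ...]


def best_hypothesis(posterior: Dict[Hypothesis, float],
                    prefix: Hypothesis = ()) -> Hypothesis:
    """Return argmax_h Pr(h | M, x, f) when the feedback f is a validated prefix.

    Conditioning on a hard constraint removes the inconsistent hypotheses;
    renormalising the rest would not change the argmax, so it is skipped.
    """
    feasible = {h: p for h, p in posterior.items() if h[:len(prefix)] == prefix}
    if not feasible:
        raise ValueError("no hypothesis is consistent with the feedback")
    return max(feasible, key=feasible.get)


def simulated_feedback(h: Hypothesis,
                       reference: Hypothesis) -> Optional[Hypothesis]:
    """Hypothetical user: accept h (None) or return the corrected prefix."""
    for i, (got, wanted) in enumerate(zip(h, reference)):
        if got != wanted:
            return reference[:i + 1]
    return None


def interact(posterior: Dict[Hypothesis, float],
             reference: Hypothesis) -> Tuple[Hypothesis, int]:
    """First-order loop: each step uses only the most recent feedback."""
    prefix: Hypothesis = ()
    corrections = 0
    while True:
        h = best_hypothesis(posterior, prefix)
        feedback = simulated_feedback(h, reference)
        if feedback is None:
            return h, corrections
        prefix = feedback
        corrections += 1


# Toy posterior Pr(h | M, x) over four same-length hypotheses (made-up values).
posterior = {
    ("the", "hat", "sat"): 0.4,
    ("the", "cat", "sit"): 0.3,
    ("the", "cat", "sat"): 0.2,
    ("a", "cat", "sat"): 0.1,
}
final, corrections = interact(posterior, ("the", "cat", "sat"))
```

In this toy run the user corrects two tokens before the output is accepted. Because each corrected prefix subsumes the previous one, keeping only the latest feedback is enough here; other kinds of feedback would generally require accumulating constraints.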

Clearly, the more feedback-derived constraints that can be added, the greater the opportunity to obtain a better $\hat{h}$. However, constructing the new probability distribution and solving the corresponding maximisation may be more difficult than the corresponding problems with the familiar feedback-free posterior distributions. Some approaches will be discussed in Section 3. In particular, graphical models [4] provide interesting ways to construct the joint probability and to propagate the evidence conveyed by each feedback-derived constraint, as well as to adapt the modelling problem to the interactive framework, as discussed below.

### 2.2. Adaptive learning

So far $M$ has been kept fixed. However, human interaction offers another unique opportunity: improving the system's behaviour by tuning $M$. The feedback data obtained at each step of the interaction process can generally be converted into fresh training information, useful for adapting the system to a changing environment.

Adaptive learning has been studied thoroughly for many years. One outstanding formal framework for this type of learning is Bayesian learning [3], in which priors on the model parameters are used to model statistically how $M$ should be modified when additional information becomes available. Recently, interesting results have been reported for adaptive learning of graphical models and Bayesian networks [5].
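
As a minimal illustration of Bayesian adaptation, the sketch below assumes that the model is a simple categorical distribution over labels with a Dirichlet prior, and that each user correction yields one observed label. This conjugate setting is chosen only because the update is easy to follow; the function names `adapt` and `predictive` and the numbers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def adapt(prior_counts: Dict[str, float],
          feedback_labels: Iterable[str]) -> Dict[str, float]:
    """Dirichlet update: add the observed label counts to the prior pseudo-counts."""
    observed = Counter(feedback_labels)
    unknown = set(observed) - set(prior_counts)
    if unknown:
        raise ValueError(f"labels missing from the prior: {sorted(unknown)}")
    return {label: alpha + observed[label]
            for label, alpha in prior_counts.items()}


def predictive(counts: Dict[str, float]) -> Dict[str, float]:
    """Posterior predictive probability of each label (normalised counts)."""
    total = sum(counts.values())
    return {label: alpha / total for label, alpha in counts.items()}


# Prior belief: both labels equally likely, with a weight of two pseudo-counts each.
prior = {"cat": 2.0, "hat": 2.0}
# Labels obtained from four hypothetical user corrections.
posterior_counts = adapt(prior, ["cat", "cat", "hat", "cat"])
probs = predictive(posterior_counts)
```

The prior pseudo-counts control how quickly the model moves towards the feedback data: larger values make the adapted model more conservative.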

Applying these ideas in our PR/CV MI framework will require establishing adequate training criteria. These criteria should allow the development of adaptive training algorithms that take maximum advantage of the interaction-derived data in order to minimise the overall human effort.

### 2.3. Multimodality

So far we have not discussed how the interaction feedback can be obtained. In general, this information does not naturally belong to the domain from which the main data, $x$, come. Human interaction therefore naturally entails some sort of multimodality, in addition to any multimodal nature that the input signals themselves may exhibit. Multimodality appears in many areas of Computer Science and Engineering [6]. The challenge here is to achieve an adequate modality synergy which takes full advantage of all the modalities involved.

Assume for simplicity that both the input $x$ and the feedback $f$ are unimodal. Even so, combining them raises a modality fusion problem, which entails finding a hypothesis $\hat{h}$ that maximises the posterior probability given both the input and the feedback. Formally,

$$\hat{h} = \operatorname*{argmax}_{h} \Pr(h \mid M, x, f) = \operatorname*{argmax}_{h} \Pr(x, f \mid M, h) \cdot \Pr(h)$$
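
The second equality above is an application of Bayes' rule. As a brief sketch, assuming that the prior $\Pr(h \mid M)$ is written simply as $\Pr(h)$:

$$
\Pr(h \mid M, x, f) = \frac{\Pr(x, f \mid M, h) \cdot \Pr(h)}{\Pr(x, f \mid M)}
$$

The denominator does not depend on $h$, so it can be dropped without changing the maximising hypothesis.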

In many applications it is natural and/or convenient to assume that $x$ and $f$ are independent given $h$. Consider, for instance, that $x$ is an input image and $f$ the acoustic signal of a speech command given as feedback. In this case, a naive Bayes decomposition factorises the expression to be maximised into three probabilities, corresponding to the input signal, the feedback and the hypothesis prior. Formally,

$$\hat{h} \approx \operatorname*{argmax}_{h} \Pr(x \mid M_X, h) \cdot \Pr(f \mid M_F, h) \cdot \Pr(h)$$

This allows the independent models $M_X$ and $M_F$, for the image and speech components respectively, to be estimated separately; the only remaining "joint" problem is the search for the hypothesis that maximises the product above. Recently, this idea has been followed successfully by Vidal et al. [7] in a Multimodal Interactive Machine Translation application.
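
The naive Bayes fusion can be sketched as follows, assuming a small finite hypothesis set and two separately estimated models that already provide the likelihoods of the image and of the speech feedback for every hypothesis. The function name `fuse`, the hypothesis labels and all numbers are hypothetical; the sum is done in log space only to avoid numerical underflow.

```python
import math
from typing import Dict


def log_prob(p: float) -> float:
    """Logarithm that maps a zero probability to minus infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def fuse(lik_x: Dict[str, float],
         lik_f: Dict[str, float],
         prior: Dict[str, float]) -> str:
    """Return argmax_h Pr(x | M_X, h) * Pr(f | M_F, h) * Pr(h)."""
    def score(h: str) -> float:
        return log_prob(lik_x[h]) + log_prob(lik_f[h]) + log_prob(prior[h])
    return max(prior, key=score)


# Made-up likelihoods from an image model M_X and a speech model M_F.
lik_x = {"delete": 0.5, "rotate": 0.4, "zoom": 0.1}
lik_f = {"delete": 0.2, "rotate": 0.7, "zoom": 0.1}
prior = {"delete": 1 / 3, "rotate": 1 / 3, "zoom": 1 / 3}

fused = fuse(lik_x, lik_f, prior)
image_only = max(lik_x, key=lik_x.get)
```

With these numbers the image model alone prefers `"delete"`, whereas the fused decision is `"rotate"`, because the speech feedback strongly supports it.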

In applications where independence cannot be realistically assumed, we can attempt to estimate the posterior distribution directly. Note that defining the probability distributions, together with the possibility of including a hyperprior on the models $M$, leads to challenging modelling and inference problems (see [8]).

### 2.4. Performance evaluation

Perhaps one of the most influential factors in the rapid development of PR and CV technology over the last few decades is the now commonly adopted assessment paradigm based on labelled training and testing corpora. Under this paradigm, different approaches or algorithms can be tested and compared easily, objectively and automatically, without having to implement complete prototypes or to involve humans in the assessment procedures.

In the MI framework proposed here, a human being is embedded "in the loop", and system performance has to be gauged mainly in terms of how much human effort is required to achieve the goals of the task considered. Although evaluating system performance in this new scenario apparently requires human work and judgement, the corpus-based assessment paradigm is still applicable in most MI tasks, provided that precise goals and ground truth are carefully specified. An example of such a corpus-based testing approach is given in Section 3.
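
To make the idea concrete, the following sketch estimates human effort from a labelled corpus without involving a real user: the ground truth plays the role of the user, and the effort is the number of corrections per reference token. It assumes a finite set of same-length candidate hypotheses per sample, prefix-style feedback and a reference that is among the candidates; the names `decode`, `corrections_needed`, `baseline_errors` and `evaluate`, and the toy corpus, are hypothetical and do not describe the measure actually used in Section 3.

```python
from typing import Dict, List, Tuple

Hypothesis = Tuple[str, ...]
Sample = Tuple[Dict[Hypothesis, float], Hypothesis]  # (scores, ground truth)


def decode(scores: Dict[Hypothesis, float],
           prefix: Hypothesis = ()) -> Hypothesis:
    """Best-scoring hypothesis that is consistent with the validated prefix."""
    feasible = [h for h in scores if h[:len(prefix)] == prefix]
    return max(feasible, key=scores.get)


def corrections_needed(scores: Dict[Hypothesis, float],
                       reference: Hypothesis) -> int:
    """Simulated interactive session: count the corrections until acceptance."""
    prefix: Hypothesis = ()
    corrections = 0
    while True:
        h = decode(scores, prefix)
        wrong = [i for i, (a, b) in enumerate(zip(h, reference)) if a != b]
        if not wrong:
            return corrections
        prefix = reference[:wrong[0] + 1]  # the user fixes the first error
        corrections += 1


def baseline_errors(scores: Dict[Hypothesis, float],
                    reference: Hypothesis) -> int:
    """Non-interactive baseline: tokens to fix in the first hypothesis."""
    return sum(a != b for a, b in zip(decode(scores), reference))


def evaluate(corpus: List[Sample]) -> Tuple[float, float]:
    """Return (interactive effort, baseline effort) per reference token."""
    total = sum(len(ref) for _, ref in corpus)
    interactive = sum(corrections_needed(s, r) for s, r in corpus)
    baseline = sum(baseline_errors(s, r) for s, r in corpus)
    return interactive / total, baseline / total


corpus: List[Sample] = [
    ({("the", "hat", "sit"): 0.40,
      ("the", "cat", "sat"): 0.35,
      ("the", "cat", "sit"): 0.25}, ("the", "cat", "sat")),
    ({("a", "dog", "ran"): 0.6,
      ("a", "dog", "run"): 0.4}, ("a", "dog", "ran")),
]
interactive_effort, baseline_effort = evaluate(corpus)
```

On this toy corpus one interactive correction replaces two manual fixes, so the simulated interactive effort is half of the baseline effort. The comparison is fully automatic once the goals and the ground truth are fixed.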
