WP1: Perception and basic, non-interactive decision-making

The main goal of Workpackage 1 is to develop basic common tools for use in the application WPs. These tools process raw input signals and perform probabilistic modelling at the low and middle levels to extract higher-level content information as an initial step for multimodal fusion and interactive learning processes (WP2). State-of-the-art techniques are reviewed and selected according to the requirements. Novel specific methods and techniques are also developed where required.

The workpackage is organized in five tasks:

  • Probabilistic modelling and inference
  • Audio, speech, and language processing
  • Image processing and analysis
  • Video processing and interpretation
  • Robotic sensor subsystems analysis

Coordinator: Filiberto Pla (CVDSP-UJI)

Group delegates: Ángel Sappa (CV-CVC), Pedro García-Sevilla (CVDSP-UJI), Pedro M. Martínez (CVL-UGR), Alejandro Párraga (PR-CVC), José Ramón Rico (PRAI-UA), Jorge Civera (PRHLT-ITI), Francesc Moreno (RP-IRI).


Featured publications

  1. L. Gómez-Chova, J. Muñoz, V. Laparra, J. Malo and G. Camps. Chapter 9, "A Review of Kernel Methods in Remote Sensing Data Analysis" in: Optical Remote Sensing - Advances in Signal Processing and Exploitation Techniques. Saurabh Prasad, Lori Bruce and Jocelyn Chanussot, Editors. Springer-Verlag, Germany, 2011.

    MIPRCV Relevance
    This work reviews standard learning methodologies and also introduces new paradigms that can benefit other WPs in the project: multimodal information fusion, adaptation to changing environments, and the use of structured learning to introduce output class information are all treated under the formalism of kernel methods. These general theoretical aspects can also be helpful for the development of prototypes, in particular those concerning adaptive image processing and pattern classification, from illuminant-invariant feature classifiers to change detection problems. The basic methods detailed fit well in WP1 and are further developed in WP2 in the form of other semisupervised and active kernel machines following graph-based approaches and manifold learning theory.

  2. J. Bekios-Calfa, J. M. Buenaposada, and L. Baumela. "Revisiting Linear Discriminant Techniques in Gender Recognition". IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 33, Nº 4, April 2011.

    MIPRCV Relevance
    This paper considers whether linear techniques are applicable to gender classification. This is a relevant issue in the context of resource-limited computation for ubiquitous computing and in the design of efficient algorithms for processing millions of images. The main conclusion of the paper is that, when properly trained, linear classifiers achieve performance similar to that of other well-known non-linear classifiers at a fraction of the computational cost and with less training data.

    This work is related to the image processing and analysis task within work-package 1. It can also be used as a basic algorithm for image and video retrieval, surveillance and biometry in work-packages 4 and 5.

  3. J. Andrés-Ferrer, D. Ortiz-Martínez, I. García-Varea, and F. Casacuberta. "On the use of different loss functions in statistical pattern recognition applied to machine translation", Pattern Recognition Letters, 29: 1072-1081, 2008.

    MIPRCV Relevance
    None of the standard PR assessment measures, such as the classification error rate or the word error rate, properly evaluates the mistakes produced in the multimodal and interactive environment of the MIPRCV project. A key idea of this framework is to assume that the embedded PR system will make errors in the standard PR sense and, hence, a user is added to the system. The aim of the system is therefore no longer to perform well in the standard PR sense but to minimise the user effort. This effort also depends on the protocol through which the interaction is established. Hence, it is mandatory to define new systems that attempt to minimise the user effort. This paper shows how to change the system loss function so that the system minimises different assessment measures even though, as often happens, it was not specifically trained for that aim. The paper rests on an idea similar to that of MIPRCV: we assume that a perfect system cannot be built, and consequently more flexibility is given to the system so that it can minimise mistakes on likely inputs at the expense of failing on less probable inputs. Moreover, a number of works and papers under the MIPRCV project have developed and further analysed the ideas proposed in this paper.
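
The effect of changing the loss function at decision time can be illustrated with a minimal sketch of the Bayes decision rule. This is not the formulation used in the paper: the function names, the positional word-mismatch loss, and the toy posterior below are all assumptions made for illustration. The same posterior yields different outputs depending on which loss is plugged in.

```python
def bayes_decision(posterior, loss):
    """Return the candidate with the lowest expected loss under the posterior."""
    def risk(candidate):
        return sum(p * loss(candidate, truth) for truth, p in posterior.items())
    return min(posterior, key=risk)


def zero_one_loss(candidate, truth):
    """Whole-output error: any difference counts as one mistake."""
    return 0.0 if candidate == truth else 1.0


def word_mismatch_loss(candidate, truth):
    """Hypothetical word-level loss: positional mismatches plus length difference."""
    a, b = candidate.split(), truth.split()
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Toy posterior over three candidate outputs (invented values).
posterior = {
    "he goes home": 0.40,
    "she went out": 0.35,
    "she went away": 0.25,
}

best_zero_one = bayes_decision(posterior, zero_one_loss)
best_word_level = bayes_decision(posterior, word_mismatch_loss)
print(best_zero_one)
print(best_word_level)
```

Under the 0-1 loss the rule picks the single most probable candidate. Under the word-level loss it prefers a candidate that is close to most of the probability mass, which is the kind of flexibility described above.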

  4. C. Pérez-Sancho, D. Rizo, J. M. Iñesta, P. J. Ponce de León, S. Kersten, and R. Ramirez. "Genre classification of music by tonal harmony", Intelligent Data Analysis, vol. 14, pp. 533-545, 2010.

    MIPRCV Relevance
    In this work we have shown that chords extracted from raw audio signals can be used in a probabilistic framework to obtain high-level descriptors such as musical genre. The chord sequences obtained this way can also be used in a multimodal scenario, in combination with musical metadata, to improve the classification rates. These chord sequences could also be used by the Interactive Music Transcription Prototype developed in WP3, restricting the output (notes detected) using the information provided by the chord sequences.
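
One simple way to use chord sequences in a probabilistic framework is to train a chord bigram model per genre and assign a new sequence to the genre whose model gives it the highest likelihood. The sketch below is only an illustration under several assumptions: add-one smoothing, a uniform prior over genres, and invented toy training sequences. It should not be read as the method or the data of the paper.

```python
import math
from collections import Counter


def train_bigram_model(sequences, vocabulary_size):
    """Count chord bigrams and return a function giving a smoothed log-likelihood."""
    bigrams = Counter()
    contexts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1

    def log_likelihood(seq):
        # Add-one smoothing so unseen chord transitions keep a non-zero probability.
        return sum(
            math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocabulary_size))
            for prev, cur in zip(seq, seq[1:])
        )

    return log_likelihood


def classify(seq, models):
    """Pick the genre whose model scores the sequence highest (uniform prior assumed)."""
    return max(models, key=lambda genre: models[genre](seq))


# Invented toy training data, for illustration only.
training = {
    "jazz": [["Dm7", "G7", "Cmaj7"], ["Dm7", "G7", "Cmaj7", "A7"]],
    "rock": [["C", "F", "G", "C"], ["C", "G", "F", "C"]],
}
vocabulary = {chord for seqs in training.values() for seq in seqs for chord in seq}
models = {
    genre: train_bigram_model(seqs, len(vocabulary))
    for genre, seqs in training.items()
}

print(classify(["Dm7", "G7", "Cmaj7"], models))
print(classify(["C", "F", "G"], models))
```

In a multimodal setting, the per-genre log-likelihoods could be combined with scores from other sources, such as metadata, before taking the maximum.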

  5. D. Gerónimo, A. M. López, A. D. Sappa, and T. Graf. "Survey of Pedestrian Detection for Advanced Driver Assistance Systems". IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 32, Num. 7, pp. 1239-1258, 2010.

    MIPRCV Relevance
    As its title suggests and its abstract explains, this paper reviews in depth the state of the art in pedestrian detection in images. The pedestrian classifier, i.e., a machine that determines whether a given image window contains a pedestrian, has been identified as one of the most relevant modules of a pedestrian detector. Therefore, machine learning techniques for building pedestrian classifiers are included in this survey. This has made it possible to confirm that the most successful machine learning techniques for classifying pedestrians in images follow the discriminative paradigm and a batch-mode learning procedure. This is also relevant for WP5, since exploring other procedures, such as on-line learning and adaptive learning, for building pedestrian classifiers will be the basis of one of the prototypes.

    Additionally, the survey has identified the potential usefulness of multimodal data for detecting pedestrians, in particular 3D data and the LWR spectrum.

  6. A. Sanfeliu, J. Andrade-Cetto, M. Barbosa, R. Bowden, J. Capitán, A. Corominas, A. Gilbert, J. Illingworth, L. Merino, J.M. Mirats, P. Moreno, A. Ollero, J. Sequeira, and M.T.J. Spaan, "Decentralized Sensor Fusion for Ubiquitous Networking Robotics in Urban Areas", Sensors, 10, pp. 2274 - 2314; doi:10.3390/s100302274; ISSN 1424-8220, 2010.

    MIPRCV Relevance
    This architecture and its associated infrastructure provide the foundations on which the MIPRCV Ubiquitous Robotics Prototype rests. The paper describes low-level and middle-level data processing modules specific to the aims of MIPRCV's WP1, which include, amongst others, camera sensor network feeds and multirobot communication subsystems, as well as higher-level applications in the prototype related to other MIPRCV workpackages, such as robust robot localization, 3D mapping, people tracking, and human activity recognition.

Last Updated ( Thursday, 07 April 2011 )