WP3: MI prototypes for audio, speech and language

Several MI prototypes are being developed. They incorporate the main preprocessing techniques developed in WP1 and apply the principal ideas of MI theory developed in WP2. The prototypes serve to validate the proposed approaches and show their viability, and they support the WP6 dissemination activities aimed at the international community, both scientific and non-scientific audiences.


Selected publications

  • Interactive machine translation

    • Barrachina, S., Bender, O., Casacuberta, F., Civera, J., Cubel, E., Khadivi, S., Ney, H., Lagarda A., Tomás, J. and Vidal, E. (2009), "Statistical approaches to computer-assisted translation", Computational Linguistics, 35(1): 3-28.

    • Alabau, V., Leiva, L. A., Ortiz, D. and Casacuberta, F. (2012), "User Evaluation of Interactive Machine Translation Systems", Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT), pp. 20-23.

    • MIPRCV relevance: The basic ideas about human interaction, multimodality and adaptation proposed in MIPRCV and implemented in WP2 are brought together in this prototype in order to produce high-quality translations efficiently.
      The interactive machine translation prototype entails an iterative process in which the human translator is included in the loop: in each iteration, the human validates (accepts or amends) a prefix of the translation, and the system computes its best suffix hypothesis to complete this prefix.
      The translations validated by the human translator constitute new adaptation data, which are also used to modify the models incrementally.
      The first article deals with the basic formulation of the interactive machine translation framework, and the second one deals with a human evaluation of the prototype.

  • Interactive Predictive Parsing

    • Sánchez-Sáez, R., Leiva, L.-A., Sánchez, J.-A. and Benedí, J.-M. (2010), "Interactive Predictive Parsing using a Web-based Architecture", Proceedings of the Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT'10). pp. 37-40. Los Angeles, California. June.
    • MIPRCV relevance: This web-based demonstration implements a new formal framework for interactive parsing that was developed in WP2. The paper introduces this interactive paradigm for syntactic tree annotation. In the demo, the user is tightly integrated into the interactive parsing system. In contrast with the traditional post-editing approach, the user and the system cooperate to generate error-free annotated trees. User feedback is provided by means of natural mouse gestures and keystrokes.

  • Interactive speech transcription

    • Vidal, E., Rodríguez, L., Casacuberta, F. and García-Varea, I. (2007), "Interactive Pattern Recognition", Proceedings of the 4th Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms, Lecture Notes in Computer Science (MLMI 2007), 4892: 60-71. Brno, Czech Republic. June.
    • Revuelta, A., Rodríguez, L. and García-Varea, I. (2012), "A Computer Assisted Speech Transcription System", Proceedings of the 13th Conference of the European Association for Computational Linguistics (EACL 2012), pp. 41-45. Avignon, France. April.
    • MIPRCV relevance: A first version of the prototype for Interactive Speech Transcription was presented as part of this paper. The prototype implemented the theoretical formulation and the search techniques that are being developed in WP2. The application allows the user to conduct a real interactive transcription session from a set of audio files storing speech utterances. The prototype presented is one of the applications included in WP3.

  • Interactive text prediction

    • Rodríguez, L., Revuelta, A., García-Varea, I. and Vidal, E. (2010), "A Multi-modal Interactive Text Generation System", Proceedings of the 2010 International Conference on Multimodal Interfaces (ICMI-MLMI 2010). pp. 11. Beijing, China. November.
    • MIPRCV relevance: This paper describes the prototype for Interactive Text Prediction that is included in WP3. The prototype is based on the prediction techniques that are being investigated as part of WP2. Two prediction modalities are available, so that the application can be used both in a normal typing scenario and with a very constrained input interface. As a result of the developments made in WP3, the prototype includes adaptive learning, which allows the system to learn from the user's writing style.

  • Interactive music transcription

    • Pertusa, A. and Iñesta, J.M. (2008), "Multiple Fundamental Frequency estimation using Gaussian smoothness", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), pp. 105-108. Las Vegas, USA. March.
    • MIPRCV relevance: Music transcription is defined as the act of writing down the score that hypothetically produced a recorded musical sound. The automatic process tries to identify and track the notes that are sounding in a digital audio file. It is related to speech transcription and shares techniques with it. The task is multimodal by nature because, in addition to notes, other dimensions such as tempo and meter are also extracted and contribute to the result. The interactive approach is very relevant in the context of MIPRCV, since the user can provide different kinds of feedback that validate or change any of the extracted dimensions, introducing qualified information into the system. These interactions can modify the given hypotheses, improving the transcription in successive steps.

  • Spoken dialogue

    • Doncel, J., Olaso, J.-M., Justo, R., Guijarrubia, V., Pérez, A. and Torres, M.-I. (2010), "A multimodal dialogue interface", Proceedings of Workshop on Interactive Multimodal Pattern Recognition in Embedded Systems (IMPRESS 2010): 266-267. Bilbao, Spain. September.
    • Olaso, J. and Torres, M.I. (2010), "Dialogue System Based on EDECÁN Architecture". Proceedings of the 13th International Conference on Text, Speech and Dialogue. Lecture Notes in Computer Science, 6231: 547-551. Brno, Czech Republic. September.
    • MIPRCV relevance: The ideas of multimodality and interaction are closely related to dialogue system applications. Thus, the prototype described in this work is particularly relevant in the framework of WP3. It consists of a stand with a dialogue system that provides news and weather forecast information by means of different multimodal inputs and outputs. In this particular case, the system is switched on when the presence of a face is detected, and then the dialogue starts. The requested information is provided through speech and a graphical user interface. This allows a more flexible interaction that may be more pleasant and useful for the user, including users with disabilities. The information coming from the different knowledge sources is managed by a flexible architecture (EDECÁN) in which any kind of service communicates through TCP/IP protocols.
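
The prefix-validation loop described above for the interactive machine translation prototype can be illustrated with a small simulation. This is only a minimal sketch under strong assumptions, not the prototype's implementation: the function names (best_suffix, interactive_session) are hypothetical, the statistical suffix search is replaced by a lookup in a fixed, ranked list of candidate translations, and the human translator is simulated by comparing each hypothesis against a known reference translation. Model adaptation from the validated translations is not modelled.

```python
from typing import List, Sequence


def best_suffix(prefix: Sequence[str], candidates: List[List[str]]) -> List[str]:
    """Toy stand-in for the system's suffix search (an assumption).

    Returns the remainder of the first candidate that starts with the
    validated prefix, or an empty suffix if no candidate is compatible.
    """
    n = len(prefix)
    for cand in candidates:
        if cand[:n] == list(prefix):
            return cand[n:]
    return []


def interactive_session(reference: List[str], candidates: List[List[str]]) -> int:
    """Simulate one interactive session and return the number of user corrections.

    The simulated user accepts the longest correct prefix of each
    hypothesis and amends the first wrong (or missing) word.
    """
    prefix: List[str] = []
    corrections = 0
    while True:
        # The system completes the validated prefix with its best suffix.
        hypothesis = prefix + best_suffix(prefix, candidates)
        if hypothesis == reference:
            return corrections
        # Length of the longest common prefix of hypothesis and reference.
        k = 0
        while (k < len(hypothesis) and k < len(reference)
               and hypothesis[k] == reference[k]):
            k += 1
        if k == len(reference):
            # The hypothesis only has extra trailing words: one final edit.
            return corrections + 1
        # The user validates k words and types the next correct word.
        prefix = reference[:k + 1]
        corrections += 1


if __name__ == "__main__":
    ranked = [
        "the home is little".split(),
        "the house is little".split(),
        "the house is small".split(),
    ]
    print(interactive_session("the house is small".split(), ranked))
```

Each iteration extends the validated prefix by at least one word, so the loop always terminates. In a real system, the candidate lookup would be replaced by a search over the translation model conditioned on the prefix.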

Last Updated ( Friday, 28 September 2012 )