Abstract
DIRAC was an integrated project carried out between January 1st 2006 and December 31st 2010. It was funded by the European Commission within the Sixth Framework Research Programme (FP6) under contract number IST-027787. Ten partners joined forces to investigate the concept of rare events in machine and cognitive systems, and developed multi-modal technology to identify such events and deal with them in audio-visual applications.
This document summarizes the project and its achievements. In Section 2 we present the research and engineering problem that the project set out to tackle, and discuss why we believe that advances made in solving this problem will bring us closer to the general objective of building artificial systems with cognitive capabilities. We describe the approach taken to solving the problem, detailing the theoretical framework we developed. We further describe how the inter-disciplinary nature of our research, and evidence collected from biological and cognitive systems, gave us the necessary insights and support for the proposed approach. In Section 3 we describe our efforts towards a system design that follows the principles identified in our theoretical investigation. In Section 4 we describe a variety of algorithms that we developed in the context of different applications to implement the theoretical framework described in Section 2. In Section 5 we describe algorithmic progress on a variety of questions concerning the learning of rare events as defined in Section 2. Finally, in Section 6 we describe our application scenarios and an integrated test-bed developed to evaluate our algorithms together.
Keywords
- Equal Error Rate
- Superior Temporal Sulcus
- Word Error Rate
- Novelty Detection
- Large Vocabulary Continuous Speech Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Anemüller, J. et al. (2012). DIRAC: Detection and Identification of Rare Audio-Visual Events. In: Weinshall, D., Anemüller, J., van Gool, L. (eds) Detection and Identification of Rare Audiovisual Cues. Studies in Computational Intelligence, vol 384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24034-8_1
DOI: https://doi.org/10.1007/978-3-642-24034-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24033-1
Online ISBN: 978-3-642-24034-8
eBook Packages: Engineering, Engineering (R0)