Abstract
The NITE XML Toolkit (NXT) provides library support for working with multimodal language corpora. We describe work in progress that explores its potential for the AMI project by applying it to the ICSI Meeting Corpus. We discuss converting existing data into the NXT data format; using NXT's query facility to explore the corpus; hand-annotation and automatic indexing; and integrating data produced by processes external to NXT, such as parsers. Finally, we describe the use of NXT itself as a meeting browser, and how it can integrate other browser components.
This work was carried out under funding from the European Commission (AMI, FP6-506811).
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carletta, J., Kilgour, J. (2005). The NITE XML Toolkit Meets the ICSI Meeting Corpus: Import, Annotation, and Browsing. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_10
DOI: https://doi.org/10.1007/978-3-540-30568-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24509-4
Online ISBN: 978-3-540-30568-2
eBook Packages: Computer Science; Computer Science (R0)