The NITE XML Toolkit Meets the ICSI Meeting Corpus: Import, Annotation, and Browsing

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2004)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3361)

Abstract

The NITE XML Toolkit (NXT) provides library support for working with multimodal language corpora. We describe work in progress to explore its potential for the AMI project by applying it to the ICSI Meeting Corpus. We discuss converting existing data into the NXT data format; using NXT’s query facility to explore the corpus; hand-annotation and automatic indexing; and the integration of data obtained by applying NXT-external processes such as parsers. Finally, we describe the use of NXT itself as a meeting browser, and how it can be used to integrate other browser components.

This work was carried out under funding from the European Commission (AMI, FP6-506811).

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carletta, J., Kilgour, J. (2005). The NITE XML Toolkit Meets the ICSI Meeting Corpus: Import, Annotation, and Browsing. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_10

  • DOI: https://doi.org/10.1007/978-3-540-30568-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24509-4

  • Online ISBN: 978-3-540-30568-2

  • eBook Packages: Computer Science, Computer Science (R0)
