The Corpus Analysis Toolkit - Analysing Multilevel Annotations

  • Conference paper
Human Language Technology. Challenges for Computer Science and Linguistics (LTC 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6562)

Abstract

This paper considers a number of issues surrounding current annotation science and corpus analysis and presents a bespoke suite of software, the Corpus Analysis Toolkit, for processing and analysing multilevel annotations of time-aligned linguistic data. The toolkit provides a variety of specialised tools for performing temporal analysis of annotated linguistic data. The toolkit is feature-set and corpus independent and offers support for a number of commonly used annotation formats.
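
To make "temporal analysis of multilevel annotations" concrete, the following is a minimal sketch of one such analysis: finding which labels on two time-aligned tiers overlap in time, and for how long. It is not the toolkit's implementation or API. The `Interval` type, the tier names, the function names, and the sample data are all assumptions made for illustration, and times are stored as integer milliseconds to keep the arithmetic exact.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    """One labelled, time-aligned annotation on a named tier (hypothetical type)."""
    tier: str
    label: str
    start_ms: int
    end_ms: int


def overlap_ms(a: Interval, b: Interval) -> int:
    """Return the duration in milliseconds shared by two intervals, or 0 if none."""
    return max(0, min(a.end_ms, b.end_ms) - max(a.start_ms, b.start_ms))


def co_occurrences(tier_a: List[Interval],
                   tier_b: List[Interval]) -> List[Tuple[str, str, int]]:
    """List (label_a, label_b, overlap_ms) for every pair that overlaps in time."""
    pairs = []
    for a in tier_a:
        for b in tier_b:
            shared = overlap_ms(a, b)
            if shared > 0:
                pairs.append((a.label, b.label, shared))
    return pairs


# Illustrative data only: a phone tier and a voicing-feature tier.
phones = [
    Interval("phone", "b", 0, 100),
    Interval("phone", "a", 100, 300),
]
voicing = [
    Interval("voicing", "voiced", 50, 300),
]

for label_a, label_b, shared in co_occurrences(phones, voicing):
    print(f"{label_a} overlaps {label_b} for {shared} ms")
```

Because the comparison uses only tier names, labels, and time stamps, the same sketch applies to any feature set or corpus whose annotations can be read into intervals of this shape.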

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wilson, S., Carson-Berndsen, J. (2011). The Corpus Analysis Toolkit - Analysing Multilevel Annotations. In: Vetulani, Z. (ed.) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2009. Lecture Notes in Computer Science, vol 6562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20095-3_9

  • DOI: https://doi.org/10.1007/978-3-642-20095-3_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20094-6

  • Online ISBN: 978-3-642-20095-3

  • eBook Packages: Computer Science, Computer Science (R0)
