The Corpus Analysis Toolkit - Analysing Multilevel Annotations

Wilson, Stephen; Carson-Berndsen, Julie

doi:10.1007/978-3-642-20095-3_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6562)

Included in the following conference series:

Language and Technology Conference

Abstract

This paper considers a number of issues surrounding current annotation science and corpus analysis and presents a bespoke suite of software, the Corpus Analysis Toolkit, for processing and analysing multilevel annotations of time-aligned linguistic data. The toolkit provides a variety of specialised tools for temporal analysis of annotated linguistic data, is independent of any particular feature set or corpus, and supports a number of commonly used annotation formats.
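
To make "temporal analysis of multilevel annotations" concrete, the following is a minimal sketch of one such analysis: finding where labelled intervals on two time-aligned tiers overlap. It is not the Corpus Analysis Toolkit's own code or API; the `Interval` class, the `overlaps` function, and the example tiers are all hypothetical and assume each tier is sorted by start time with no internal overlaps.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Interval:
    """One labelled, time-aligned annotation on a tier (times in seconds)."""
    start: float
    end: float
    label: str


def overlaps(
    tier_a: List[Interval], tier_b: List[Interval]
) -> Iterator[Tuple[Interval, Interval, float]]:
    """Yield each temporally overlapping pair together with the shared duration.

    Assumes both tiers are sorted by start time and internally non-overlapping.
    """
    i = j = 0
    while i < len(tier_a) and j < len(tier_b):
        a, b = tier_a[i], tier_b[j]
        shared = min(a.end, b.end) - max(a.start, b.start)
        if shared > 0:
            yield a, b, shared
        # Advance whichever interval finishes first.
        if a.end <= b.end:
            i += 1
        else:
            j += 1


# Hypothetical two-level annotation: a phone tier and a voicing feature tier.
phones = [
    Interval(0.00, 0.10, "p"),
    Interval(0.10, 0.30, "a"),
    Interval(0.30, 0.40, "t"),
]
voicing = [
    Interval(0.00, 0.12, "voiceless"),
    Interval(0.12, 0.28, "voiced"),
    Interval(0.28, 0.40, "voiceless"),
]

pairs = list(overlaps(phones, voicing))
for phone, feature, shared in pairs:
    print(f"{phone.label} overlaps {feature.label} for {shared:.2f} s")
```

The two-pointer sweep visits each interval once, so the cost grows linearly with the combined tier length. Real annotation formats would need a parsing step to produce such tiers, which this sketch leaves out.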

Author information

Authors and Affiliations

School of Computer Science and Informatics, University College Dublin, Ireland
Stephen Wilson & Julie Carson-Berndsen

Editor information

Editors and Affiliations

Faculty of Mathematics and Computer Science, Adam Mickiewicz University in Poznan, ul. Umultowska 87, 61614, Poznan, Poland
Zygmunt Vetulani

About this paper

Cite this paper

Wilson, S., Carson-Berndsen, J. (2011). The Corpus Analysis Toolkit - Analysing Multilevel Annotations. In: Vetulani, Z. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2009. Lecture Notes in Computer Science, vol 6562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20095-3_9

DOI: https://doi.org/10.1007/978-3-642-20095-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20094-6
Online ISBN: 978-3-642-20095-3
eBook Packages: Computer Science, Computer Science (R0)
