Low Inter-Annotator Agreement in Sentence Boundary Detection and Annotator Personality

  • Conference paper
Speech and Computer (SPECOM 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9811)

Abstract

The paper investigates how annotators' personality affects the way they segment unscripted speech into sentences. This task is inherently ambiguous, and disagreement between annotators may result from a variety of factors, from speech disfluencies and linguistic properties of the text to the social characteristics and individuality of a speaker. While some boundaries are marked by the majority of annotators, a substantial number of boundaries are marked by only one or several experts.

In this paper we focus on sentence boundaries that are marked by only a small number of annotators. We test the hypothesis that such “uncommon” boundaries are more likely to be identified by experts with particular personality traits. We found a significant relationship between uncommon boundaries and two psychological traits of annotators measured by the Big Five personality inventory: emotionality and extraversion.
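
The notion of an "uncommon" boundary can be illustrated with a minimal sketch. The abstract does not give the exact threshold or data format used in the paper, so everything here is an assumption made for illustration: the function names, the `max_annotators` cutoff, and the toy data (word positions after which each hypothetical annotator placed a boundary).

```python
from collections import Counter


def uncommon_boundaries(annotations, max_annotators=2):
    """Map each boundary position marked by at most `max_annotators`
    annotators to the number of annotators who marked it.

    The cutoff is an illustrative assumption, not the paper's definition.
    """
    counts = Counter(pos for marks in annotations.values() for pos in set(marks))
    return {pos: n for pos, n in counts.items() if n <= max_annotators}


def uncommon_share(annotations, max_annotators=2):
    """For each annotator, the fraction of their boundaries that are uncommon."""
    rare = uncommon_boundaries(annotations, max_annotators)
    shares = {}
    for name, marks in annotations.items():
        unique = set(marks)
        shares[name] = sum(pos in rare for pos in unique) / len(unique) if unique else 0.0
    return shares


# Hypothetical toy data: word indices after which a boundary was placed.
annotations = {
    "A1": [5, 12, 20],
    "A2": [5, 12, 17, 20],
    "A3": [5, 20],
    "A4": [5, 9, 12, 20],
}

rare = uncommon_boundaries(annotations, max_annotators=1)
shares = uncommon_share(annotations, max_annotators=1)
```

A per-annotator share of this kind is one quantity that could then be compared with personality scores; which statistic the authors actually used is not stated in the abstract.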

Notes

  1. We follow [13] for factor names since this version of FFPQ was used as the basis for the Russian version.

References

  1. Vannikov, Y., Abdalyan, I.: Eksperimentalnoe issledovanie chleneniya razgovornoj rechi na diskretnye intonacionno-smyslovye edinicy (frazy). In: Sirotinina, O.B., Barannikova, L.I., Serdobintsev, L. Ja. (eds.) Russkaya razgovornaya rech, Saratov, pp. 40–46 (1973). (in Russian)

  2. Guaïtella, I.: Rhythm in speech: What rhythmic organizations reveal about cognitive processes in spontaneous speech production versus reading aloud. J. Pragmat. 31, 509–523 (1999)

  3. Strassel, S., Walker, C.: Data and annotation issues in RT-03. In: EARS Rich Transcription Workshop (2003)

  4. Liu, Y., Chawla, N.V., Harper, M.P., Shriberg, E., Stolcke, A.: A study in machine learning from imbalanced data for sentence boundary detection in speech. Comput. Speech Lang. 20(4), 468–494 (2006)

  5. Lee, A., Glass, J.: Sentence detection using multiple annotations. In: Proceedings of the Interspeech 2012, pp. 1848–1851 (2012)

  6. Stepikhov, A.: Resolving ambiguities in sentence boundary detection in Russian spontaneous speech. In: Habernal, I. (ed.) TSD 2013. LNCS, vol. 8082, pp. 426–433. Springer, Heidelberg (2013)

  7. Stepikhov, A.: Analysis of expert manual annotation of the Russian spontaneous monologue: evidence from sentence boundary detection. In: Železný, M., Habernal, I., Ronzhin, A. (eds.) SPECOM 2013. LNCS, vol. 8113, pp. 33–40. Springer, Heidelberg (2013)

  8. Stepikhov, A., Loukina, A.: Annotation and personality: individual differences in sentence boundary detection. In: Ronzhin, A., Potapova, R., Delic, V. (eds.) SPECOM 2014. LNCS, vol. 8773, pp. 105–112. Springer, Heidelberg (2014)

  9. Evanini, K., Zechner, K.: Using crowdsourcing to provide prosodic annotations for non-native speech. In: Proceedings of the Interspeech 2011, pp. 3069–3072 (2011)

  10. Cuendet, S., Hakkani-Tür, D., Shriberg, E.: Automatic labeling inconsistencies detection and correction for sentence unit segmentation in conversational speech. In: Popescu-Belis, A., Renals, S., Bourlard, H. (eds.) MLMI 2007. LNCS, vol. 4892, pp. 144–155. Springer, Heidelberg (2008)

  11. Eysenck, H.J., Eysenck, S.B.G.: Manual of the Eysenck Personality Inventory. University of London Press, London (1964)

  12. Shmelev, A.G.: Test-oprosnik Ajzenka. In: Bodalev, A.A., Karpinskaya, et al. (eds.) Praktikum po psikhodiagnostike. Psikhodiagnosticheskie materialy, pp. 11–16. MGU, Moscow (1988). (in Russian)

  13. Tsuji, H., Fujishima, Y., Tsuji, H., Natsuno, Y., Mukoyama, Y., Yamada, N., Morita, Y., Hata, K.: Five-factor model of personality: Concept, structure, and measurement of personality traits. Jpn. Psychol. Rev. 40(2), 239–259 (1997)

  14. Khromov, A.B.: Patifactornyj oprosnik lichnosti: Uchebno-metodicheskoe posobie. Izd-vo Kurganskogo gosudarstvennogo universiteta, Kurgan (2000). (in Russian)

Acknowledgments

This study was supported by the Russian Foundation for Humanities, project No. 15–04–00165. We thank Keelan Evanini, Su-Youn Yoon and two anonymous reviewers for their comments and suggestions.

Author information

Corresponding author

Correspondence to Anton Stepikhov.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Stepikhov, A., Loukina, A. (2016). Low Inter-Annotator Agreement in Sentence Boundary Detection and Annotator Personality. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_55

  • DOI: https://doi.org/10.1007/978-3-319-43958-7_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43957-0

  • Online ISBN: 978-3-319-43958-7

  • eBook Packages: Computer Science, Computer Science (R0)
