Sentence boundary detection in conversational speech transcripts using noisily labeled examples

  • Original Paper
International Journal of Document Analysis and Recognition (IJDAR)

Abstract

This paper presents a technique for adding sentence boundaries to text obtained by Automatic Speech Recognition (ASR) of conversational speech audio. We show that, starting from imprecise boundary information derived only from the silence information of an ASR system, boundary detection can be improved using Head and Tail phrases. We develop the technique and show its effectiveness on two manually transcribed corpora and one automatically transcribed corpus. The main purpose of adding sentence boundaries to ASR transcripts is to improve linguistic analysis, namely information extraction, for text mining systems that handle huge volumes of textual data and analyze trends and features of concepts. Hence, we also show how the addition of boundaries improves two basic natural language processing tasks: PoS label assignment and adjective-noun extraction.
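The abstract does not spell out how Head and Tail phrases are defined or scored, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that a "head" is the first n words and a "tail" the last n words of a segment delimited by ASR silences, and that a boundary is proposed wherever a frequent tail is immediately followed by a frequent head. The function names, the n-gram length, and the count threshold are all hypothetical.

```python
from collections import Counter


def collect_head_tail(segments, n=2):
    """Count the first and last n words of each noisily delimited segment.

    `segments` are strings split at ASR silences (assumed input format).
    """
    heads, tails = Counter(), Counter()
    for seg in segments:
        words = seg.split()
        if len(words) >= n:
            heads[tuple(words[:n])] += 1
            tails[tuple(words[-n:])] += 1
    return heads, tails


def propose_boundaries(words, heads, tails, n=2, min_count=2):
    """Return indices i where a boundary is proposed just before words[i].

    A boundary is proposed when the n words ending at i are a frequent tail
    and the n words starting at i are a frequent head (hypothetical rule).
    """
    boundaries = []
    for i in range(n, len(words) - n + 1):
        tail = tuple(words[i - n:i])
        head = tuple(words[i:i + n])
        if tails[tail] >= min_count and heads[head] >= min_count:
            boundaries.append(i)
    return boundaries


# Toy silence-delimited segments standing in for noisily labeled examples.
noisy_segments = [
    "thank you for calling",
    "how may i help you",
    "thank you for calling",
    "how may i help you today",
]
heads, tails = collect_head_tail(noisy_segments)

# An unsegmented word stream in which the silence cue was missed.
stream = "thank you for calling how may i help you".split()
proposed = propose_boundaries(stream, heads, tails)
```

A real system would likely replace the raw count threshold with a trained classifier over such phrase features; the threshold is used here only to keep the sketch short.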

Author information

Corresponding author

Correspondence to L. Venkata Subramaniam.

About this article

Cite this article

Takeuchi, H., Subramaniam, L.V., Roy, S. et al. Sentence boundary detection in conversational speech transcripts using noisily labeled examples. IJDAR 10, 147–155 (2007). https://doi.org/10.1007/s10032-007-0056-y

  • DOI: https://doi.org/10.1007/s10032-007-0056-y
