Text segmentation of spoken meeting transcripts

Sharp, Bernadette; Chibelushi, Caroline

doi:10.1007/s10772-009-9048-2

Published: 12 November 2009

International Journal of Speech Technology
Volume 11, article number 157 (2008)

Bernadette Sharp¹ & Caroline Chibelushi¹

Abstract

Text segmentation has played an important role in information retrieval as well as natural language processing. Current segmentation methods are well suited to written and structured texts, since they make use of the distinctive macro-level structures of such texts; however, text segmentation of transcribed multi-party conversation presents a different challenge, given its ill-formed sentences and lack of macro-level text units. This paper describes an algorithm for segmenting spoken meeting transcripts that combines semantically complex lexical relations with speech cue phrases to build lexical chains, which are then used to determine topic boundaries.
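To make the idea in the abstract concrete, the following is a minimal sketch of lexical-chain segmentation with a cue-phrase bonus. It is not the authors' algorithm: the paper's semantically complex lexical relations are replaced here by plain word repetition, and the cue-phrase list, stop-word list, `max_gap`, `cue_bonus` and `threshold` values are all hypothetical choices made for illustration. A chain links repeated occurrences of a word; the gap between two utterances scores one point for each chain that ends before it or starts after it, plus a bonus when the following utterance opens with a cue phrase.

```python
from collections import defaultdict

# Hypothetical cue phrases and stop words, chosen only for illustration.
CUE_PHRASES = ("okay", "right", "so", "now", "moving on")
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "we", "is", "it",
              "on", "in", "for", "that"}


def tokenize(utterance):
    """Lowercase, strip basic punctuation, and drop stop words."""
    words = [w.strip(".,?!").lower() for w in utterance.split()]
    return [w for w in words if w and w not in STOP_WORDS]


def build_chains(utterances, max_gap=2):
    """Return (word, start, end) chains built from simple word repetition.

    A chain is closed when its word is absent for more than max_gap
    utterances. Chains covering a single utterance are discarded.
    """
    positions = defaultdict(list)
    for i, utt in enumerate(utterances):
        for word in set(tokenize(utt)):
            positions[word].append(i)

    chains = []
    for word, idxs in positions.items():
        start = prev = idxs[0]
        for i in idxs[1:]:
            if i - prev > max_gap:
                chains.append((word, start, prev))
                start = i
            prev = i
        chains.append((word, start, prev))
    return [c for c in chains if c[2] > c[1]]


def opens_with_cue(utterance):
    """True if the utterance begins with one of the cue phrases."""
    text = " ".join(w.strip(".,?!") for w in utterance.lower().split()) + " "
    return text.startswith(tuple(c + " " for c in CUE_PHRASES))


def boundary_scores(utterances, chains, cue_bonus=1):
    """scores[i] rates the gap between utterance i and utterance i + 1."""
    scores = [0] * (len(utterances) - 1)
    for _, start, end in chains:
        if end < len(utterances) - 1:
            scores[end] += 1        # a chain ends just before this gap
        if start > 0:
            scores[start - 1] += 1  # a chain starts just after this gap
    for i in range(1, len(utterances)):
        if opens_with_cue(utterances[i]):
            scores[i - 1] += cue_bonus
    return scores


def segment(utterances, threshold=3, max_gap=2):
    """Return indices of utterances that open a new topic segment."""
    chains = build_chains(utterances, max_gap)
    scores = boundary_scores(utterances, chains)
    return [i + 1 for i, s in enumerate(scores) if s >= threshold]


# Invented sample transcript, not taken from the paper's data.
SAMPLE = [
    "The budget needs review before Friday.",
    "I think the budget is too tight.",
    "We could cut travel from the budget.",
    "Okay, moving to the server migration.",
    "The migration is planned for next month.",
    "Who owns the server migration plan?",
]

if __name__ == "__main__":
    print(segment(SAMPLE))  # prints [3]
```

In the sample, the "budget" chain ends at the third utterance while the "server" and "migration" chains start at the fourth, which also opens with the cue "Okay", so the only boundary is placed before the fourth utterance. A fuller implementation would presumably replace the repetition test with thesaurus-based relations such as synonymy or hypernymy.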

Author information

Authors and Affiliations

FCET, Staffordshire University, Beaconside, Stafford, ST18 0AD, UK
Bernadette Sharp & Caroline Chibelushi

Corresponding author

Correspondence to Bernadette Sharp.

About this article

Cite this article

Sharp, B., Chibelushi, C. Text segmentation of spoken meeting transcripts. Int J Speech Technol 11, 157 (2008). https://doi.org/10.1007/s10772-009-9048-2

Received: 17 July 2009
Accepted: 13 October 2009
Published: 12 November 2009
DOI: https://doi.org/10.1007/s10772-009-9048-2
