Topic segmentation on spoken documents using self-validated acoustic cuts

Chen, Hongjie; Xie, Lei; Feng, Wei; Zheng, Lilei; Zhang, Yanning

doi:10.1007/s00500-014-1383-9

Focus
Published: 25 July 2014

Volume 19, pages 47–59, (2015)

Soft Computing

Hongjie Chen¹, Lei Xie¹, Wei Feng², Lilei Zheng¹ & Yanning Zhang¹

Abstract

Topic segmentation serves as a necessary prerequisite for multimedia content analysis and management. The normalized cuts (NCuts) approach has shown superior performance in topic segmentation of spoken documents. However, this method requires the number of topics in a document to be known before segmentation, which is impractical for real-world applications given the exponential growth of multimedia data. In addition, previous lexical-based spoken document segmentation approaches, including NCuts, work on text transcripts generated by a large vocabulary continuous speech recognizer (LVCSR). Training such a recognizer requires a large amount of transcribed speech data and language-specific knowledge. Moreover, inevitable speech recognition errors and the out-of-vocabulary (OOV) problem degrade segmentation performance. This paper addresses these problems with a self-validated acoustic normalized cuts approach, namely SACuts. First, compared with NCuts, our approach determines the number of topics in a spoken document automatically without extra computational load. Second, compared with lexical approaches that rely on a high-resource speech recognizer, our approach achieves comparable or even better segmentation performance using only acoustic-level information. Evaluation on a broadcast news topic segmentation task shows the superiority of the proposed approach.
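
For readers unfamiliar with the normalized cuts criterion that the abstract builds on, the following is a minimal sketch of the generic two-way normalized cut cost (in the sense of Shi and Malik's formulation) applied to a contiguous split of a sequence. It is not the SACuts algorithm, which is not specified in this preview. The function names `ncut_cost` and `best_boundary` and the toy similarity values are assumptions made purely for illustration.

```python
def ncut_cost(sim, boundary):
    """Two-way normalized cut cost for splitting a sequence of n items
    into a left part [0, boundary) and a right part [boundary, n).

    sim is an n x n symmetric similarity matrix (list of lists).
    Assumes every item has nonzero total similarity, so the
    denominators below are never zero.
    """
    n = len(sim)
    left = range(0, boundary)
    right = range(boundary, n)
    # cut(A, B): total similarity crossing the boundary
    cut = sum(sim[i][j] for i in left for j in right)
    # assoc(A, V) and assoc(B, V): similarity from each part to all items
    assoc_left = sum(sim[i][j] for i in left for j in range(n))
    assoc_right = sum(sim[i][j] for i in right for j in range(n))
    return cut / assoc_left + cut / assoc_right


def best_boundary(sim):
    """Return the boundary position with the lowest normalized cut cost."""
    n = len(sim)
    return min(range(1, n), key=lambda b: ncut_cost(sim, b))


# Hypothetical toy data: items 0-1 are similar to each other, as are items 2-3.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
boundary = best_boundary(sim)
print(boundary)
```

This sketch places a single boundary, so the number of segments is fixed in advance. That is the limitation the abstract attributes to NCuts and that the self-validated approach is said to remove.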

Author information

Authors and Affiliations

School of Computer Science, Northwestern Polytechnical University, Xi’an, Shaanxi, People’s Republic of China
Hongjie Chen, Lei Xie, Lilei Zheng & Yanning Zhang
School of Computer Science, Tianjin University, Tianjin, People’s Republic of China
Wei Feng

Corresponding author

Correspondence to Hongjie Chen.

Additional information

Communicated by L. Xie.

This work was supported by a grant from the National Natural Science Foundation of China (61175018) and a grant from the Fok Ying Tung Education Foundation (131059).

About this article

Cite this article

Chen, H., Xie, L., Feng, W. et al. Topic segmentation on spoken documents using self-validated acoustic cuts. Soft Comput 19, 47–59 (2015). https://doi.org/10.1007/s00500-014-1383-9

Published: 25 July 2014
Issue Date: January 2015
DOI: https://doi.org/10.1007/s00500-014-1383-9
