
Topic segmentation on spoken documents using self-validated acoustic cuts

  • Focus
Soft Computing

Abstract

Topic segmentation is a necessary prerequisite for multimedia content analysis and management. The normalized cuts (NCuts) approach has shown superior performance in topic segmentation of spoken documents. However, NCuts requires the number of topics in a document to be known before segmentation, which is impractical for real-world applications given the exponential growth of multimedia data. In addition, previous lexical spoken document segmentation approaches, including NCuts, operate on text transcripts generated by a large vocabulary continuous speech recognizer (LVCSR). Training such a recognizer requires a large amount of transcribed speech data and language-specific knowledge, and unavoidable speech recognition errors and out-of-vocabulary (OOV) words degrade segmentation performance. This paper addresses these problems with a self-validated acoustic normalized cuts approach, SACuts. First, unlike NCuts, our approach determines the number of topics in a spoken document automatically, without extra computational load. Second, compared with lexical approaches that rely on a high-resource speech recognizer, our approach achieves comparable or even better segmentation performance using only acoustic-level information. Evaluation on a broadcast news topic segmentation task shows the superiority of the proposed approach.
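
The abstract contrasts SACuts with standard NCuts, which must be told the number of topics in advance. As a rough illustration of that baseline only (this is not the authors' SACuts, and it does not estimate the topic number), the sketch below applies the k-way normalized cut criterion of Shi and Malik (2000) to linear segmentation. It assumes a symmetric, non-negative similarity matrix W over consecutive units (for example sentences or acoustic segments; how W is built is left open here) and a fixed k, and uses dynamic programming to find the contiguous split with the lowest total NCut cost. The function names and the prefix-sum/DP formulation are illustrative choices, not taken from the paper.

```python
import numpy as np


def segment_cost_table(W):
    """Precompute prefix sums so the normalized-cut cost of any
    contiguous segment [i, j) can be read in O(1)."""
    n = W.shape[0]
    # 2-D prefix sum: P[a, b] = W[:a, :b].sum()
    P = np.zeros((n + 1, n + 1))
    P[1:, 1:] = W.cumsum(axis=0).cumsum(axis=1)
    # deg_prefix[a] = sum of the degrees of units 0 .. a-1
    deg_prefix = np.concatenate(([0.0], W.sum(axis=1).cumsum()))

    def cost(i, j):
        # assoc(A, A): total similarity inside segment A = [i, j)
        within = P[j, j] - P[i, j] - P[j, i] + P[i, i]
        # assoc(A, V): total similarity from A to all units
        total = deg_prefix[j] - deg_prefix[i]
        if total == 0:
            # Isolated segment: cut and assoc are both 0; treat the cost as 0
            return 0.0
        # cut(A, V - A) / assoc(A, V) = 1 - assoc(A, A) / assoc(A, V)
        return 1.0 - within / total

    return cost


def ncut_linear_segmentation(W, k):
    """Split n consecutive units into k contiguous segments minimizing
    sum_t cut(A_t, V - A_t) / assoc(A_t, V).
    Returns the start index of every segment after the first."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of units")
    cost = segment_cost_table(W)
    # D[t, j]: best cost of splitting units [0, j) into t segments
    D = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    D[0, 0] = 0.0
    for t in range(1, k + 1):
        for j in range(t, n + 1):
            for i in range(t - 1, j):
                c = D[t - 1, i] + cost(i, j)
                if c < D[t, j]:
                    D[t, j] = c
                    back[t, j] = i
    # Trace back the start position of each segment, last to first
    starts, j = [], n
    for t in range(k, 0, -1):
        j = back[t, j]
        starts.append(int(j))
    return sorted(starts)[1:]  # drop the leading 0


# Two "topics" of three units each: high within-topic similarity
W = 0.1 + 0.9 * np.kron(np.eye(2), np.ones((3, 3)))
print(ncut_linear_segmentation(W, 2))  # [3]
```

Note that k is an input here; removing that requirement is precisely what the abstract claims for SACuts.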

Author information

Correspondence to Hongjie Chen.

Additional information

Communicated by L. Xie.

This work was supported by a grant from the National Natural Science Foundation of China (61175018) and a grant from the Fok Ying Tung Education Foundation (131059).

About this article

Cite this article

Chen, H., Xie, L., Feng, W. et al. Topic segmentation on spoken documents using self-validated acoustic cuts. Soft Comput 19, 47–59 (2015). https://doi.org/10.1007/s00500-014-1383-9

  • DOI: https://doi.org/10.1007/s00500-014-1383-9
