Discovering salient prosodic cues and their interactions for automatic story segmentation in Mandarin broadcast news

  • Regular Paper
  • Multimedia Systems

Abstract

This paper investigates speech prosody for automatic story segmentation in Mandarin broadcast news. Prosodic cues that work well for English story segmentation need to be re-examined for Mandarin, because its lexical tones may complicate the expression of pitch declination and reset. Our data-oriented study shows that speaker-normalized pitch features cannot clearly discriminate story boundaries from utterance boundaries, because these features vary widely across the different Mandarin syllable tones. We therefore propose speaker- and tone-normalized pitch features, which separate utterance boundaries from story boundaries clearly. Our study also shows that speaker-normalized pause duration is quite effective at separating story boundaries from utterance boundaries, whereas speaker-normalized speech energy and syllable duration are not. Story boundary detection experiments with decision trees reinforce this difference between English and Chinese: speaker- and tone-normalized pitch features should be preferred for Mandarin story segmentation. Combining different prosodic cues achieves an F-measure of 93.04%, owing to the complementarity between pause, pitch and energy. Analysis of the decision tree uncovers five major heuristics that show how speakers jointly use pause duration and pitch to separate speech into stories.
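To make the two normalization schemes and the evaluation metric concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not give the normalization formula, so z-scoring within each group is an assumption here, as are the helper names z_normalize and f_measure and the toy pitch values. Speaker normalization groups syllables by speaker alone; speaker- and tone-normalization groups them by (speaker, tone) pair, so that differences caused by the lexical tone itself are removed before boundaries are compared.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_normalize(values, keys):
    """Z-score each value against all values that share its key.

    Assumption: z-scoring stands in for the paper's normalization,
    which the abstract does not specify.
    """
    groups = defaultdict(list)
    for value, key in zip(values, keys):
        groups[key].append(value)
    stats = {k: (mean(v), pstdev(v)) for k, v in groups.items()}
    normalized = []
    for value, key in zip(values, keys):
        mu, sigma = stats[key]
        # A group with zero spread carries no contrast, so map it to 0.0.
        normalized.append((value - mu) / sigma if sigma > 0 else 0.0)
    return normalized


def f_measure(predicted, actual):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: per-syllable pitch in Hz, speaker id, lexical tone.
pitch = [220.0, 180.0, 240.0, 200.0, 120.0, 100.0, 130.0, 110.0]
speaker = ["A", "A", "A", "A", "B", "B", "B", "B"]
tone = [1, 3, 1, 3, 1, 3, 1, 3]

# Speaker normalization: one group per speaker.
speaker_norm = z_normalize(pitch, speaker)

# Speaker- and tone-normalization: one group per (speaker, tone) pair.
speaker_tone_norm = z_normalize(pitch, list(zip(speaker, tone)))
```

In this toy data, tone 1 syllables are always higher than tone 3 syllables for the same speaker, so the speaker-normalized values still carry the tone difference, while the speaker- and tone-normalized values do not. A boundary classifier such as a decision tree would then be trained on features of this kind and scored with f_measure against reference boundaries.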

Author information

Corresponding author

Correspondence to Lei Xie.

Additional information

Communicated by Changsheng Xu.

About this article

Cite this article

Xie, L. Discovering salient prosodic cues and their interactions for automatic story segmentation in Mandarin broadcast news. Multimedia Systems 14, 237–253 (2008). https://doi.org/10.1007/s00530-008-0141-1

  • DOI: https://doi.org/10.1007/s00530-008-0141-1
