A Subword Normalized Cut Approach to Automatic Story Segmentation of Chinese Broadcast News

  • Conference paper
Information Retrieval Technology (AIRS 2009)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5839))

Abstract

This paper presents a subword normalized cut (N-cut) approach to automatic story segmentation of Chinese broadcast news (BN). We represent a speech recognition transcript as a weighted undirected graph, where the nodes correspond to sentences and the edge weights describe inter-sentence similarities. Story segmentation is formalized as a graph-partitioning problem under the N-cut criterion, which simultaneously minimizes the similarity across different partitions and maximizes the similarity within each partition. We measure inter-sentence similarities and perform N-cut segmentation on overlapping n-gram sequences of characters or syllables (i.e., subword units). Our method works at the subword level because subword matching is robust to speech recognition errors and out-of-vocabulary words. Experiments on the TDT2 Mandarin BN corpus show that syllable-bigram-based N-cut achieves the best F1-measure of 0.6911, a relative improvement of 11.52% over the previous word-based N-cut, which has an F1-measure of 0.6197. N-cut at the subword level is more effective than at the word level for story segmentation of noisy Chinese BN transcripts.
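The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the similarity is a cosine over overlapping character-bigram counts (the abstract does not name the similarity function), characters stand in for syllables (syllable n-grams would need a pronunciation lexicon), and a single two-way split is found by exhaustive search over sentence boundaries rather than by the paper's optimization procedure. The names `char_ngrams`, `cosine`, `similarity_matrix`, `ncut_value`, and `best_boundary` are hypothetical. The N-cut value follows the two-way definition of Shi and Malik: cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V).

```python
import math
from collections import Counter


def char_ngrams(sentence, n=2):
    """Count overlapping character n-grams, ignoring whitespace."""
    chars = [c for c in sentence if not c.isspace()]
    return Counter("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))


def cosine(a, b):
    """Cosine similarity of two n-gram count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(sentences, n=2):
    """Edge weights of the sentence graph: W[i][j] = sim(sentence i, sentence j)."""
    vecs = [char_ngrams(s, n) for s in sentences]
    return [[cosine(u, v) for v in vecs] for u in vecs]


def ncut_value(W, boundary):
    """Two-way N-cut for A = sentences [0, boundary), B = [boundary, N)."""
    size = len(W)
    part_a, part_b = range(boundary), range(boundary, size)
    cut = sum(W[i][j] for i in part_a for j in part_b)
    assoc_a = sum(W[i][j] for i in part_a for j in range(size))
    assoc_b = sum(W[i][j] for i in part_b for j in range(size))
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")
    return cut / assoc_a + cut / assoc_b


def best_boundary(W):
    """Exhaustively pick the sentence boundary with the smallest N-cut."""
    return min(range(1, len(W)), key=lambda b: ncut_value(W, b))


# Toy transcript: two weather sentences followed by two stock-market sentences.
sentences = ["北京今天下雨", "北京明天下雨", "股市大幅上涨", "股市继续上涨"]
W = similarity_matrix(sentences)
print(best_boundary(W))
```

On this toy input the two topic blocks share no character bigrams, so the cut between the second and third sentence has zero cross-partition similarity. The reported relative improvement can be checked the same way: (0.6911 - 0.6197) / 0.6197 ≈ 0.1152, i.e. 11.52%.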

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, J., Xie, L., Feng, W., Zhang, Y. (2009). A Subword Normalized Cut Approach to Automatic Story Segmentation of Chinese Broadcast News. In: Lee, G.G., et al. Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_12

  • DOI: https://doi.org/10.1007/978-3-642-04769-5_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04768-8

  • Online ISBN: 978-3-642-04769-5

  • eBook Packages: Computer Science, Computer Science (R0)
