Using Multiple Discriminant Analysis Approach for Linear Text Segmentation

Jingbo, Zhu; Na, Ye; Xinzhi, Chang; Wenliang, Chen; Tsou, Benjamin K

doi:10.1007/11562214_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

Research on linear text segmentation has been an ongoing focus in NLP for the last decade, and it has great potential for a wide range of applications such as document summarization, information retrieval and text understanding. However, two critical problems remain: automatic boundary detection and automatic determination of the number of segments in a document. In this paper, we propose a new domain-independent statistical model for linear text segmentation. In our model, a Multiple Discriminant Analysis (MDA) criterion function is used to achieve global optimization, finding the best segmentation by maximizing word similarity within segments and minimizing word similarity between segments. To alleviate the high computational complexity introduced by the model, genetic algorithms (GAs) are used. Comparative experimental results show that our method based on MDA criterion functions achieves a higher P_k measure (Beeferman) than the baseline system using the TextTiling algorithm.
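The idea of scoring a whole segmentation by within-segment versus between-segment similarity can be sketched as follows. This is only an illustration under stated assumptions, not the paper's model: the abstract does not give the exact criterion function, so the sketch assumes sentences are represented as term-frequency vectors and uses the trace ratio tr(S_B) / tr(S_W) of between-segment to within-segment scatter, one of the standard MDA criteria. The number of segments is fixed by the caller, and the genetic algorithm is replaced by exhaustive search, which is only feasible for very short inputs. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def vectorize(sentences):
    """Turn each sentence into a term-frequency vector over a shared vocabulary."""
    counts = [Counter(s.lower().split()) for s in sentences]
    vocab = sorted(set().union(*counts))
    return [[c[w] for w in vocab] for c in counts]


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def criterion(vectors, boundaries):
    """Assumed criterion: tr(S_B) / tr(S_W) for one candidate segmentation.

    `boundaries` holds the sentence indices at which a new segment starts.
    """
    cuts = [0, *boundaries, len(vectors)]
    segments = [vectors[a:b] for a, b in zip(cuts, cuts[1:])]
    total_mean = mean(vectors)
    # Within-segment scatter: spread of sentences around their segment mean.
    within = sum(sq_dist(v, mean(seg)) for seg in segments for v in seg)
    # Between-segment scatter: size-weighted spread of segment means.
    between = sum(len(seg) * sq_dist(mean(seg), total_mean) for seg in segments)
    return between / within if within > 0 else float("inf")


def best_segmentation(sentences, n_segments):
    """Exhaustive search (a stand-in for the GA) over all boundary placements."""
    vectors = vectorize(sentences)
    candidates = combinations(range(1, len(vectors)), n_segments - 1)
    return max(candidates, key=lambda b: criterion(vectors, b))
```

Calling `best_segmentation(sentences, 2)` on a short list of sentences returns a tuple with the index where the second segment starts. The segment count is passed in explicitly because the trace ratio alone grows as segments get smaller, so it cannot choose the number of segments by itself; how the paper handles that problem is not stated in the abstract.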

References

Salton, G., Singhal, A., Buckley, C., Mitra, M.: Automatic text decomposition using text segments and text themes. In: Proceedings of the seventh ACM conference on Hypertext, Bethesda, Maryland, United States, pp. 53–65 (1996)
Hearst, M.A.: Multi-paragraph segmentation of expository text. In: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 9–16 (1994)
Hearst, M.A.: TextTiling: segmenting text into multi-paragraph subtopic passages. Computational Linguistics 23(1), 33–64 (1997)
Youmans, G.: A new tool for discourse analysis: The vocabulary management profile. Language 67(4), 763–789 (1991)
Morris, J., Hirst, G.: Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics 17(1), 21–42 (1991)
Kozima, H.: Text segmentation based on similarity between words. In: Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, Student Session, pp. 286–288 (1993)
Reynar, J.C.: An automatic method of finding topic boundaries. In: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Student Session, Las Cruces, New Mexico, pp. 331–333 (1994)
Beeferman, D., Berger, A., Lafferty, J.: Text segmentation using exponential models. In: Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, Providence, Rhode Island, pp. 35–46 (1997)
Passonneau, R., Litman, D.J.: Intention-based segmentation: Human reliability and correlation with linguistic cues. In: Proceedings of the 31st Meeting of the Association for Computational Linguistics, pp. 148–155 (1993)
Ponte, J.M., Croft, B.W.: Text segmentation by topic. In: Proceedings of the First European Conference on Research and Advanced Technology for Digital Libraries. U.Mass. Computer Science Technical Report TR97-18 (1997)
Reynar, J.C.: Statistical models for topic segmentation. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pp. 357–364 (1999)
Hirschberg, J., Grosz, B.: Intentional features of local and global discourse. In: Proceedings of the Workshop on Spoken Language Systems, pp. 441–446 (1992)
Choi, F.Y.Y.: Advances in domain independent linear text segmentation. In: Proc. of NAACL-2000 (2000)
Choi, F.Y.Y., Wiemer-Hastings, P., Moore, J.: Latent semantic analysis for text segmentation. In: Proceedings of the 6th Conference on Empirical Methods in Natural Language Processing, pp. 109–117 (2001)
Blei, D.M., Moreno, P.J.: Topic segmentation with an aspect hidden Markov model. Tech. Rep. CRL 2001-07, COMPAQ Cambridge Research Lab (2001)
Yaari, Y.: Segmentation of expository texts by hierarchical agglomerative clustering. In: Proceedings of the conference on recent advances in natural language processing, pp. 59–65 (1997)
Heinonen, O.: Optimal multi-paragraph text segmentation by dynamic programming. In: Proceedings of the 17th International Conference on Computational Linguistics, pp. 1484–1486 (1998)
Utiyama, M., Isahara, H.: A statistical model for domain-independent text segmentation. In: Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics, pp. 491–498 (2001)
Kehagias, A., Fragkou, P., Petridis, V.: Linear Text Segmentation using a Dynamic Programming Algorithm. In: Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (2003)
Duda, R., Hart, P., Stork, D.: Pattern Classification, 2nd edn. John Wiley & Sons, Chichester (2001)
Tou, J.T., Gonzalez, R.C.: Pattern recognition principles. Addison-Wesley Publishing Company, Reading (1974)
Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)
Tianshun, Y., Jingbo, Z., Li, Z., Ying, Y.: Natural language processing: research on making computers understand human languages. Tsinghua University Press (2002)

Author information

Authors and Affiliations

Natural Language Processing Laboratory, Institute of Computer Software and Theory, Northeastern University, Shenyang, P.R. China
Zhu Jingbo, Ye Na, Chang Xinzhi & Chen Wenliang
Language Information Sciences Research Centre, City University of Hong Kong, HK
Benjamin K Tsou

Editor information

Editors and Affiliations

Center for Language Technology, Macquarie University, 2019, Sydney, NSW, Australia
Robert Dale
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
Institute for Infocomm Research, 21, Heng Mui Keng Terrace, 119613, Singapore
Jian Su
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong

About this paper

Cite this paper

Jingbo, Z., Na, Y., Xinzhi, C., Wenliang, C., Tsou, B.K. (2005). Using Multiple Discriminant Analysis Approach for Linear Text Segmentation. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_26

DOI: https://doi.org/10.1007/11562214_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)