Abstract
A method is proposed for summarizing text documents. The method identifies latent topical sections and information-rich sentences. Its basis, the clustering of sentences, is formulated mathematically as a quadratic integer programming problem. An algorithm is developed that determines the optimal number of clusters to a specified precision. The synthesis of a neural network for solving this quadratic integer programming problem is also described.
Additional information
Original Russian Text © R.M. Alguliev, R.M. Alyguliev, 2007, published in Avtomatika i Vychislitel’naya Tekhnika, 2007, No. 3, pp. 21–32.
About this article
Cite this article
Alguliev, R.M., Alyguliev, R.M. Summarization of text-based documents with a determination of latent topical sections and information-rich sentences. Aut. Control Comp. Sci. 41, 132–140 (2007). https://doi.org/10.3103/S0146411607030030
DOI: https://doi.org/10.3103/S0146411607030030