Summarization of text-based documents with a determination of latent topical sections and information-rich sentences

Automatic Control and Computer Sciences

Abstract

A method for summarizing text documents is proposed. The method discovers latent topical sections and information-rich sentences. Its basis, the clustering of sentences, is formulated mathematically as a quadratic integer programming problem. An algorithm is developed that determines the optimal number of clusters to a specified precision. The synthesis of a neural network for solving the quadratic integer programming problem is described.
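
As a purely illustrative sketch, sentence clustering can be written with binary assignment variables x_ic (sentence i belongs to cluster c) and an objective that is quadratic in them. The abstract does not give the paper's objective, constraints, or neural-network solver, so everything below is an assumption made for a toy example: the cosine similarity over word counts, the hypothetical names `cosine` and `cluster_sentences`, and the exhaustive search used in place of a real solver.

```python
from collections import Counter
from itertools import product
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_sentences(sentences: list[str], k: int) -> list[int]:
    """Assign each sentence to one of k non-empty clusters by exhaustive search.

    Maximizes sum_c sum_{i<j} s_ij * x_ic * x_jc, where x_ic = 1 if sentence i
    is in cluster c. Each sentence gets exactly one label, which encodes the
    constraint sum_c x_ic = 1. This objective is an assumed stand-in, not the
    formulation from the paper.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    sim = [[cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]

    best_labels, best_score = None, float("-inf")
    for labels in product(range(k), repeat=n):
        if len(set(labels)) < k:  # skip assignments that leave a cluster empty
            continue
        # Sum similarities over pairs of sentences placed in the same cluster.
        score = sum(
            sim[i][j]
            for i in range(n)
            for j in range(i + 1, n)
            if labels[i] == labels[j]
        )
        if score > best_score:
            best_labels, best_score = list(labels), score
    return best_labels


if __name__ == "__main__":
    docs = [
        "cats chase mice",
        "cats chase birds",
        "stocks fell sharply",
        "stocks rose sharply",
    ]
    print(cluster_sentences(docs, k=2))
```

The search visits k^n assignments, so it is only workable for a handful of sentences. It stands in here for an integer programming or neural-network solver.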

Additional information

Original Russian Text © R.M. Alguliev, R.M. Alyguliev, 2007, published in Avtomatika i Vychislitel’naya Tekhnika, 2007, No. 3, pp. 21–32.

About this article

Cite this article

Alguliev, R.M., Alyguliev, R.M. Summarization of text-based documents with a determination of latent topical sections and information-rich sentences. Aut. Control Comp. Sci. 41, 132–140 (2007). https://doi.org/10.3103/S0146411607030030
