
The two-stage unsupervised approach to multidocument summarization

Automatic Control and Computer Sciences

Abstract

This paper proposes an approach to summarizing a set of documents by identifying their topics and extracting informative sentences. Topics are determined by clustering sentences, and informative sentences are extracted with a ranking algorithm. The quality of the summary is shown to depend on the clustering method, the ranking algorithm, and the similarity measure. Experiments on the open benchmark datasets DUC2001 and DUC2002 show that the proposed clustering methods and ranking algorithm outperform the well-known k-means method and the PageRank and HITS ranking algorithms.
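The two stages described in the abstract (cluster sentences into topics, then rank sentences to pick the informative ones) can be illustrated with a minimal Python sketch. The abstract does not specify the paper's clustering methods, ranking algorithm, or similarity measure, so everything below is an assumed stand-in: bag-of-words cosine similarity, farthest-first seeding with nearest-seed assignment for the clustering stage, and summed within-cluster similarity for the ranking stage. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tf_vector(sentence):
    """Bag-of-words term-frequency vector for one sentence."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_sentences(vectors, k):
    """Stage 1 (assumed stand-in, not the paper's method):
    farthest-first seeds, then nearest-seed assignment."""
    seeds = [0]
    while len(seeds) < min(k, len(vectors)):
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        # Next seed: the sentence least similar to every current seed.
        seeds.append(min(
            candidates,
            key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds),
        ))
    clusters = {s: [] for s in seeds}
    for i, vec in enumerate(vectors):
        nearest = max(seeds, key=lambda s: cosine(vec, vectors[s]))
        clusters[nearest].append(i)
    return list(clusters.values())


def rank_cluster(vectors, members):
    """Stage 2 (assumed stand-in, not the paper's method):
    pick the sentence most similar to the rest of its cluster."""
    def score(i):
        return sum(cosine(vectors[i], vectors[j]) for j in members if j != i)
    return max(members, key=score)


def summarize(sentences, k):
    """Return one representative sentence per topic cluster, in source order."""
    vectors = [tf_vector(s) for s in sentences]
    clusters = cluster_sentences(vectors, k)
    chosen = sorted(rank_cluster(vectors, members) for members in clusters)
    return [sentences[i] for i in chosen]
```

Calling `summarize(sentences, k)` on a list of sentence strings returns `k` sentences (fewer if there are fewer sentences), one per cluster. Swapping in a different similarity function, clustering step, or ranking step changes the output, which is the dependence the abstract reports.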



Author information

Corresponding author

Correspondence to R. M. Alyguliyev.

Additional information

Original Russian Text © R.M. Alyguliyev, 2009, published in Avtomatika i Vychislitel’naya Tekhnika, 2009, No. 5, pp. 72–82.

About this article

Cite this article

Alyguliyev, R.M. The two-stage unsupervised approach to multidocument summarization. Aut. Control Comp. Sci. 43, 276–284 (2009). https://doi.org/10.3103/S0146411609050083


  • DOI: https://doi.org/10.3103/S0146411609050083
