Abstract
This paper proposes an approach to summarizing a set of documents by revealing their topics and extracting informative sentences. The topics are determined by clustering sentences, and the informative sentences are extracted using a ranking algorithm. The quality of the summary is shown to depend on the clustering method, the ranking algorithm, and the similarity measure. Experiments on the open benchmark datasets DUC2001 and DUC2002 show that the proposed clustering methods and ranking algorithm outperform the well-known k-means method and the ranking algorithms PageRank and HITS.
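The two stages described above (cluster sentences into topics, then rank sentences within each topic) can be illustrated with a minimal sketch. The abstract does not specify the proposed clustering methods, ranking algorithm, or similarity measure, so this sketch substitutes the baselines it names: k-means for clustering and PageRank for ranking. The term-frequency cosine similarity, the deterministic seeding with the first k sentences, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_vector(sentence):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(vectors, k, iterations=10):
    """Stage 1 (baseline): k-means on cosine similarity.

    Centroids are seeded with the first k sentences so the result is
    deterministic; this seeding is an assumption of the sketch.
    """
    centroids = [Counter(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = total  # unnormalized sum; cosine ignores scale
    return labels


def pagerank(sim, damping=0.85, iterations=50):
    """Stage 2 (baseline): power-iteration PageRank on a weighted graph."""
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                scores[j] * sim[j][i] / out_weight[j]
                for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, k):
    """Pick the top-ranked sentence from each of k sentence clusters."""
    vectors = [tf_vector(s) for s in sentences]
    labels = kmeans(vectors, k)
    summary = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        # Similarity graph restricted to this cluster, without self-loops.
        sim = [[0.0 if i == j else cosine(vectors[i], vectors[j])
                for j in idx] for i in idx]
        scores = pagerank(sim)
        best = idx[max(range(len(idx)), key=lambda p: scores[p])]
        summary.append(sentences[best])
    return summary


if __name__ == "__main__":
    docs = [
        "cat sat mat",
        "stocks fell market",
        "cat chased mouse mat",
        "market stocks rose",
        "mouse cat mat",
    ]
    for sentence in summarize(docs, k=2):
        print(sentence)
```

Swapping in a different clustering method, ranking algorithm, or similarity measure only requires replacing `kmeans`, `pagerank`, or `cosine`, which mirrors the three factors the abstract identifies as affecting summary quality.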
Additional information
Original Russian Text © R.M. Alyguliyev, 2009, published in Avtomatika i Vychislitel’naya Tekhnika, 2009, No. 5, pp. 72–82.
About this article
Cite this article
Alyguliyev, R.M. The two-stage unsupervised approach to multidocument summarization. Aut. Control Comp. Sci. 43, 276–284 (2009). https://doi.org/10.3103/S0146411609050083
DOI: https://doi.org/10.3103/S0146411609050083