Abstract
Multilingual multi-document summarization is the task of generating a summary in a target language from a collection of documents in multiple source languages. A straightforward approach is to automatically translate the non-target-language documents into the target language and then apply monolingual summarization methods, but the summaries generated this way are often poorly readable because of the low quality of machine translation. To solve this problem, we propose a novel graph model based on a guided edge weighting method, in which both the informativeness and the readability of summaries are fully taken into consideration. Specifically, our model attempts to select from the target-language documents the sentences that contain important information shared across languages, while also retaining the salient sentences that cannot be covered by documents in other languages. Experimental results on our manually labeled dataset (which will be released to the public) show that our method significantly outperforms the baseline methods.
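The abstract gives no formulas, so the following is only an illustrative sketch of what a guided edge weighting over a sentence graph could look like; it is not the authors' GuideRank model. It assumes bag-of-words cosine similarity, a PageRank-style random walk over target-language sentences, and machine-translated sentences from the other languages as the guidance signal. The function names (`cosine`, `guided_rank`) and the `boost` parameter are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def guided_rank(target_sents, translated_sents, boost=1.0, damping=0.85, iters=50):
    """Score target-language sentences with a guided random walk.

    Assumption (not from the paper): an edge i -> j is weighted by the
    similarity of sentences i and j, scaled up when sentence j is also
    supported by a translated sentence from another language.
    """
    n = len(target_sents)
    if n == 0:
        return []
    tgt = [Counter(s.lower().split()) for s in target_sents]
    oth = [Counter(s.lower().split()) for s in translated_sents]

    # Guidance signal: best match of each target sentence in the other language.
    support = [max((cosine(t, o) for o in oth), default=0.0) for t in tgt]

    # Guided edge weights; self-loops are left at zero.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = cosine(tgt[i], tgt[j]) * (1.0 + boost * support[j])
    out = [sum(row) for row in w]

    # Power iteration with row-normalised transition probabilities.
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * w[i][j] / out[i] for i in range(n) if out[i] > 0)
            for j in range(n)
        ]
    return scores
```

Selecting only target-language sentences keeps the output free of machine-translation errors, while the `boost` term favours content that also appears in the other languages. Setting `boost=0` reduces the sketch to an unguided similarity-graph ranker, which makes the effect of the guidance easy to compare.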
Acknowledgments
The research work has been funded by the Natural Science Foundation of China under Grant No. 61333018 and supported by the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Li, H., Zhang, J., Zhou, Y., Zong, C. (2016). GuideRank: A Guided Ranking Graph Model for Multilingual Multi-document Summarization. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_54
DOI: https://doi.org/10.1007/978-3-319-50496-4_54
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)