LDA-Based Topic Formation and Topic-Sentence Reinforcement for Graph-Based Multi-document Summarization

Gao, Dehong; Li, Wenjie; Ouyang, You; Zhang, Renxian

doi:10.1007/978-3-642-35341-3_33

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7675)

Abstract

In recent years, graph-based ranking algorithms have attracted much attention in document summarization. This paper introduces our recent work on applying a topic model, namely Latent Dirichlet Allocation (LDA), to graph-based summarization. In the proposed approach, LDA is used to automatically identify a set of semantic topics from the documents to be summarized. The identified topics are then used to construct a bipartite graph that represents the documents. Topic-sentence reinforcement is applied to calculate the salience scores of topics and sentences simultaneously. By incorporating the information embedded in the topics, the sentence ranking result can be improved. Experiments are conducted on the DUC 2004 data set to evaluate the effectiveness of the proposed approach.
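The reinforcement step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's update equations, so the code below assumes a generic HITS-style mutual reinforcement over a topic-sentence bipartite graph, which may differ from the authors' formulation. The function names (`normalize`, `reinforce`) and the toy affinity matrix `W` are hypothetical; in a real pipeline the affinities would be derived from an LDA model rather than hard-coded.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit Euclidean length (no-op for the zero vector)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def reinforce(weights, iterations=50):
    """HITS-style mutual reinforcement on a topic-sentence bipartite graph.

    weights[t][s] is the assumed affinity between topic t and sentence s.
    This is an illustrative scheme, not the paper's confirmed update rule.
    Returns (topic_scores, sentence_scores).
    """
    n_topics = len(weights)
    n_sents = len(weights[0])
    topic_scores = [1.0] * n_topics
    sent_scores = [1.0] * n_sents
    for _ in range(iterations):
        # A sentence is salient if it is linked to salient topics.
        sent_scores = normalize([
            sum(weights[t][s] * topic_scores[t] for t in range(n_topics))
            for s in range(n_sents)
        ])
        # A topic is salient if it is linked to salient sentences.
        topic_scores = normalize([
            sum(weights[t][s] * sent_scores[s] for s in range(n_sents))
            for t in range(n_topics)
        ])
    return topic_scores, sent_scores


# Toy affinities: 2 topics x 3 sentences (hypothetical values, not from the paper).
W = [
    [0.9, 0.1, 0.5],
    [0.1, 0.2, 0.4],
]
topic_scores, sent_scores = reinforce(W)

# Rank sentence indices from most to least salient.
ranking = sorted(range(len(sent_scores)), key=lambda s: sent_scores[s], reverse=True)
print(ranking)
```

Both score vectors are updated in the same loop, so topic salience and sentence salience are estimated together rather than one after the other. A summarizer would then select the top-ranked sentences, typically with some redundancy control.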

Author information

Authors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Hong Kong
Dehong Gao, Wenjie Li & Renxian Zhang
Miaozhen Systems, Beijing, China
You Ouyang

Editor information

Editors and Affiliations

School of Computer Science and Technology, Tianjin University, Tianjin, 300072, China
Yuexian Hou
DIRO, University of Montreal, CP. 6128, succursale Centre-ville, H3C 3J7, Montreal, QC, Canada
Jian-Yun Nie
Institute of Software, Storage & Information Retrieval Laboratory, Chinese Academy of Sciences, 100190, Beijing, China
Le Sun
School of Computer Science and Technology, Tianjin University, 300072, Tianjin, China
Bo Wang
School of Computing, Robert Gordon University, St Andrew Street, AB25 1HG, Aberdeen, UK
Peng Zhang

About this paper

Cite this paper

Gao, D., Li, W., Ouyang, Y., Zhang, R. (2012). LDA-Based Topic Formation and Topic-Sentence Reinforcement for Graph-Based Multi-document Summarization. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_33

DOI: https://doi.org/10.1007/978-3-642-35341-3_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35340-6
Online ISBN: 978-3-642-35341-3
eBook Packages: Computer Science, Computer Science (R0)
