DOI: 10.1145/2505515.2505585

Content coverage maximization on word networks for hierarchical topic summarization

Published: 27 October 2013

Abstract

This paper studies text summarization by extracting hierarchical topics from a given collection of documents. We propose a new approach to text modeling via network analysis. We convert documents into a word influence network and find the words that summarize the major topics with an efficient influence maximization algorithm. In addition, the influence capability of the topic words on other words in the network reveals the relations among the topic words. We then cluster the words and build hierarchies for the topics. Experiments on large collections of Web documents show that a simple method based on influence analysis is effective compared with existing generative topic modeling and random-walk-based ranking.
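The idea of picking topic words by coverage on a word network can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify how the influence network is built or which influence model is used, so the code below assumes a deterministic simplification in which a word "covers" every word it co-occurs with in some document, and it selects words with the standard greedy heuristic for maximum coverage. The function names `build_word_network` and `greedy_topic_words` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_word_network(documents):
    """Map each word to the set of words it co-occurs with in a document.

    Assumption: co-occurrence within a document stands in for "influence".
    """
    neighbors = defaultdict(set)
    for doc in documents:
        words = set(doc.lower().split())
        for w in words:
            neighbors[w].add(w)  # a word always covers itself
        for a, b in combinations(words, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def greedy_topic_words(neighbors, k):
    """Pick up to k words, each time taking the largest marginal coverage."""
    chosen, covered = [], set()
    candidates = sorted(neighbors)  # sorted for deterministic tie-breaking
    for _ in range(k):
        remaining = [w for w in candidates if w not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda w: len(neighbors[w] - covered))
        if not neighbors[best] - covered:
            break  # nothing new can be covered
        chosen.append(best)
        covered |= neighbors[best]
    return chosen, covered


if __name__ == "__main__":
    docs = ["apple banana", "apple cherry", "dog egg"]
    network = build_word_network(docs)
    words, covered = greedy_topic_words(network, 2)
    print(words, sorted(covered))
```

Because set coverage is a monotone submodular function, this greedy selection carries the classical (1 - 1/e) approximation guarantee for maximum coverage. A probabilistic influence model would replace the set-difference gain with an estimated influence spread.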

      Published In

      CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
      October 2013
      2612 pages
      ISBN: 9781450322638
      DOI: 10.1145/2505515

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. information coverage
      2. keyword extraction
      3. text summarization
      4. topic hierarchy

      Qualifiers

      • Research-article

      Conference

      CIKM'13: 22nd ACM International Conference on Information and Knowledge Management
      October 27 - November 1, 2013
      San Francisco, California, USA

      Acceptance Rates

      CIKM '13 Paper Acceptance Rate 143 of 848 submissions, 17%;
      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Cited By

      • (2022) Multidimensional Mining of Massive Text Data. Online publication date: 19-Mar-2022
      • (2019) Solving submodular text processing problems using influence graphs. Social Network Analysis and Mining 9:1. DOI: 10.1007/s13278-019-0559-9. Online publication date: 7-May-2019
      • (2018) Influence Maximization Model. Encyclopedia of Social Network Analysis and Mining (1075-1082). DOI: 10.1007/978-1-4939-7131-2_110197. Online publication date: 12-Jun-2018
      • (2017) Vector-based similarity measurements for historical figures. Information Systems 64:C (163-174). DOI: 10.1016/j.is.2016.07.001. Online publication date: 1-Mar-2017
      • (2017) Automated Assessment of the Quality of Peer Reviews using Natural Language Processing Techniques. International Journal of Artificial Intelligence in Education 27:3 (534-581). DOI: 10.1007/s40593-016-0132-x. Online publication date: 11-Jan-2017
      • (2017) Continuous Summarization over Microblog Threads. Database Systems for Advanced Applications (511-526). DOI: 10.1007/978-3-319-55699-4_31. Online publication date: 22-Mar-2017
      • (2017) Influence Maximization Model. Encyclopedia of Social Network Analysis and Mining (1-8). DOI: 10.1007/978-1-4614-7163-9_110197-1. Online publication date: 18-Aug-2017
      • (2016) Real-time topic-aware influence maximization using preprocessing. Computational Social Networks 3:1. DOI: 10.1186/s40649-016-0033-z. Online publication date: 10-Nov-2016
      • (2016) Generating Incremental Length Summary Based on Hierarchical Topic Coverage Maximization. ACM Transactions on Intelligent Systems and Technology 7:3 (1-33). DOI: 10.1145/2809433. Online publication date: 17-Feb-2016
      • (2015) Towards Interactive Construction of Topical Hierarchy. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (1225-1234). DOI: 10.1145/2783258.2783288. Online publication date: 10-Aug-2015
