Solving submodular text processing problems using influence graphs

Vardasbi, Ali; Faili, Heshaam; Asadpour, Masoud

doi:10.1007/s13278-019-0559-9

Review Article
Published: 07 May 2019

Social Network Analysis and Mining, volume 9, article number 21 (2019)

Abstract

Submodular functions arise in many important natural language processing problems, such as text summarization and dataset selection. Current graph-based approaches to these problems pay no special attention to submodularity and do not learn the graph model; instead, they simply set each edge weight proportional to the similarity of the edge's two endpoints. We argue that such shallow modeling should be replaced by a deeper approach that learns the graph edge weights. To this end, we propose a new method for learning the graph model corresponding to the submodular function to be maximized. On a number of real-world networks, our method reduces error by 50% compared with the previously used baseline methods. Furthermore, we apply our method, followed by an influence maximization algorithm, to two NLP tasks: text summarization and k-means initialization for topic selection. Through these case studies, we experimentally demonstrate the advantage of our learning method over the previous shallow methods.
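To make the setting concrete, the following minimal sketch illustrates the kind of similarity-weighted baseline the abstract describes, not the learning method proposed in the article. It greedily maximizes a facility-location objective, a standard monotone submodular function, over a pairwise similarity matrix. The matrix `SIM` and the function names `facility_location` and `greedy_select` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def facility_location(sim: Matrix, selected: List[int]) -> float:
    """Facility-location value: each item is credited with its best
    similarity to any selected item. Monotone and submodular for
    non-negative similarities."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim: Matrix, k: int) -> List[int]:
    """Pick up to k items, each time adding the one with the largest
    marginal gain (the classic greedy scheme for monotone submodular
    maximization under a cardinality constraint)."""
    selected: List[int] = []
    for _ in range(min(k, len(sim))):
        current = facility_location(sim, selected)
        best_item, best_gain = -1, float("-inf")
        for j in range(len(sim)):
            if j in selected:
                continue
            gain = facility_location(sim, selected + [j]) - current
            if gain > best_gain:  # ties go to the lowest index
                best_item, best_gain = j, gain
        selected.append(best_item)
    return selected


# Hypothetical similarities between five sentences: items 0-1 form one
# topic, items 2-4 another. Values are illustrative, not from the article.
SIM = [
    [1.0, 0.75, 0.25, 0.0, 0.0],
    [0.75, 1.0, 0.25, 0.125, 0.125],
    [0.25, 0.25, 1.0, 0.5, 0.5],
    [0.0, 0.125, 0.5, 1.0, 0.25],
    [0.0, 0.125, 0.5, 0.25, 1.0],
]

if __name__ == "__main__":
    summary = greedy_select(SIM, 2)
    print(summary, facility_location(SIM, summary))
```

In this sketch the edge weights are fixed to the raw similarities, which is exactly the shallow modeling the abstract argues should be replaced by learned weights.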

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
Ali Vardasbi, Heshaam Faili & Masoud Asadpour

Corresponding author

Correspondence to Ali Vardasbi.

About this article

Cite this article

Vardasbi, A., Faili, H. & Asadpour, M. Solving submodular text processing problems using influence graphs. Soc. Netw. Anal. Min. 9, 21 (2019). https://doi.org/10.1007/s13278-019-0559-9

Received: 02 June 2018
Revised: 09 April 2019
Accepted: 11 April 2019
Published: 07 May 2019
DOI: https://doi.org/10.1007/s13278-019-0559-9
