DOI: 10.1145/3340531.3411899
research-article

Top-k Graph Summarization on Hierarchical DAGs

Published: 19 October 2020

ABSTRACT

The directed acyclic graph (DAG) is an important model for representing terminologies and their hierarchical relationships, as in the Disease Ontology. Because a large DAG contains a massive number of terms and complex structures, summarizing the whole hierarchy is challenging.

In this paper, we study a new problem of finding k representative vertices to summarize a hierarchical DAG. To capture both diverse summarization and important vertices, we design a summary score function that measures vertices' diversity coverage and structure correlation. We prove that the problem is NP-hard. To tackle it efficiently, we propose a greedy algorithm with an approximation guarantee, which iteratively adds the vertices with the largest summary contributions to the answer set. To further improve answer quality, we propose a method based on subtree extraction, which is proven to achieve higher-quality answers. In addition, we develop a scalable algorithm, k-PCGS, based on candidate pruning and DAG compression for large-scale hierarchical DAGs. Extensive experiments on large real-world datasets demonstrate both the effectiveness and efficiency of the proposed algorithms.
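The greedy idea described above can be illustrated with a minimal sketch. The abstract does not define the summary score function, so the sketch substitutes a stand-in: each vertex is represented by its closest chosen ancestor (or itself) with quality 1 / (1 + hop distance), and the score of a summary is the sum of these qualities. The names `closeness` and `greedy_summary`, the toy DAG, and the scoring rule are all assumptions made for illustration and are not taken from the paper.

```python
from collections import deque


def closeness(children, source):
    """BFS from source; map each reachable vertex to 1 / (1 + hop distance)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for c in children.get(v, ()):
            if c not in dist:
                dist[c] = dist[v] + 1
                queue.append(c)
    return {u: 1.0 / (1 + d) for u, d in dist.items()}


def greedy_summary(children, k):
    """Pick up to k vertices, each time adding the largest marginal gain.

    The score is a stand-in (assumption), not the paper's score function.
    """
    vertices = set(children)
    for kids in children.values():
        vertices.update(kids)
    weight = {v: closeness(children, v) for v in vertices}
    # How well each vertex is represented by the summary chosen so far.
    represented = {v: 0.0 for v in vertices}
    chosen = []
    for _ in range(min(k, len(vertices))):
        def gain(v):
            return sum(max(0.0, w - represented[u]) for u, w in weight[v].items())

        # sorted() makes tie-breaking deterministic.
        candidate = max(sorted(vertices - set(chosen)), key=gain)
        chosen.append(candidate)
        for u, w in weight[candidate].items():
            represented[u] = max(represented[u], w)
    return chosen


# Toy hierarchy: vertex "a3" has two parents, so this is a DAG, not a tree.
dag = {
    "root": ["a", "b"],
    "a": ["a1", "a2", "a3", "a4"],
    "b": ["b1", "b2", "a3"],
}
print(greedy_summary(dag, 3))
```

Under this stand-in score, the third pick prefers `b` over any leaf because `b` still improves the representation of `b1` and `b2`, while `a3` is already covered equally well by `a`. A score of this coverage form is monotone and submodular, which is the usual setting in which a greedy selection carries an approximation guarantee; the paper's own guarantee depends on its actual score function.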


Supplemental Material

3340531.3411899.mp4 (mp4, 18.7 MB)


        • Published in

          CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
          October 2020
          3619 pages
          ISBN:9781450368599
          DOI:10.1145/3340531

          Copyright © 2020 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
