research-article

Top-k Graph Summarization on Hierarchical DAGs

Authors:
Xuliang Zhu, Hong Kong Baptist University, Hong Kong
Xin Huang, Hong Kong Baptist University, Hong Kong
Byron Choi, Hong Kong Baptist University, Hong Kong
Jianliang Xu, Hong Kong Baptist University, Hong Kong

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, October 2020, Pages 1903–1912, https://doi.org/10.1145/3340531.3411899

Published: 19 October 2020

ABSTRACT

A directed acyclic graph (DAG) is an important model for representing terminologies and their hierarchical relationships, as in the Disease Ontology. Because a large DAG contains massive numbers of terminologies and complex structures, summarizing the whole hierarchical DAG is challenging.

In this paper, we study a new problem of finding k representative vertices to summarize a hierarchical DAG. To produce a diverse summary of important vertices, we design a summary score function that captures both the diversity coverage and the structure correlation of vertices. We theoretically prove that the studied problem is NP-hard. To tackle it efficiently, we propose a greedy algorithm with an approximation guarantee, which iteratively adds the vertices with the largest summary contributions to the answer. To further improve answer quality, we propose a subtree-extraction-based method, which is proven to achieve higher-quality answers. In addition, we develop a scalable algorithm, k-PCGS, based on candidate pruning and DAG compression for large-scale hierarchical DAGs. Extensive experiments on large real-world datasets demonstrate both the effectiveness and the efficiency of the proposed algorithms.
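The greedy idea in the abstract (repeatedly add the vertex with the largest marginal summary contribution) can be illustrated with a small sketch. The paper's score combines diversity coverage and structure correlation, and its exact definition is not given here, so the sketch assumes a hypothetical, simplified score: a selected vertex "covers" itself and all of its descendants, and a set's score is the number of distinct vertices covered. The function names `descendants` and `greedy_top_k` and the toy ontology are illustrative assumptions, not the authors' k-PCGS implementation.

```python
from typing import Dict, List, Set, Tuple

# Adjacency list of a hierarchical DAG: parent -> list of children.
Dag = Dict[str, List[str]]


def descendants(dag: Dag) -> Dict[str, Set[str]]:
    """Map every vertex to the set of vertices it reaches, itself included."""
    memo: Dict[str, Set[str]] = {}

    def visit(v: str) -> Set[str]:
        if v not in memo:
            reach = {v}
            for child in dag.get(v, []):
                reach |= visit(child)  # shared children are computed only once
            memo[v] = reach
        return memo[v]

    for v in dag:
        visit(v)
    return memo


def greedy_top_k(dag: Dag, k: int) -> Tuple[List[str], Set[str]]:
    """Pick up to k vertices, each time taking the one that covers the most
    not-yet-covered vertices (a stand-in for the paper's summary score)."""
    reach = descendants(dag)
    covered: Set[str] = set()
    answer: List[str] = []
    candidates = set(reach)
    for _ in range(k):
        best, best_gain = None, 0
        for v in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = len(reach[v] - covered)  # marginal contribution of v
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # every vertex is already covered
            break
        answer.append(best)
        covered |= reach[best]
        candidates.remove(best)
    return answer, covered


if __name__ == "__main__":
    # Hypothetical toy ontology; "flu" and "tb" have two parents each,
    # so it is a DAG rather than a tree.
    toy = {
        "infectious": ["viral", "bacterial"],
        "viral": ["flu", "covid"],
        "bacterial": ["tb"],
        "respiratory": ["flu", "tb"],
        "genetic": ["cf", "hd"],
        "flu": [], "covid": [], "tb": [], "cf": [], "hd": [],
    }
    picks, covered = greedy_top_k(toy, 2)
    print(picks, len(covered))  # ['infectious', 'genetic'] 9
```

Because this stand-in coverage score is monotone and submodular, the greedy rule achieves the classic 1 - 1/e approximation for it; that bound belongs to the simplified score, not to the paper's own guarantee. On a single-rooted ontology, pure coverage would simply pick the root first, which suggests why the paper's score also weighs diversity and structure correlation.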

Supplemental Material

3340531.3411899.mp4 (MP4, 18.7 MB)

Index Terms

Top-k Graph Summarization on Hierarchical DAGs
1. Information systems
  1. Data management systems
    1. Database design and models
      1. Graph-based database models
        1. Hierarchical data models
  2. Information retrieval
    1. Retrieval tasks and goals
      1. Summarization
2. Theory of computation
  1. Design and analysis of algorithms
    1. Graph algorithms analysis

Published in
CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN: 9781450368599
DOI: 10.1145/3340531
General Chairs:
Mathieu d'Aquin, DSI, Insight, NUI Galway, Ireland
Stefan Dietze, GESIS, Cologne, Germany; Heinrich-Heine-University Düsseldorf, Germany; L3S Research Center, Germany
Program Chairs:
Claudia Hauff, TU Delft, The Netherlands
Edward Curry, DSI, Insight, NUI Galway, Ireland
Philippe Cudre Mauroux, eXascale, University of Fribourg, Switzerland
Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data summarization
directed acyclic graph
hierarchy
Acceptance Rates
Overall acceptance rate: 1,861 of 8,427 submissions, 22%

Article Metrics
- Total citations: 4
- Total downloads: 448
- Downloads (last 12 months): 48
- Downloads (last 6 weeks): 6