ABSTRACT
The directed acyclic graph (DAG) is an important model for representing terminologies and their hierarchical relationships, as in the Disease Ontology. Because a large DAG contains a massive number of terms and a complex structure, summarizing the whole hierarchy is challenging.
In this paper, we study a new problem: finding k representative vertices that summarize a hierarchical DAG. To favor summaries that are diverse and built from important vertices, we design a summary score function that captures vertices' diversity coverage and structure correlation. We prove that the problem is NP-hard. To tackle it efficiently, we propose a greedy algorithm with an approximation guarantee, which iteratively adds the vertices with the largest summary contributions to the answer. To further improve answer quality, we propose a subtree-extraction-based method, which is proven to achieve higher-quality answers. In addition, we develop a scalable algorithm, k-PCGS, based on candidate pruning and DAG compression for large-scale hierarchical DAGs. Extensive experiments on large real-world datasets demonstrate the effectiveness and efficiency of the proposed algorithms.
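The greedy selection described in the abstract can be illustrated with a minimal sketch. The abstract does not define the summary score function, so the sketch substitutes a hypothetical stand-in: a vertex "covers" itself and its direct children, and the score of an answer set is the number of covered vertices. The names `greedy_summary`, `one_hop_cover`, and `example_dag` are illustrative assumptions, not the paper's. For this kind of coverage objective the classic greedy strategy is known to achieve a (1 - 1/e) approximation; the guarantee proved in the paper for its own score function is not stated in the abstract.

```python
from typing import Dict, List, Set


def one_hop_cover(dag: Dict[str, List[str]], v: str) -> Set[str]:
    """Hypothetical coverage: a vertex covers itself and its direct children."""
    return {v, *dag.get(v, [])}


def greedy_summary(dag: Dict[str, List[str]], k: int) -> List[str]:
    """Pick up to k vertices, each time adding the one with the largest marginal gain.

    The gain is the number of newly covered vertices under the stand-in
    coverage above (an assumption, not the paper's summary score).
    """
    vertices = set(dag) | {c for children in dag.values() for c in children}
    cover = {v: one_hop_cover(dag, v) for v in vertices}
    chosen: List[str] = []
    covered: Set[str] = set()
    for _ in range(min(k, len(vertices))):
        # Sort candidates so that ties are broken alphabetically and runs are repeatable.
        candidates = sorted(vertices - set(chosen))
        best = max(candidates, key=lambda v: len(cover[v] - covered))
        if not cover[best] - covered:
            break  # no remaining vertex adds anything new
        chosen.append(best)
        covered |= cover[best]
    return chosen


# Toy hierarchy: each key maps a term to its child terms.
example_dag = {
    "disease": ["infectious", "genetic"],
    "infectious": ["viral", "bacterial"],
    "genetic": ["monogenic"],
    "viral": ["flu"],
}

print(greedy_summary(example_dag, 2))
```

On this toy hierarchy, the first pick is a vertex covering three terms, and the second pick is the vertex that adds the most terms not yet covered, rather than the one with the largest coverage in isolation. That marginal-gain rule is what distinguishes greedy summarization from simply ranking vertices by an individual score.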
Index Terms
- Top-k Graph Summarization on Hierarchical DAGs