Abstract
Graph indexing and querying mechanisms have received significant attention because of their importance in analyzing the growing graph datasets in many domains. Although much work has been done in the context of simple graphs, the resulting techniques are not directly applicable to hypergraphs, which represent more complex relationships in various applications. The key problem here is to search for a given subhypergraph query in a larger hypergraph dataset. This search problem is known to be NP-hard, as it is related to subgraph isomorphism. To solve this search problem efficiently, we first create an index set by extracting the common subhypergraph structures from the given hypergraph dataset. Upon receiving a query, we use the same indexing techniques to create a query index set from the given subhypergraph. Using both indices, we identify the possible locations of the query in the hypergraph dataset. We then start the subhypergraph search to verify whether the query actually appears at each location, using an accelerated verification mechanism called the layer-related-closure method. Through experiments on a real hypergraph dataset and random datasets, we demonstrate the efficiency and effectiveness of hypergraph indexing and our verification method.

















Notes
A triangle-shaped hypergraph can represent various relationships in many other applications as well. For instance, consider a biomedical hypergraph in which the medicines are vertices and the hyperedges are the diseases related to each medicine. One may want to find three diseases such that every pair of them shares common medicines for treatment.
We implemented the algorithms for constructing the index and for performing the matching and verification in Java. The source code of our programs and more experimental results are available at www.cs.utsa.edu/~korkmaz/research/hypergraph. All experiments were conducted on an Intel Xeon CPU (2.40 GHz) with 24 GB RAM.
Acknowledgments
The authors would like to thank the anonymous reviewers as well as Andrew Wichmann for their constructive comments.
Appendix
The steps for generating the adjacency list from the transpose of the incidence matrix are shown in Algorithm 9.1.


For the hypergraph in Fig. 1, this algorithm generates the adjacency list \(AJL\) as \(\{e_1: \langle e_2, e_3 \rangle ; e_2: \langle e_1, e_3 \rangle ; e_3: \langle e_1, e_2 \rangle \}\). For each hyperedge \(e_i\), the algorithm goes over each node \(v_j\) to determine whether \(v_j \in e_i\). If so, the algorithm goes over every other hyperedge \(e_k\) to see whether \(v_j\) also belongs to \(e_k\). If it does, \(e_i\) and \(e_k\) are adjacent hyperedges, and thus we add \(e_k\) to the adjacency list of \(e_i\). Accordingly, with \(m\) hyperedges and \(n\) nodes, the time complexity of generating the adjacency list is \(O(m^2n)\).
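The three nested loops described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' exact Algorithm 9.1: the function name `build_adjacency_list`, the 0/1 list-of-lists layout of the transposed incidence matrix (rows are hyperedges, columns are nodes), the 0-based indices, and the small triangle-shaped example are all assumptions made for the illustration.

```python
def build_adjacency_list(incidence_t):
    """Build a hyperedge adjacency list from a transposed incidence matrix.

    incidence_t[i][j] == 1 is assumed to mean that node v_j belongs to
    hyperedge e_i (rows are hyperedges, columns are nodes).
    """
    m = len(incidence_t)                     # number of hyperedges
    n = len(incidence_t[0]) if m else 0      # number of nodes
    adjacency = {i: set() for i in range(m)}
    for i in range(m):                       # each hyperedge e_i
        for j in range(n):                   # each node v_j
            if not incidence_t[i][j]:
                continue                     # v_j is not in e_i
            for k in range(m):               # every other hyperedge e_k
                if k != i and incidence_t[k][j]:
                    adjacency[i].add(k)      # e_i and e_k share v_j
    # Sorted lists make the result easy to read and compare.
    return {i: sorted(neighbours) for i, neighbours in adjacency.items()}


# Hypothetical triangle-shaped hypergraph (not necessarily Fig. 1 itself):
# e_0 = {v_0, v_1}, e_1 = {v_1, v_2}, e_2 = {v_0, v_2}.
triangle = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
print(build_adjacency_list(triangle))
```

Each of the three loops runs at most \(m\), \(n\), and \(m\) times respectively, which is where the \(O(m^2n)\) bound comes from; using a set per hyperedge keeps each insertion at expected constant time.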
One issue worth noting is that, given the adjacency list \(\{e_1: \langle e_2, e_3 \rangle ; e_2: \langle e_1, e_3 \rangle ; e_3: \langle e_1, e_2 \rangle \}\) for Fig. 1, the HITE algorithm in Sect. 5 would extract the same hypertriangle three times, because it goes through all possible neighbors of every hyperedge. To eliminate such duplicate checks, we can modify only the third “for” loop of the HALG algorithm (from \(1 \le k\) to \(i \le k\)), as shown in Algorithm 9.2, so that the HITE algorithm does not generate the same hypertriangle more than once.
Now the adjacency list for Fig. 1 will be \(\{e_1: \langle e_2, e_3\rangle ; e_2: \langle e_3\rangle ; e_3:\langle \rangle \}\). Although we no longer keep the full set of related hyperedges for each hyperedge, this list is easy to generate and allows the HITE algorithm to visit each hypertriangle only once.
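The effect of restricting the third loop can be illustrated with a small Python sketch. It is not the authors' Algorithm 9.2 or their HITE algorithm: the names `build_ordered_adjacency_list` and `list_hypertriangles` are hypothetical, indices are 0-based, and a hypertriangle is simplified here to any three pairwise adjacent hyperedges, which may be looser than the definition used in Sect. 5.

```python
def build_ordered_adjacency_list(incidence_t):
    """Adjacency list that records only neighbours e_k with k > i.

    incidence_t[i][j] == 1 is assumed to mean node v_j is in hyperedge e_i.
    """
    m = len(incidence_t)
    n = len(incidence_t[0]) if m else 0
    adjacency = {i: set() for i in range(m)}
    for i in range(m):
        for j in range(n):
            if not incidence_t[i][j]:
                continue
            for k in range(i + 1, m):        # restricted third loop
                if incidence_t[k][j]:
                    adjacency[i].add(k)
    return {i: sorted(neighbours) for i, neighbours in adjacency.items()}


def list_hypertriangles(adjacency):
    """Simplified stand-in for HITE: list triples of pairwise adjacent
    hyperedges reachable through the given adjacency list."""
    triangles = []
    for i, neighbours in adjacency.items():
        for a in neighbours:
            for b in adjacency[a]:
                if b in neighbours:
                    triangles.append((i, a, b))
    return triangles


# Hypothetical triangle-shaped hypergraph:
# e_0 = {v_0, v_1}, e_1 = {v_1, v_2}, e_2 = {v_0, v_2}.
triangle = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
full = {0: [1, 2], 1: [0, 2], 2: [0, 1]}     # symmetric list, for comparison
ordered = build_ordered_adjacency_list(triangle)

print(ordered)
print(list_hypertriangles(ordered))          # the triple is reported once
print(len(list_hypertriangles(full)))        # symmetric list repeats it
```

In this sketch the ordered list yields the single triple in increasing index order, whereas the symmetric list reports the same three hyperedges several times in different orders. How many repetitions occur depends on how the enumeration is written, so the count printed here need not match the behaviour of the original HITE algorithm.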
Yu, X., Korkmaz, T. Hypergraph querying using structural indexing and layer-related-closure verification. Knowl Inf Syst 46, 537–565 (2016). https://doi.org/10.1007/s10115-015-0829-4
DOI: https://doi.org/10.1007/s10115-015-0829-4