research-article

Grand: A Fast and Accurate Graph Retrieval Framework via Knowledge Distillation

Authors:

Xiaohong GuanAuthors Info & Claims

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1639 - 1648

https://doi.org/10.1145/3626772.3657773

Published: 11 July 2024 Publication History

Abstract

Graph retrieval aims to find the most similar graphs in a graph database given a query graph, which is a fundamental problem with many real-world applications in chemical engineering, code analysis, etc. To date, existing neural graph retrieval methods generally fall into two categories: Embedding Based Paradigm (Ebp) and Matching Based Paradigm (Mbp). The Ebp models learn an individual vectorial representation for each graph and the retrieval process can be accelerated by pre-computing these representations. The Mbp models learn a neural matching function to compare graphs on a pair-by-pair basis, in which the fine-grained pairwise comparison leads to higher retrieval accuracy but severely degrades retrieval efficiency. In this paper, to combine the advantage of Ebp in retrieval efficiency with that of Mbp in retrieval accuracy, we propose a novel Graph RetrievAl framework via KNowledge Distillation, namely GRAND. The key point is to leverage the idea of knowledge distillation to transfer the fine-grained graph comparison knowledge from an Mbp model to an Ebp model, such that the Ebp model can generate better graph representations and thus yield higher retrieval accuracy. At the same time, we can still pre-compute and index the improved graph representations to retain the retrieval speed of Ebp. Towards this end, we propose to perform knowledge distillation from three perspectives: score, node, and subgraph levels. In addition, we propose to perform mutual two-way knowledge transfer between Mbp and Ebp, such that Mbp and Ebp complement and benefit each other. Extensive experiments on three real-world datasets show that GRAND improves the performance of Ebp by a large margin and the improvement is consistent for different combinations of Ebp and Mbp models. For example, GRAND achieves performance gains of mostly more than 10% and up to 16.88% in terms of Recall@K on different datasets.

References

[1]

Yunsheng Bai, Hao Ding, Song Bian, Ting Chen, Yizhou Sun, and Wei Wang. 2019. Simgnn: A neural network approach to fast graph similarity computation. In WSDM.

Digital Library

[2]

Yunsheng Bai, Hao Ding, Ken Gu, Yizhou Sun, and Wei Wang. 2020. Learningbased efficient graph similarity computation via multi-scale convolutional set matching. In AAAI.

[3]

Horst Bunke and Kim Shearer. 1998. A graph distance metric based on the maximal common subgraph. Pattern recognition letters (1998).

[4]

Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. 2007. Learning to rank: from pairwise approach to listwise approach. In ICML.

[5]

Xiaoyang Chen, Hongwei Huo, Jun Huan, and Jeffrey Scott Vitter. 2019. An efficient algorithm for graph edit distance computation. Knowledge-Based Systems (2019).

[6]

Zhengdao Chen, Lei Chen, Soledad Villar, and Joan Bruna. 2020. Can graph neural networks count substructures?. In NeurIPS.

[7]

Tyler Derr, Hamid Karimi, Xiaorui Liu, Jiejun Xu, and Jiliang Tang. 2021. Deep adversarial network alignment. In CIKM.

[8]

Matthias Fey, Jan E Lenssen, Christopher Morris, Jonathan Masci, and Nils M Kriege. 2020. Deep Graph Matching Consensus. In ICLR.

[9]

Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. 2017. Neural message passing for quantum chemistry. In ICML.

[10]

Kunal Goyal, Utkarsh Gupta, Abir De, and Soumen Chakrabarti. 2020. Deep Neural Matching Models for Graph Retrieval. In SIGIR.

[11]

William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NIPS.

[12]

Peter E Hart, Nils J Nilsson, and Bertram Raphael. 1968. A formal basis for the heuristic determination of minimum cost paths. IEEE transactions on Systems Science and Cybernetics (1968).

[13]

Mark Heimann, Haoming Shen, Tara Safavi, and Danai Koutra. 2018. Regal: Representation learning-based graph alignment. In CIKM.

Digital Library

[14]

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).

[15]

Elad Hoffer and Nir Ailon. 2015. Deep metric learning using triplet network. In SIMBAD. Springer.

[16]

Jui-Ting Huang, Ashish Sharma, Shuying Sun, Li Xia, David Zhang, Philip Pronin, Janani Padmanabhan, Giuseppe Ottaviano, and Linjun Yang. 2020. Embeddingbased retrieval in facebook search. In KDD.

[17]

Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with gpus. IEEE Transactions on Big Data (2019).

[18]

George Karypis and Vipin Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on scientific Computing (1998).

Digital Library

[19]

Jongik Kim, Dong-Hoon Choi, and Chen Li. 2019. Inves: Incremental Partitioning- Based Verification for Graph Similarity Search. In EDBT.

[20]

Thomas N Kipf and MaxWelling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.

[21]

Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, and Pushmeet Kohli. 2019. Graph matching networks for learning the similarity of graph structured objects. In ICML.

[22]

Yongjiang Liang and Peixiang Zhao. 2017. Similarity search in graph databases: A multi-layered indexing approach. In ICDE.

[23]

Xiang Ling, Lingfei Wu, Saizhuo Wang, Tengfei Ma, Fangli Xu, Alex X Liu, Chunming Wu, and Shouling Ji. 2021. Multilevel Graph Matching Networks for Deep Graph Similarity Learning. TNNLS (2021).

[24]

Xiang Ling, Lingfei Wu, Saizhuo Wang, Gaoning Pan, Tengfei Ma, Fangli Xu, Alex X Liu, Chunming Wu, and Shouling Ji. 2021. Deep Graph Matching and Searching for Semantic Code Retrieval. TKDD (2021).

[25]

Xin Liu, Haojie Pan, Mutian He, Yangqiu Song, Xin Jiang, and Lifeng Shang. 2020. Neural subgraph isomorphism counting. In KDD.

[26]

Michel Neuhaus and Horst Bunke. 2007. Bridging the gap between graph edit distance and kernel machines. World Scientific.

[27]

Ferran Parés, Dario Garcia Gasulla, Armand Vilalta, Jonatan Moreno, Eduard Ayguadé, Jesús Labarta, Ulises Cortés, and Toyotaro Suzumura. 2017. Fluid communities: A competitive, scalable and diverse community detection algorithm. In Complex Networks & Their Applications.

[28]

Can Qin, Handong Zhao, Lichen Wang, Huan Wang, Yulun Zhang, and Yun Fu. 2021. Slow learning and fast inference: Efficient graph similarity computation via knowledge distillation. Advances in Neural Information Processing Systems 34 (2021), 14110--14121.

[29]

Zongyue Qin, Yunsheng Bai, and Yizhou Sun. 2020. GHashing: Semantic Graph Hashing for Approximate Similarity Search in Graph Databases. In KDD.

[30]

Sayan Ranu, Minh Hoang, and Ambuj Singh. 2014. Answering top-k representative queries on graph databases. In SIGMOD.

[31]

Haichuan Shang, Xuemin Lin, Ying Zhang, Jeffrey Xu Yu, and Wei Wang. 2010. Connected substructure similarity search. In SIGMOD.

[32]

Damian Szklarczyk, Alberto Santos, Christian Von Mering, Lars Juhl Jensen, Peer Bork, and Michael Kuhn. 2016. STITCH 5: augmenting protein-chemical interaction networks with tissue and affinity data. Nucleic acids research (2016).

[33]

Michael Taylor, John Guiver, Stephen Robertson, and Tom Minka. 2008. Softrank: optimizing non-smooth rank metrics. In WSDM.

[34]

Yuanyuan Tian, Richard C Mceachin, Carlos Santos, David J States, and JigneshM Patel. 2007. SAGA: a subgraph matching tool for biological graphs. Bioinformatics (2007).

[35]

Esko Ukkonen. 1992. Approximate string-matching with q-grams and maximal matches. Theoretical computer science (1992).

[36]

Petar Veli?kovi?, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In ICLR.

[37]

Guoren Wang, Bin Wang, Xiaochun Yang, and Ge Yu. 2010. Efficiently indexing large sparse graphs for similarity search. TKDE (2010).

[38]

Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. 2008. Listwise approach to learning to rank: theory and algorithm. In ICML.

[39]

Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How powerful are graph neural networks?. In ICLR.

[40]

Xifeng Yan, Philip S Yu, and Jiawei Han. 2005. Substructure similarity search in graph databases. In SIGMOD.

[41]

Cheng Yang, Jiawei Liu, and Chuan Shi. 2021. Extract the knowledge of graph neural networks and go beyond it: An effective knowledge distillation framework. In Proceedings of the web conference 2021. 1227--1237.

Digital Library

[42]

Zhitao Ying, Andrew Wang, Jiaxuan You, Chengtao Wen, Arquimedes Canedo, and Jure Leskovec. 2020. Neural subgraph matching. arXiv preprint arXiv:2007.03092 (2020).

[43]

Ye Yuan, Guoren Wang, Jeffery Yu Xu, and Lei Chen. 2015. Efficient distributed subgraph similarity matching. The VLDB Journal (2015).

[44]

Zhiping Zeng, Anthony KH Tung, Jianyong Wang, Jianhua Feng, and Lizhu Zhou. 2009. Comparing stars: On approximating graph edit distance. In VLDB.

[45]

Shijie Zhang, Jiong Yang, and Wei Jin. 2010. Sapper: Subgraph indexing and approximate matching in large graphs. In VLDB.

Digital Library

[46]

Ying Zhang, Tao Xiang, Timothy M Hospedales, and Huchuan Lu. 2018. Deep mutual learning. In CVPR.

[47]

Zhen Zhang, Jiajun Bu, Martin Ester, Zhao Li, Chengwei Yao, Zhi Yu, and Can Wang. 2021. H2MN: Graph Similarity Learning with Hierarchical Hypergraph Matching Networks. In KDD.

[48]

Xiang Zhao, Chuan Xiao, Xuemin Lin, Qing Liu, and Wenjie Zhang. 2013. A partition-based approach to structure similarity search. In VLDB.

[49]

Xiang Zhao, Chuan Xiao, Xuemin Lin, Wei Wang, and Yoshiharu Ishikawa. 2013. Efficient processing of graph similarity queries with edit distance constraints. The VLDB Journal (2013).

[50]

Gaoping Zhu, Xuemin Lin, Ke Zhu, Wenjie Zhang, and Jeffrey Xu Yu. 2012. TreeSpan: efficiently computing similarity all-matching. In SIGMOD.

Cited By

Zhou ZWang PLiang ZZhang RBai HCai JKankanhalli MPrabhakaran BBoll SSubramanian RZheng LSingh VCesar PXie LXu D(2024)PAIR: Pre-denosing Augmented Image Retrieval Model for Defending Adversarial PatchesProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3681398(5771-5779)Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3681398

Index Terms

Grand: A Fast and Accurate Graph Retrieval Framework via Knowledge Distillation
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Learning to rank

Recommendations

Degree Reduction in Labeled Graph Retrieval

Within a given collection of graphs, a graph retrieval system may seek all graphs that contain a given graph, or may instead seek all graphs that are contained within a given graph. Although subgraph isomorphism is worst-case exponential, it may be ...
Hypergraph-based image retrieval for graph-based representation

In this paper, we introduce a novel method for graph indexing. We propose a hypergraph-based model for graph data sets by allowing cluster overlapping. More precisely, in this representation one graph can be assigned to more than one cluster. Using the ...
ItrievalKD: An Iterative Retrieval Framework Assisted with Knowledge Distillation for Noisy Text-to-Image Retrieval
Advances in Knowledge Discovery and Data Mining
Abstract
Benefiting from the superiority of the pretraining paradigm on large-scale multi-modal data, current cross-modal pretrained models (such as CLIP) have shown excellent performance on text-to-image retrieval. However, the current research mainly ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2024

3164 pages

ISBN:9798400704314

DOI:10.1145/3626772

General Chairs:
Grace Hui Yang
Georgetown University, USA
,
Hongning Wang
Tsinghua University, China
,
Sam Han
The Washington Post, USA
,
Program Chairs:
Claudia Hauff
Spotify, Netherlands
,
Guido Zuccon
The University of Queensland, Australia
,
Yi Zhang
University of California Santa Cruz, USA

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 July 2024

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China

Conference

SIGIR 2024

Sponsor:

SIGIR

SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 14 - 18, 2024

Washington DC, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
256
Total Downloads

Downloads (Last 12 months)256
Downloads (Last 6 weeks)37

Reflects downloads up to 16 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Zhou ZWang PLiang ZZhang RBai HCai JKankanhalli MPrabhakaran BBoll SSubramanian RZheng LSingh VCesar PXie LXu D(2024)PAIR: Pre-denosing Augmented Image Retrieval Model for Defending Adversarial PatchesProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3681398(5771-5779)Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3681398

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten