Abstract
We propose a fast algorithm for approximating graph similarities. For its advantageous semantic and algorithmic properties, we define the similarity between two graphs by the Jaccard-similarity of their images in a binary feature space spanned by the set of frequent subtrees generated for some training dataset. Since the feature space embedding is computationally intractable, we use a probabilistic subtree isomorphism operator based on a small sample of random spanning trees and approximate the Jaccard-similarity by min-hash sketches. The partial order on the feature set defined by subgraph isomorphism allows for a fast calculation of the min-hash sketch, without explicitly performing the feature space embedding. Experimental results on real-world graph datasets show that our technique results in a fast algorithm. Furthermore, the approximated similarities are well-suited for classification and retrieval tasks in large graph datasets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
- 2.
In practice, we do not store the patterns in \({\textsc {Sketch}}_{\pi _1,\ldots ,\pi _K}(G)\) explicitly. Instead, we define some arbitrary total order on \(\mathcal {F}\) and represent each pattern by its position according to this order.
References
Broder, A.Z.: On the resemblance and containment of documents. In: Proceedings of the Compression and Complexity of Sequences, pp. 21–29. IEEE (1997)
Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. J. Comput. Syst. Sci. 60(3), 630–659 (2000)
Deshpande, M., Kuramochi, M., Wale, N., Karypis, G.: Frequent substructure-based approaches for classifying chemical compounds. Trans. Knowl. Data Eng. 17(8), 1036–1050 (2005)
Diestel, R.: Graph Theory. Graduate Texts in Mathematics, vol. 173, 4th edn. Springer, Heidelberg (2012). http://dblp.dagstuhl.de/rec/bib/books/daglib/0030488
Geppert, H., Horváth, T., Gärtner, T., Wrobel, S., Bajorath, J.: Support-vector-machine-based ranking significantly improves the effectiveness of similarity searching using 2D fingerprints and multiple reference compounds. J. Chem. Inf. Model. 48(4), 742–746 (2008)
Horváth, T., Bringmann, B., Raedt, L.: Frequent hypergraph mining. In: Inoue, K., Ohwada, H., Yamamoto, A. (eds.) ILP 2006. LNCS (LNAI), vol. 4455, pp. 244–259. Springer, Heidelberg (2007). doi:10.1007/978-3-540-73847-3_26
Horváth, T., Ramon, J.: Efficient frequent connected subgraph mining in graphs of bounded tree-width. Theor. Comput. Sci. 411(31–33), 2784–2797 (2010)
Ralaivola, L., Swamidass, S.J., Saigo, H., Baldi, P.: Graph kernels for chemical informatics. Neural Netw. 18(8), 1093–1110 (2005)
Shamir, R., Tsur, D.: Faster subtree isomorphism. J. Algorithms 33(2), 267–280 (1999). doi:10.1006/jagm.1999.1044
Shi, Q., Petterson, J., Dror, G., Langford, J., Smola, A.J., Vishwanathan, S.V.N.: Hash kernels for structured data. J. Mach. Learn. Res. 10, 2615–2637 (2009)
Teixeira, C.H.C., Silva, A., Meira Jr., W.: Min-hash fingerprints for graph kernels: a trade-off among accuracy, efficiency, and compression. J. Inf. Data Manag. 3(3), 227–242 (2012)
Welke, P., Horváth, T., Wrobel, S.: Probabilistic frequent subtree kernels. In: Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z.W. (eds.) NFMCP 2015. LNCS (LNAI), vol. 9607, pp. 179–193. Springer, Heidelberg (2016). doi:10.1007/978-3-319-39315-5_12
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Welke, P., Horváth, T., Wrobel, S. (2016). Min-Hashing for Probabilistic Frequent Subtree Feature Spaces. In: Calders, T., Ceci, M., Malerba, D. (eds) Discovery Science. DS 2016. Lecture Notes in Computer Science(), vol 9956. Springer, Cham. https://doi.org/10.1007/978-3-319-46307-0_5
Download citation
DOI: https://doi.org/10.1007/978-3-319-46307-0_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46306-3
Online ISBN: 978-3-319-46307-0
eBook Packages: Computer ScienceComputer Science (R0)