A Parallel Algorithm for Frequent Subgraph Mining

Vo, Bay; Nguyen, Dang; Nguyen, Thanh-Long

doi:10.1007/978-3-319-17996-4_15

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 358)

Abstract

Graph mining has practical applications in many areas, such as molecular substructure exploration, web link analysis, fraud detection, outlier detection, chemical molecule analysis, and social network analysis. Frequent subgraph mining is an important topic in graph mining: the task is to find all subgraphs that occur frequently across a collection of graphs. Numerous algorithms for mining frequent subgraphs have been proposed; most of them, however, use sequential strategies that do not scale to large datasets. In this paper, we propose a parallel algorithm to overcome this weakness. First, we introduce the multi-core processor architecture and discuss how to apply it to data mining. Second, we present the gSpan algorithm, which serves as the basic framework of our algorithm. Finally, we develop an efficient algorithm for mining frequent subgraphs that relies on parallel computing. The performance and scalability of the proposed algorithm are illustrated through extensive experiments on two datasets, chemical and compound.
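
To make the task concrete, the following is a minimal sketch of support counting for single-edge patterns over a small collection of labeled graphs, with the per-graph work handed to a worker pool. It is not the algorithm proposed in the chapter. The graph encoding, the names `edge_patterns` and `frequent_edges`, and the toy data are assumptions made for illustration only. A thread pool is used to keep the example self-contained; in CPython, a process pool would be needed for a real speedup on CPU-bound work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumed encoding: a graph is (vertex_labels, edges), where vertex_labels maps
# a vertex id to its label and edges is a list of (u, v, edge_label) tuples.

def edge_patterns(graph):
    """Return the distinct single-edge patterns that occur in one graph."""
    labels, edges = graph
    patterns = set()
    for u, v, edge_label in edges:
        # Sort the endpoint labels so an undirected edge has one canonical form.
        a, b = sorted((labels[u], labels[v]))
        patterns.add((a, edge_label, b))
    return patterns

def frequent_edges(graphs, min_support, workers=4):
    """Map each single-edge pattern to its support if it meets min_support.

    Support is the number of graphs that contain the pattern, so each graph
    contributes at most once per pattern.
    """
    # Graphs are independent, so pattern extraction can be split across workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_graph = list(pool.map(edge_patterns, graphs))
    support = Counter()
    for patterns in per_graph:
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}

# Hypothetical toy collection of three small molecule-like graphs.
GRAPHS = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "double")]),
    ({0: "C", 1: "O"}, [(0, 1, "double")]),
    ({0: "C", 1: "N", 2: "C"}, [(0, 1, "single"), (1, 2, "single")]),
]

if __name__ == "__main__":
    print(frequent_edges(GRAPHS, min_support=2))
```

With a minimum support of 2, only the C=O edge pattern is frequent in this toy collection, because it appears in two of the three graphs. Pattern-growth methods such as gSpan start from frequent edges like these and extend them into larger subgraphs.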

Author information

Authors and Affiliations

Division of Data Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Bay Vo & Dang Nguyen
Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Bay Vo & Dang Nguyen
Center for Information Technology, Ho Chi Minh City of Food Industry, Ho Chi Minh City, Vietnam
Thanh-Long Nguyen

Corresponding author

Correspondence to Bay Vo.

Editor information

Editors and Affiliations

LITA - UFR MIM, University of Lorraine – Metz, Metz, France
Hoai An Le Thi
Institute of Informatics & Wrocław University of Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Department of Networked Systems and Services, Budapest University of Technology and Economics, Budapest, Hungary
Tien Van Do

About this paper

Cite this paper

Vo, B., Nguyen, D., Nguyen, T.-L. (2015). A Parallel Algorithm for Frequent Subgraph Mining. In: Le Thi, H., Nguyen, N., Do, T. (eds) Advanced Computational Methods for Knowledge Engineering. Advances in Intelligent Systems and Computing, vol 358. Springer, Cham. https://doi.org/10.1007/978-3-319-17996-4_15

DOI: https://doi.org/10.1007/978-3-319-17996-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17995-7
Online ISBN: 978-3-319-17996-4
eBook Packages: Engineering (R0)
