A Parallel Algorithm for Frequent Subgraph Mining

  • Conference paper
Advanced Computational Methods for Knowledge Engineering

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 358)

Abstract

Graph mining has practical applications in many areas, such as molecular substructure exploration, web link analysis, fraud detection, outlier detection, and the analysis of chemical molecules and social networks. Frequent subgraph mining is an important topic in graph mining. The mining process finds all frequent subgraphs over a collection of graphs. Numerous algorithms for mining frequent subgraphs have been proposed; most of them, however, use sequential strategies that do not scale to large datasets. In this paper, we propose a parallel algorithm to overcome this weakness. First, we introduce the multi-core processor architecture and discuss how to apply it to data mining. Second, we present the gSpan algorithm as the basic framework of our algorithm. Finally, we develop an efficient algorithm for mining frequent subgraphs that relies on parallel computing. The performance and scalability of the proposed algorithm are illustrated through extensive experiments on two datasets, chemical and compound.
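
To make the notion of support over a collection of graphs concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It covers only the first step that gSpan-style miners typically perform: counting, for each labeled edge, the number of graphs that contain it, and keeping the edges whose support reaches a threshold. The graph encoding, the function names (`edge_patterns`, `count_chunk`, `frequent_edges`), and the toy database `DB` are assumptions made for illustration. Threads are used so the sketch runs anywhere; on CPython, a process pool would be the usual choice to occupy several cores.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumed encoding: a graph is (vertex_labels, edges), where vertex_labels maps
# vertex id -> label and edges is a list of (u, v, edge_label) for an
# undirected labeled graph.

def edge_patterns(graph):
    """Return the distinct labeled-edge patterns that occur in one graph."""
    labels, edges = graph
    # Sort the two endpoint labels so that (u, v) and (v, u) give one pattern.
    return {(min(labels[u], labels[v]), le, max(labels[u], labels[v]))
            for u, v, le in edges}

def count_chunk(chunk):
    """Count, for a slice of the database, how many graphs contain each pattern."""
    counts = Counter()
    for graph in chunk:
        counts.update(edge_patterns(graph))  # a set: each pattern counts once per graph
    return counts

def frequent_edges(db, min_support, workers=2):
    """Split the database across workers, merge partial counts, filter by support."""
    chunks = [db[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds the per-chunk supports together
    return {p: s for p, s in total.items() if s >= min_support}

# Toy database of three small labeled graphs (illustrative data only).
DB = [
    ({0: "C", 1: "O", 2: "N"}, [(0, 1, "single"), (0, 2, "double")]),
    ({0: "C", 1: "O"}, [(0, 1, "single")]),
    ({0: "O", 1: "C", 2: "C"}, [(0, 1, "single"), (1, 2, "single")]),
]

RESULT = frequent_edges(DB, min_support=2)
# RESULT == {("C", "single", "O"): 3}
```

Because each graph contributes to exactly one partial counter, the merged totals equal the sequential supports. Growing these frequent edges into larger subgraphs, and checking canonical forms as gSpan does, is beyond this sketch.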

Author information

Corresponding author

Correspondence to Bay Vo.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Vo, B., Nguyen, D., Nguyen, TL. (2015). A Parallel Algorithm for Frequent Subgraph Mining. In: Le Thi, H., Nguyen, N., Do, T. (eds) Advanced Computational Methods for Knowledge Engineering. Advances in Intelligent Systems and Computing, vol 358. Springer, Cham. https://doi.org/10.1007/978-3-319-17996-4_15

  • DOI: https://doi.org/10.1007/978-3-319-17996-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17995-7

  • Online ISBN: 978-3-319-17996-4

  • eBook Packages: Engineering, Engineering (R0)
