Mining frequent subgraphs from tremendous amount of small graphs using MapReduce

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Frequent subgraph mining from a tremendous number of small graphs is a primitive operation for many data mining applications. Existing approaches mainly target centralized systems and suffer from scalability issues. Given the increasing volume of graph data and the fact that mining frequent subgraphs is a memory-intensive task, it is difficult to tackle this problem efficiently on a single centralized machine. In this paper, we therefore propose an efficient and scalable solution, called MRFSE, using MapReduce. MRFSE adopts a breadth-first search strategy to extract frequent subgraphs iteratively, i.e., at the \(i\)th iteration, all frequent subgraphs with \(i+1\) edges are generated from the frequent subgraphs with \(i\) edges. In our design, existing frequent subgraph mining techniques for centralized systems can be easily extended and integrated. More importantly, new frequent subgraphs are generated without performing any isomorphism test, which is costly yet indispensable in existing frequent subgraph mining techniques. In addition, various optimization techniques are proposed to further reduce communication and I/O costs. Extensive experiments conducted on our in-house clusters demonstrate the superiority of the proposed solution in terms of both scalability and efficiency.
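To make the level-wise strategy concrete, the following is a minimal sketch of breadth-first frequent subgraph extraction with a map phase and a reduce phase simulated in a single Python process. It is not the MRFSE implementation. It assumes, purely for illustration, that vertex labels are unique within each data graph, so a connected pattern is fully identified by its set of labeled edges and no isomorphism test arises. The names `map_phase`, `reduce_phase`, `mine`, and `demo_db` are hypothetical, and the sketch adds an initial round that counts single edges before the extension rounds begin.

```python
from collections import defaultdict


def map_phase(graph_id, edges, frequent):
    """Emit (candidate, graph_id) pairs for one data graph.

    edges    -- frozenset of (u, v) vertex-label pairs with u < v
    frequent -- frequent i-edge patterns from the previous round,
                or None for the initial round (single edges)
    """
    if frequent is None:
        for edge in edges:
            yield frozenset([edge]), graph_id
        return
    emitted = set()
    for pattern in frequent:
        if not pattern <= edges:  # pattern does not occur in this graph
            continue
        vertices = {v for edge in pattern for v in edge}
        for edge in edges - pattern:
            if vertices.isdisjoint(edge):  # edge is not attached to the pattern
                continue
            candidate = pattern | {edge}  # (i+1)-edge connected candidate
            if candidate not in emitted:
                emitted.add(candidate)
                yield candidate, graph_id


def reduce_phase(pairs, min_support):
    """Group candidates and keep those found in at least min_support graphs."""
    supporters = defaultdict(set)
    for candidate, graph_id in pairs:
        supporters[candidate].add(graph_id)
    return {c for c, ids in supporters.items() if len(ids) >= min_support}


def mine(database, min_support):
    """Return a list whose k-th entry holds the frequent (k+1)-edge patterns."""
    levels, frequent = [], None
    while True:
        pairs = [pair for gid, edges in database.items()
                 for pair in map_phase(gid, edges, frequent)]
        frequent = reduce_phase(pairs, min_support)
        if not frequent:
            return levels
        levels.append(frequent)


# Assumed toy database: three small graphs over the vertex labels A-D.
demo_db = {
    "g1": frozenset({("A", "B"), ("B", "C"), ("C", "D")}),
    "g2": frozenset({("A", "B"), ("B", "C")}),
    "g3": frozenset({("A", "B"), ("C", "D")}),
}

for size, patterns in enumerate(mine(demo_db, min_support=2), start=1):
    print(size, sorted(sorted(p) for p in patterns))
```

In a real MapReduce job each round would be one job over the partitioned database, with the frequent patterns of the previous round shipped to the mappers. With repeated vertex labels, the edge-set key used here no longer identifies a pattern, which is where embeddings or canonical forms become necessary.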

Notes

  1. In the remainder of this paper, an i-subgraph refers to a subgraph with i edges.

  2. http://dtp.nci.nih.gov/docs/aids/aids_data.html.

  3. http://www.cas.org.

  4. For every \(g \in D\), we maintain all frequent i-subgraphs associated with the corresponding embeddings.

  5. https://github.com/apache/giraph.

  6. http://pubchem.ncbi.nlm.nih.gov.

Acknowledgements

We would like to thank the anonymous reviewers for their helpful and insightful comments. This work was supported in part by the National Natural Science Foundation of China (61502504, 61502347, 61432006), the Natural Science Foundation of Hubei Province of China (2016CFB384), the National Key Research and Development Program of the Ministry of Science and Technology of China (Project Number: 2016YFB1000700), the Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China (No. 15XNLF09).

Author information

Corresponding author

Correspondence to Wei Lu.

Cite this article

Peng, Z., Wang, T., Lu, W. et al. Mining frequent subgraphs from tremendous amount of small graphs using MapReduce. Knowl Inf Syst 56, 663–690 (2018). https://doi.org/10.1007/s10115-017-1104-7

  • DOI: https://doi.org/10.1007/s10115-017-1104-7
