Abstract
Mining frequent subgraphs in large-scale graph data sets helps reveal underlying knowledge. Because mining approaches on centralized systems are often bottlenecked by computational capacity, many parallelized solutions based on the MapReduce framework have been proposed to scale out the mining process, which usually extracts frequent subgraphs iteratively. Nonetheless, the efficiency and scalability of these MapReduce-based approaches are still bounded by the communication cost of passing intermediate results and by the workload imbalance that arises after a few iterations. In this paper, we propose an efficient and scalable framework for frequent subgraph mining that uses distributed graph processing systems. It adopts a message-passing-free scheme among workers to reduce communication cost and uses a task scheduler to dynamically balance the workload. Experimental results on both synthetic and real-world data sets verify the efficacy of the proposed framework.
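To illustrate why dynamic scheduling can help when per-task costs are skewed, the following is a minimal sketch, not the framework's actual scheduler (the abstract does not describe its design). It compares a fixed round-robin assignment with a pull-based one in which each task goes to the worker that becomes idle first. The function names, the task costs, and the worker count are all hypothetical and chosen only for illustration.

```python
import heapq


def static_makespan(task_costs, num_workers):
    """Assign tasks round-robin up front; return the busiest worker's total load."""
    loads = [0] * num_workers
    for i, cost in enumerate(task_costs):
        loads[i % num_workers] += cost
    return max(loads)


def dynamic_makespan(task_costs, num_workers):
    """Hand each task, in order, to the worker that becomes idle first.

    This is an assumed pull-based policy used for illustration only.
    """
    # Heap of (time at which the worker becomes idle, worker id).
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    for cost in task_costs:
        busy_until, worker = heapq.heappop(heap)
        heapq.heappush(heap, (busy_until + cost, worker))
    return max(t for t, _ in heap)


# Hypothetical, deliberately skewed task costs (e.g. pattern-extension tasks).
costs = [9, 1, 8, 1, 7, 1, 6, 1]
print(static_makespan(costs, 2), dynamic_makespan(costs, 2))  # prints: 30 17
```

With these made-up costs, the static assignment leaves one worker with almost all of the work, while the pull-based policy splits the total load of 34 evenly across the two workers.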
Acknowledgment
We would like to thank the anonymous reviewers for their helpful and insightful comments. This work was supported in part by the National Natural Science Foundation of China (61502504, 61732014, 61502347, U1711261) and the Technological Innovation Projects of HuBei Province (2017AAA125).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Wang, T., Huang, H., Lu, W., Peng, Z., Du, X. (2018). Efficient and Scalable Mining of Frequent Subgraphs Using Distributed Graph Processing Systems. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_57
DOI: https://doi.org/10.1007/978-3-319-91452-7_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91451-0
Online ISBN: 978-3-319-91452-7
eBook Packages: Computer Science (R0)