Abstract
Frequent pattern mining (\(\mathsf {FPM}\)) on a single large graph has received increasing attention because it is crucial to applications in a variety of domains, e.g., social network analysis. The \(\mathsf {FPM}\) problem is to find all subgraphs (a.k.a. patterns) that appear frequently in a large graph, according to a user-defined frequency threshold. In recent years, a host of techniques have been developed, but most of them suffer from high computational cost and make results inconvenient to inspect. To tackle these issues, we propose an approach to mining top-k patterns from a single graph G in a distributed setting. We formalize the distributed top-k pattern mining problem by incorporating viable support and interestingness metrics. We then develop a parallel algorithm that preserves the early termination property to discover top-k patterns efficiently. Using real-life and synthetic graphs, we experimentally verify that our algorithm is effective and outperforms traditional counterparts in both efficiency and scalability.
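To make the idea of early termination in top-k selection concrete, the following is a minimal sketch of generic threshold-style pruning. It is not the algorithm of this paper: the function name `top_k_with_early_termination`, the `exact_score` callback, and the toy scores are all hypothetical, and the sketch assumes each candidate pattern comes with an upper bound on its score and that candidates are visited in descending order of that bound.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def top_k_with_early_termination(
    candidates: Iterable[Tuple[str, float]],
    exact_score: Callable[[str], float],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k best (pattern, score) pairs.

    Assumption: `candidates` yields (pattern, upper_bound) pairs sorted by
    upper_bound in descending order, and exact_score(p) <= upper_bound.
    """
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top-k
    for pattern, upper_bound in candidates:
        # Early termination: no remaining candidate can beat the k-th best.
        if len(heap) == k and upper_bound <= heap[0][0]:
            break
        score = exact_score(pattern)  # the expensive step we try to avoid
        if len(heap) < k:
            heapq.heappush(heap, (score, pattern))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, pattern))
    # Report the best pattern first.
    return sorted(((p, s) for s, p in heap), key=lambda t: -t[1])


# Toy data (hypothetical): upper bounds and exact scores for four patterns.
bounds = [("a", 10.0), ("b", 8.0), ("c", 5.0), ("d", 3.0)]
scores = {"a": 9.0, "b": 4.0, "c": 5.0, "d": 3.0}
evaluated: List[str] = []


def score_and_record(pattern: str) -> float:
    evaluated.append(pattern)
    return scores[pattern]


result = top_k_with_early_termination(bounds, score_and_record, k=2)
print(result)
print(evaluated)
```

In this toy run the pattern `d` is never scored, because its upper bound cannot exceed the current second-best score. In a distributed setting the same test would have to be applied to scores aggregated across workers, which this sketch does not attempt.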
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, X. et al. (2021). Distributed Top-k Pattern Mining. In: U, L.H., Spaniol, M., Sakurai, Y., Chen, J. (eds) Web and Big Data. APWeb-WAIM 2021. Lecture Notes in Computer Science, vol 12859. Springer, Cham. https://doi.org/10.1007/978-3-030-85899-5_16
DOI: https://doi.org/10.1007/978-3-030-85899-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85898-8
Online ISBN: 978-3-030-85899-5
eBook Packages: Computer Science, Computer Science (R0)