Abstract
Most existing data-mining-based replication strategies cannot extract correlations between files effectively. To address this problem, a new decentralized replication strategy based on maximal frequent correlated pattern mining, called RSMFCP, is proposed. RSMFCP translates the file access history into a binary access history, mines maximal frequent correlated patterns from it, and replicates files according to the mined patterns, which substantially reduces redundancy and improves replication performance. Data analysis and simulation results show that, compared with no replication, PRA, DR2 and PDDRA, RSMFCP extracts correlations more effectively and achieves a lower mean job execution time under different access patterns, offering a new option for reducing transmission delay in data grids.
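The three steps named in the abstract (binary access history, maximal frequent correlated pattern mining, group replication) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which correlation measure RSMFCP uses, so all-confidence is assumed here, the search is brute force rather than an optimized miner, and the function names, thresholds and sample history are hypothetical.

```python
from itertools import combinations


def to_binary_history(access_history, files):
    """Turn each job's list of accessed files into a 0/1 row over `files`."""
    rows = []
    for job in access_history:
        accessed = set(job)
        rows.append([1 if f in accessed else 0 for f in files])
    return rows


def support(rows, cols):
    """Fraction of rows in which every column in `cols` is 1."""
    return sum(all(r[c] for c in cols) for r in rows) / len(rows)


def maximal_frequent_correlated(rows, files, min_sup, min_corr):
    """Brute-force search for maximal frequent correlated file groups.

    Correlation is measured with all-confidence (an assumption):
    support of the group divided by the largest single-file support in it.
    """
    n = len(files)
    kept = []
    for k in range(2, n + 1):
        for cols in combinations(range(n), k):
            sup = support(rows, cols)
            if sup < min_sup:
                continue  # not frequent enough
            all_conf = sup / max(support(rows, (c,)) for c in cols)
            if all_conf >= min_corr:
                kept.append(frozenset(cols))
    # A pattern is maximal if no kept pattern is a proper superset of it.
    maximal = [p for p in kept if not any(p < q for q in kept)]
    return sorted(sorted(files[c] for c in p) for p in maximal)


# Hypothetical access history: one list of file names per job.
history = [
    ["f1", "f2", "f3"],
    ["f1", "f2", "f3"],
    ["f1", "f2"],
    ["f4"],
    ["f3", "f4"],
]
files = ["f1", "f2", "f3", "f4"]
rows = to_binary_history(history, files)
groups = maximal_frequent_correlated(rows, files, min_sup=0.4, min_corr=0.6)
print(groups)
```

With these sample thresholds the only maximal group is f1, f2 and f3, so a site requesting any one of them would be a candidate to receive all three as a group. Keeping only maximal patterns is what removes the redundant sub-patterns (f1 and f2, f1 and f3, f2 and f3) from the replication decision.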
D. Qin: This work is supported by the National Natural Science Foundation of China (61302074, 61501176, 61571181), the Natural Science Foundation of Heilongjiang Province (QC2013C061), the Modern Sensor Technology Research and Innovation Team Foundation of Heilongjiang Province (2012TD007), and the Postdoctoral Research Foundation of Heilongjiang Province (LBH-Q15121).
References
Amornsinlaphachai, P.: Efficiency of data mining models to predict academic performance and a cooperative learning model. In: 8th International Conference on Knowledge and Smart Technology (KST), pp. 66–71 (2016)
Lee, M.C., Leu, F.Y., Chen, Y.P.: PFRF: an adaptive data replication algorithm based on star-topology data grids. Future Gener. Comput. Syst. 28(7), 1045–1057 (2012)
Saadat, N., Rahmani, A.M.: PDDRA: a new pre-fetching based dynamic data replication algorithm in data grids. Future Gener. Comput. Syst. 28(4), 666–681 (2012)
Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD Conference, pp. 207–216 (1993)
Taheri, J., Zomaya, A.Y., Bouvry, P., Khan, S.U.: Hopfield neural network for simultaneous job scheduling and data replication in grids. Future Gener. Comput. Syst. 29(8), 1885–1900 (2013)
Wei, H.: Correlation mining of multi-dimensional large data sets. Mod. Comput. 9(1), 3–8 (2012)
Shorfuzzaman, M., Graham, P.: Adaptive popularity-driven replica placement in hierarchical data grids. J. Supercomput. 51(3), 374–392 (2010)
Bellodi, E., Riguzzi, F., Lamma, E.: Statistical relational learning for workflow mining. Intell. Data Anal. 20(3), 515–541 (2016)
Jian, L., Wang, C., Liu, Y., Liang, S., Yi, W.: Parallel data mining techniques on graphics processing unit with compute unified device architecture (CUDA). J. Supercomput. 64(3), 942–967 (2013)
Wu, T., Chen, Y., Han, J.: Re-examination of interestingness measures in pattern mining: a unified framework. Data Min. Knowl. Discov. 21(3), 371–397 (2010)
Grace, R.K., Manimegalai, R.: Dynamic replica placement and selection strategies in data grids – a comprehensive survey. J. Parallel Distrib. Comput. 74(2), 2099–2108 (2014)
Ma, J., Liu, W., Glatard, T.: A classification of file placement and replication methods on grids. Future Gener. Comput. Syst. 29(6), 1395–1406 (2013)
Copyright information
© 2017 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Qin, D., Liu, R., Zhen, J., Yang, S., Wang, E. (2017). Research on Decentralized Group Replication Strategy Based on Correlated Patterns Mining in Data Grids. In: Xin-lin, H. (eds) Machine Learning and Intelligent Communications. MLICOM 2016. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 183. Springer, Cham. https://doi.org/10.1007/978-3-319-52730-7_30
DOI: https://doi.org/10.1007/978-3-319-52730-7_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52729-1
Online ISBN: 978-3-319-52730-7
eBook Packages: Computer Science, Computer Science (R0)