Abstract
Analyzing and applying big data requires specialized technologies that can process large volumes of data efficiently. Association rule mining focuses on discovering relations among data items. When mining association rules in big data, conventional methods suffer from prohibitive computational cost and are too inefficient to be practical. This study proposes an evolutionary algorithm, Niche-Aided Gene Expression Programming (NGEP), to address these problems. The NGEP algorithm (1) divides individuals into several niches that evolve separately and fuses selected niches according to the similarity of their best individuals, so that the chromosomes remain well dispersed, and (2) adjusts the fitness function to suit the needs of the underlying application. Experiments were performed to compare NGEP with the FP-Growth and Apriori algorithms in mining association rules on a dataset of environmental pressure measurements (Iris dataset) and an Artificial Simulation Database (ASD). The results indicate that NGEP finds more association rules (36 vs. 33 vs. 25 in the Iris dataset experiments and 57 vs. 44 vs. 44 in the ASD experiments) with a higher accuracy rate (74.8 vs. 53.2 vs. 50.6 % in the Iris dataset experiments and 95.8 vs. 77.4 vs. 80.3 % in the ASD experiments), and that its computing time is also much lower than that of the other two methods.
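As a rough illustration of step (1), the niche fusion can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the string encoding of chromosomes, the position-wise `similarity` measure, the `threshold` value of 0.8, and the greedy merge order are all hypothetical choices made for the example, and none of them are taken from the paper.

```python
from typing import Callable, List

# Assumption: a chromosome is a fixed-length string of symbols.
Chromosome = str
Niche = List[Chromosome]


def best_of(niche: Niche, fitness: Callable[[Chromosome], float]) -> Chromosome:
    """Return the fittest individual of a niche (first one on ties)."""
    return max(niche, key=fitness)


def similarity(a: Chromosome, b: Chromosome) -> float:
    """Hypothetical measure: fraction of positions where two chromosomes agree."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), 1)


def fuse_similar_niches(
    niches: List[Niche],
    fitness: Callable[[Chromosome], float],
    threshold: float = 0.8,  # assumed value, for illustration only
) -> List[Niche]:
    """Greedily merge niches whose best individuals are at least `threshold` similar."""
    fused: List[Niche] = []
    for niche in niches:
        champion = best_of(niche, fitness)
        for target in fused:
            if similarity(champion, best_of(target, fitness)) >= threshold:
                target.extend(niche)  # fuse into the already kept niche
                break
        else:
            fused.append(list(niche))  # no similar niche found: keep it separate
    return fused


# Toy usage with a stand-in fitness function (count of the symbol "a").
toy_niches = [["aabb", "abab"], ["aabb", "bbbb"], ["cccc", "ccdd"]]
toy_result = fuse_similar_niches(toy_niches, fitness=lambda c: c.count("a"))
```

In this toy run the first two niches share the same best individual and are fused, while the third stays separate, so niches that have converged on similar solutions are pooled and dissimilar ones keep evolving on their own.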




Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Nos. 61272314, 61361120098, 61440018), the China Postdoctoral Science Foundation (2014M552112), and the Hubei Natural Science Foundation (No. 2014CF-B904).
Cite this article
Chen, Y., Li, F. & Fan, J. Mining association rules in big data with NGEP. Cluster Comput 18, 577–585 (2015). https://doi.org/10.1007/s10586-014-0419-3
DOI: https://doi.org/10.1007/s10586-014-0419-3