Abstract
Discovering new trends and co-occurrences in massive data is a key step when analysing social media, sensor data, and similar sources. Traditional data mining techniques often cannot handle such volumes of data. For this reason, parallel and distributed versions of previously known techniques have been developed over the last decade. Frequent itemset mining is no exception: the literature contains several proposals that use not only parallel approaches but also Spark and Hadoop implementations following the MapReduce philosophy of Big Data.
When processing fuzzy data sets or extracting fuzzy associations from crisp data, such Big Data solutions become crucial, since the execution time and memory consumption of available algorithms increase when items are not Boolean. In this paper, we first review existing parallel and distributed algorithms for frequent itemset and association rule mining in the crisp and fuzzy cases, and then develop a preliminary proposal for mining not only frequent fuzzy itemsets but also fuzzy association rules. We also study the performance of the proposed algorithm on several datasets that have been suitably fuzzified, obtaining promising results.
Acknowledgements
The research reported in this paper was partially supported by the Andalusian Government (Junta de Andalucía) under project P11-TIC-7460 and by the Spanish Ministry for Economy and Competitiveness under project grant TIN2015-64776-C3-1-R.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Fernandez-Basso, C., Ruiz, M.D., Martin-Bautista, M.J. (2018). Fuzzy Association Rules Mining Using Spark. In: Medina, J., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations. IPMU 2018. Communications in Computer and Information Science, vol 854. Springer, Cham. https://doi.org/10.1007/978-3-319-91476-3_2
DOI: https://doi.org/10.1007/978-3-319-91476-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91475-6
Online ISBN: 978-3-319-91476-3
eBook Packages: Computer Science (R0)