Fuzzy Association Rules Mining Using Spark

Fernandez-Bassso, Carlos; Ruiz, M. Dolores; Martin-Bautista, Maria J.

doi:10.1007/978-3-319-91476-3_2

Part of the book series: Communications in Computer and Information Science (CCIS, volume 854)

Included in the following conference series:

International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems

Abstract

Discovering new trends and co-occurrences in massive data is a key step when analysing social media, sensor data, and similar sources. Traditional data mining techniques are often unable to handle such amounts of data. For this reason, approaches have arisen over the last decade that develop parallel and distributed versions of previously known techniques. Frequent itemset mining is no exception: the literature contains several proposals that use not only parallel approaches but also Spark and Hadoop implementations following the MapReduce philosophy of Big Data.

When processing fuzzy data sets or extracting fuzzy associations from crisp data, such Big Data solutions become crucial, since the execution time and memory consumption of available algorithms increase when items are not Boolean. In this paper, we first review existing parallel and distributed algorithms for frequent itemset and association rule mining in the crisp and fuzzy cases, and then develop a preliminary proposal for mining not only frequent fuzzy itemsets but also fuzzy association rules. We also study the performance of the proposed algorithm on several datasets that have been suitably fuzzified, obtaining promising results.
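
To illustrate why non-Boolean items change the computation, the following is a minimal sketch in plain Python of fuzzy support and confidence. It is not the algorithm proposed in the paper and it does not use Spark. It assumes one common formulation (membership degrees combined with the minimum t-norm and averaged over transactions); the transactions, item names, and function names are hypothetical.

```python
# Hypothetical fuzzy transactions: each maps an item to a membership degree in [0, 1].
# In a crisp (Boolean) data set every degree would be exactly 0 or 1.
transactions = [
    {"temp=high": 0.8, "humidity=low": 0.6},
    {"temp=high": 0.3, "humidity=low": 0.9, "wind=strong": 0.5},
    {"temp=high": 1.0, "wind=strong": 0.2},
    {"humidity=low": 0.7, "wind=strong": 0.4},
]


def fuzzy_support(itemset, transactions):
    """Average, over all transactions, of the itemset's membership degree.

    Assumption: the degree of an itemset in a transaction is the minimum
    of its items' degrees (minimum t-norm); a missing item has degree 0.
    """
    total = sum(min(t.get(item, 0.0) for item in itemset) for t in transactions)
    return total / len(transactions)


def fuzzy_confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by antecedent support."""
    antecedent_support = fuzzy_support(antecedent, transactions)
    if antecedent_support == 0:
        return 0.0
    return fuzzy_support(antecedent | consequent, transactions) / antecedent_support


if __name__ == "__main__":
    a = frozenset({"temp=high"})
    c = frozenset({"humidity=low"})
    print("support(A)      =", fuzzy_support(a, transactions))
    print("support(A u C)  =", fuzzy_support(a | c, transactions))
    print("confidence(A=>C) =", fuzzy_confidence(a, c, transactions))
```

Unlike the crisp case, where support is a simple count, every transaction contributes a real-valued degree that must be computed and aggregated. This gives some intuition for the extra time and memory cost mentioned above.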

Notes

1. http://archive.ics.uci.edu/ml/

Acknowledgements

The research reported in this paper was partially supported by the Andalusian Government (Junta de Andalucía) under project P11-TIC-7460 and by the Spanish Ministry for Economy and Competitiveness under project grant TIN2015-64776-C3-1-R.

Author information

Authors and Affiliations

Computer Science and A.I. Department, CITIC-UGR, University of Granada, Granada, Spain
Carlos Fernandez-Bassso & Maria J. Martin-Bautista
Computer Engineering Department, University of Cádiz, Cádiz, Spain
M. Dolores Ruiz

Corresponding author

Correspondence to M. Dolores Ruiz.

Editor information

Editors and Affiliations

Universidad de Cádiz, Cádiz, Cadiz, Spain
Jesús Medina
Universidad de Málaga, Málaga, Málaga, Spain
Manuel Ojeda-Aciego
Universidad de Granada, Granada, Spain
José Luis Verdegay
Universidad de Granada, Granada, Spain
David A. Pelta
Universidad de Málaga, Málaga, Málaga, Spain
Inma P. Cabrera
LIP6, Université Pierre et Marie Curie, CNRS, Paris, France
Bernadette Bouchon-Meunier
Iona College, New Rochelle, New York, USA
Ronald R. Yager

About this paper

Cite this paper

Fernandez-Bassso, C., Ruiz, M.D., Martin-Bautista, M.J. (2018). Fuzzy Association Rules Mining Using Spark. In: Medina, J., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations. IPMU 2018. Communications in Computer and Information Science, vol 854. Springer, Cham. https://doi.org/10.1007/978-3-319-91476-3_2

DOI: https://doi.org/10.1007/978-3-319-91476-3_2
Published: 18 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91475-6
Online ISBN: 978-3-319-91476-3
eBook Packages: Computer Science, Computer Science (R0)
