Abstract
One of the main restrictions of relational database models is their lack of support for flexible, imprecise and vague information in data representation and querying. Imprecision is pervasive in human language; hence, modeling imprecision is crucial for any system that stores and processes linguistic data. Fuzzy set theory provides an effective way to model the imprecision inherent in the meaning of words and propositions drawn from natural language (Zadeh 1965; Vasant 2013). Several works in the last 20 years have used fuzzy set theory to extend relational database models to permit representation and retrieval of imprecise data. However, to our knowledge, such approaches have not been designed to scale up to very large datasets. In this paper, the MapReduce framework is used to implement flexible fuzzy queries on a large-scale dataset. We develop MapReduce algorithms to enhance the standard relational operations with fuzzy conditional predicates expressed in natural language.
Notes
If A is a fuzzy set and \(\overline{A}\) is its complement then \(\mu _{\overline{A}}(x)=1-\mu _A(x)\).
As proven in Sect. 3.2.4, each output record is generated only once; therefore, it seems reasonable to measure the growth in terms of the number of output records.
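The complement rule in the first note can be illustrated with a minimal Python sketch. The trapezoidal membership function, the helper names `trapezoid` and `complement`, and the breakpoints for a "long delay" term are all assumptions made for illustration; they are not taken from the paper.

```python
def trapezoid(a, b, c, d):
    """Build a trapezoidal membership function (hypothetical helper).

    Membership rises from 0 at a to 1 at b, stays 1 until c,
    and falls back to 0 at d.
    """
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


def complement(mu):
    """Standard fuzzy complement: mu_not_A(x) = 1 - mu_A(x)."""
    return lambda x: 1.0 - mu(x)


# Invented linguistic term: "long delay", measured in minutes.
mu_long = trapezoid(30, 60, 120, 180)
mu_not_long = complement(mu_long)

for minutes in (20, 45, 90):
    print(minutes, mu_long(minutes), mu_not_long(minutes))
```

Under these assumed breakpoints, a 45-minute delay belongs to "long delay" and to its complement to the same degree, which is the kind of partial membership a crisp predicate cannot express.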
References
Afrati FN, Sarma AD, Menestrina D, Parameswaran A, Ullman JD (2012) Fuzzy joins using mapreduce. In: 2012 IEEE 28th international conference on data engineering (ICDE). IEEE, pp 498–509
Atta F, Viglas SD, Niazi S (2011) Sand join: a skew handling join algorithm for google’s mapreduce framework. In: 2011 IEEE 14th international multitopic conference (INMIC), pp 170–175. doi:10.1109/INMIC.2011.6151466
Bosc P, Prade H (1997) An introduction to the fuzzy set and possibility theory-based treatment of flexible queries and uncertain or imprecise databases. In: Motro A, Smets P (eds) Uncertainty management in information systems. Springer, New York, pp 285–324
Buckles BP, Petry FE (1982) A fuzzy representation of data for relational databases. Fuzzy Sets Syst 7(3):213–226. doi:10.1016/0165-0114(82)90052-5
Buckley JJ, Eslami E (2002) An introduction to fuzzy logic and fuzzy sets, vol 13. Springer, New York
Chen G (1998) Fuzzy logic in data modeling: semantics, constraints, and database design. Kluwer Academic Publishers, Norwell
Das Sarma A, He Y, Chaudhuri S (2014) Clusterjoin: a similarity joins framework using map-reduce. Proc VLDB Endow 7(12):1059–1070. doi:10.14778/2732977.2732981
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Dubois D, Prade H (1986) Weighted minimum and maximum operations in fuzzy set theory. Inf Sci 39(2):205–210. doi:10.1016/0020-0255(86)90035-6
Elmeleegy K, Olston C, Reed B (2014) Spongefiles: mitigating data skew in mapreduce using distributed memory. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. SIGMOD ’14, pp 551–562. ACM, New York. doi:10.1145/2588555.2595634
Galindo J (2005) Fuzzy databases: modeling, design and implementation. IGI Global
Gufler B, Augsten N, Reiser A, Kemper A (2012) Load balancing in mapreduce based on scalable cardinality estimates. In: 2012 IEEE 28th international conference on data engineering (ICDE), pp 522–533. doi:10.1109/ICDE.2012.58
Hassan MAH, Bamha M, Loulergue F (2014) Handling data-skew effects in join operations using mapreduce. Proc Comput Sci 29:145–158. doi:10.1016/j.procs.2014.05.014. 2014 International conference on computational science
Klir GJ, Clair UHS, Yuan B (1997) Fuzzy set theory: foundations and applications. Prentice Hall. https://books.google.com/books?id=DNxQAAAAMAAJ
Kwon Y, Balazinska M, Howe B, Rolia J (2012) Skewtune: mitigating skew in mapreduce applications. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data. SIGMOD ’12, pp 25–36. ACM, New York. doi:10.1145/2213836.2213840
Kyritsis V, Lekeas P, Souliou D, Afrati F (2012) A new framework for join product skew. In: Lacroix Z, Vidal M (eds) Resource discovery, vol 6799. Lecture notes in computer science. Springer, New York, pp 1–10
Ma ZM, Yan L (2010) A literature overview of fuzzy conceptual data modeling. J Inf Sci Eng 26(2):427–441
Ma ZM, Zhang WJ, Ma WY (2000) Semantic measure of fuzzy data in extended possibility-based fuzzy relational databases. Int J Intell Syst 15(8):705–716. doi:10.1002/1098-111X(200008)15:8<705::AID-INT2>3.0.CO;2-4
Ma ZM, Mili F (2002) Handling fuzzy information in extended possibility-based fuzzy relational databases. Int J Intell Syst 17(10):925–942. doi:10.1002/int.10057
Medina JM, Vila MA, Cubero JC, Pons O (1995) Towards the implementation of a generalized fuzzy relational database model. Fuzzy Sets Syst 75(3):273–289. doi:10.1016/0165-0114(94)00380-P
Metwally A, Faloutsos C (2012) V-smart-join: a scalable mapreduce framework for all-pair similarity joins of multisets and vectors. Proc VLDB Endow 5(8):704–715
Petry FE (ed) (1997) Fuzzy databases: principles and applications. Kluwer Academic Publishers, Norwell
Prade H, Testemale C (1984) Generalizing database relational algebra for the treatment of incomplete or uncertain information and vague queries. Inf Sci 34(2):115–143. doi:10.1016/0020-0255(84)90020-3
Ramakrishnan SR, Swart G, Urmanov A (2012) Balancing reducer skew in mapreduce workloads using progressive sampling. In: Proceedings of the 3rd ACM symposium on cloud computing. SoCC ’12, pp 16:1–16:14. ACM, New York. doi:10.1145/2391229.2391245
Shenoi S, Melton A (1989) Proximity relations in the fuzzy relational database model. Fuzzy Sets Syst 31(3):285–296. doi:10.1016/0165-0114(89)90201-7
Shenoi S, Melton A (1990) An extended version of the fuzzy relational database model. Inf Sci 52(1):35–52. doi:10.1016/0020-0255(90)90034-8
US Department of Transportation (2016) Online; accessed 23 Feb 2016. https://www.transportation.gov/
Vasant P (2013) Handbook of research on novel soft computing intelligent algorithms: theory and practical applications. Advances in computational intelligence and robotics (ACIR) book series. IGI Global. https://books.google.com/books?id=nt-WBQAAQBAJ
Vernica R, Carey MJ, Li C (2010) Efficient parallel set-similarity joins using mapreduce. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, pp 495–506. ACM
Wang Y, Metwally A, Parthasarathy S (2013) Scalable all-pairs similarity search in metric spaces. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’13, pp 829–837. ACM, New York. doi:10.1145/2487575.2487625
Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353. doi:10.1016/S0019-9958(65)90241-X
Zadeh LA (1999) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 100 Suppl 1(0):9–34. doi:10.1016/S0165-0114(99)80004-9
Zhang C, Li J, Wu L, Lin M, Liu W (2012) Sej: an even approach to multiway theta-joins using mapreduce. In: 2012 Second international conference on cloud and green computing (CGC), pp 73–80. doi:10.1109/CGC.2012.9
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This work does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Matthew Cremeens and Zhenge Zhao contributed equally to this article.
About this article
Cite this article
Khorasani, E.S., Cremeens, M. & Zhao, Z. Implementation of scalable fuzzy relational operations in MapReduce. Soft Comput 22, 3061–3075 (2018). https://doi.org/10.1007/s00500-017-2561-3
DOI: https://doi.org/10.1007/s00500-017-2561-3