Implementation of scalable fuzzy relational operations in MapReduce

  • Methodologies and Application
  • Soft Computing

Abstract

One of the main restrictions of relational database models is their lack of support for flexible, imprecise and vague information in data representation and querying. Imprecision is pervasive in human language; hence, modeling imprecision is crucial for any system that stores and processes linguistic data. Fuzzy set theory provides an effective way to model the imprecision inherent in the meaning of words and propositions drawn from natural language (Zadeh 1965; Vasant 2013). Several works in the last 20 years have used fuzzy set theory to extend relational database models to permit representation and retrieval of imprecise data. However, to our knowledge, such approaches have not been designed to scale up to very large datasets. In this paper, the MapReduce framework is used to implement flexible fuzzy queries on a large-scale dataset. We develop MapReduce algorithms to enhance the standard relational operations with fuzzy conditional predicates expressed in natural language.
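To make the idea of a fuzzy conditional predicate inside a MapReduce job concrete, the following is a minimal in-memory sketch of a fuzzy selection. It is not the algorithm developed in the article. The linguistic term "long delay", its ramp-shaped membership function `mu_long`, the breakpoints of 30 and 90 minutes, the threshold of 0.5, and the flight records are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def mu_long(delay_min, a=30.0, b=90.0):
    """Hypothetical membership function for the term 'long delay'.

    Returns 0 at or below a minutes, 1 at or above b minutes,
    and rises linearly in between.
    """
    if delay_min <= a:
        return 0.0
    if delay_min >= b:
        return 1.0
    return (delay_min - a) / (b - a)


def map_select(record, threshold):
    """Map step: emit (carrier, (flight, degree)) if the degree meets the threshold."""
    carrier, flight, delay = record
    degree = mu_long(delay)
    if degree >= threshold:
        yield carrier, (flight, degree)


def reduce_select(carrier, values):
    """Reduce step: order the matches for one key by descending membership degree."""
    return carrier, sorted(values, key=lambda v: v[1], reverse=True)


def run_job(records, threshold=0.5):
    """Run map, an in-memory stand-in for the shuffle, then reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in map_select(rec, threshold):
            groups[key].append(value)
    return dict(reduce_select(k, v) for k, v in groups.items())


# Hypothetical records: (carrier, flight id, delay in minutes).
flights = [
    ("AA", "AA10", 20),
    ("AA", "AA11", 75),
    ("DL", "DL5", 120),
    ("DL", "DL6", 60),
]
result = run_job(flights)
print(result)
```

Each selected record carries its membership degree, so a later stage can rank or further filter the answers instead of treating the predicate as strictly true or false.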

Notes

  1. If A is a fuzzy set and \(\overline{A}\) is its complement then \(\mu _{\overline{A}}(x)=1-\mu _A(x)\).

  2. As proven in Sect. 3.2.4, each output record is generated only once; therefore, it seems reasonable to measure the growth in terms of the number of output records.

References

  • Afrati FN, Sarma AD, Menestrina D, Parameswaran A, Ullman JD (2012) Fuzzy joins using mapreduce. In: 2012 IEEE 28th international conference on data engineering (ICDE). IEEE, pp 498–509

  • Atta F, Viglas SD, Niazi S (2011) Sand join: a skew handling join algorithm for google’s mapreduce framework. In: 2011 IEEE 14th international multitopic conference (INMIC), pp 170–175. doi:10.1109/INMIC.2011.6151466

  • Bosc P, Prade H (1997) An introduction to the fuzzy set and possibility theory-based treatment of flexible queries and uncertain or imprecise databases. In: Motro A, Smets P (eds) Uncertainty management in information systems. Springer, New York, pp 285–324

  • Buckles BP, Petry FE (1982) A fuzzy representation of data for relational databases. Fuzzy Sets Syst 7(3):213–226. doi:10.1016/0165-0114(82)90052-5

  • Buckley JJ, Eslami E (2002) An introduction to fuzzy logic and fuzzy sets, vol 13. Springer, New York

  • Chen G (1998) Fuzzy logic in data modeling: semantics, constraints, and database design. Kluwer Academic Publishers, Norwell

  • Das Sarma A, He Y, Chaudhuri S (2014) Clusterjoin: a similarity joins framework using map-reduce. Proc VLDB Endow 7(12):1059–1070. doi:10.14778/2732977.2732981

  • Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113

  • Dubois D, Prade H (1986) Weighted minimum and maximum operations in fuzzy set theory. Inf Sci 39(2):205–210. doi:10.1016/0020-0255(86)90035-6

  • Elmeleegy K, Olston C, Reed B (2014) Spongefiles: mitigating data skew in mapreduce using distributed memory. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. SIGMOD ’14, pp 551–562. ACM, New York. doi:10.1145/2588555.2595634

  • Galindo J (2005) Fuzzy databases: modeling, design and implementation. IGI Global

  • Gufler B, Augsten N, Reiser A, Kemper A (2012) Load balancing in mapreduce based on scalable cardinality estimates. In: 2012 IEEE 28th international conference on data engineering (ICDE), pp 522–533. doi:10.1109/ICDE.2012.58

  • Hassan MAH, Bamha M, Loulergue F (2014) Handling data-skew effects in join operations using mapreduce. Proc Comput Sci 29:145–158. doi:10.1016/j.procs.2014.05.014. 2014 International conference on computational science

  • Klir GJ, Clair UHS, Yuan B (1997) Fuzzy set theory: foundations and applications. Prentice Hall. https://books.google.com/books?id=DNxQAAAAMAAJ

  • Kwon Y, Balazinska M, Howe B, Rolia J (2012) Skewtune: mitigating skew in mapreduce applications. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data. SIGMOD ’12, pp 25–36. ACM, New York. doi:10.1145/2213836.2213840

  • Kyritsis V, Lekeas P, Souliou D, Afrati F (2012) A new framework for join product skew. In: Lacroix Z, Vidal M (eds) Resource discovery. Lecture notes in computer science, vol 6799. Springer, New York, pp 1–10

  • Ma ZM, Yan L (2010) A literature overview of fuzzy conceptual data modeling. J Inf Sci Eng 26(2):427–441

  • Ma ZM, Zhang WJ, Ma WY (2000) Semantic measure of fuzzy data in extended possibility-based fuzzy relational databases. Int J Intell Syst 15(8):705–716. doi:10.1002/1098-111X(200008)15:8<705::AID-INT2>3.0.CO;2-4

  • Ma ZM, Mili F (2002) Handling fuzzy information in extended possibility-based fuzzy relational databases. Int J Intell Syst 17(10):925–942. doi:10.1002/int.10057

  • Medina JM, Vila MA, Cubero JC, Pons O (1995) Towards the implementation of a generalized fuzzy relational database model. Fuzzy Sets Syst 75(3):273–289. doi:10.1016/0165-0114(94)00380-P

  • Metwally A, Faloutsos C (2012) V-smart-join: a scalable mapreduce framework for all-pair similarity joins of multisets and vectors. Proc VLDB Endow 5(8):704–715

  • Petry FE (ed) (1997) Fuzzy databases: principles and applications. Kluwer Academic Publishers, Norwell

  • Prade H, Testemale C (1984) Generalizing database relational algebra for the treatment of incomplete or uncertain information and vague queries. Inf Sci 34(2):115–143. doi:10.1016/0020-0255(84)90020-3

  • Ramakrishnan SR, Swart G, Urmanov A (2012) Balancing reducer skew in mapreduce workloads using progressive sampling. In: Proceedings of the 3rd ACM symposium on cloud computing. SoCC ’12, pp 16:1–16:14. ACM, New York. doi:10.1145/2391229.2391245

  • Shenoi S, Melton A (1989) Proximity relations in the fuzzy relational database model. Fuzzy Sets Syst 31(3):285–296. doi:10.1016/0165-0114(89)90201-7

  • Shenoi S, Melton A (1990) An extended version of the fuzzy relational database model. Inf Sci 52(1):35–52. doi:10.1016/0020-0255(90)90034-8

  • US Department of Transportation (2016) Online; accessed 23 Feb 2016. https://www.transportation.gov/

  • Vasant P (2013) Handbook of research on novel soft computing intelligent algorithms: theory and practical applications. Advances in computational intelligence and robotics (ACIR) book series. IGI Global. https://books.google.com/books?id=nt-WBQAAQBAJ

  • Vernica R, Carey MJ, Li C (2010) Efficient parallel set-similarity joins using mapreduce. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, pp 495–506. ACM

  • Wang Y, Metwally A, Parthasarathy S (2013) Scalable all-pairs similarity search in metric spaces. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’13, pp 829–837. ACM, New York. doi:10.1145/2487575.2487625

  • Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353. doi:10.1016/S0019-9958(65)90241-X

  • Zadeh LA (1999) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 100(Suppl 1):9–34. doi:10.1016/S0165-0114(99)80004-9

  • Zhang C, Li J, Wu L, Lin M, Liu W (2012) Sej: an even approach to multiway theta-joins using mapreduce. In: 2012 Second international conference on cloud and green computing (CGC), pp 73–80. doi:10.1109/CGC.2012.9

Author information

Corresponding author

Correspondence to Elham S. Khorasani.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This work does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Matthew Cremeens and Zhenge Zhao contributed equally to this article.

About this article

Cite this article

Khorasani, E.S., Cremeens, M. & Zhao, Z. Implementation of scalable fuzzy relational operations in MapReduce. Soft Comput 22, 3061–3075 (2018). https://doi.org/10.1007/s00500-017-2561-3

  • DOI: https://doi.org/10.1007/s00500-017-2561-3
