A Flexible Fuzzy Expert System for Fuzzy Duplicate Elimination in Data Cleaning

Shahri, Hamid Haidarian; Barforush, Ahmad Abdollahzadeh

doi:10.1007/978-3-540-30075-5_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3180)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Data cleaning deals with the detection and removal of errors and inconsistencies in data gathered from distributed sources. This process is essential for drawing correct conclusions from data in decision support systems. Eliminating fuzzy duplicate records is a fundamental part of the data cleaning process. The vagueness and uncertainty involved in detecting fuzzy duplicates make it a natural niche for applying fuzzy reasoning. Although uncertainty algebras such as fuzzy logic are well known, their applicability to the problem of duplicate elimination has remained unexplored and unclear until now. In this paper, a novel and flexible fuzzy expert system is devised for detecting and eliminating fuzzy duplicates during data cleaning; it circumvents the repetitive and inconvenient task of hard-coding. Some of the crucial advantages of this approach, when used in various information systems, are its flexibility, ease of use, extendibility, fast development time and efficient run time.
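The abstract does not spell out the rule base or the inference procedure, so the following is only a minimal sketch of how fuzzy reasoning over field similarities might flag duplicate records. Everything in it is an assumption made for illustration: the field names `name` and `address`, the membership ramp between 0.5 and 0.9, the three rules, and the zero-order Sugeno-style weighted average used in place of a full Mamdani centroid defuzzification. It is not the authors' system.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crisp string similarity in [0, 1]; a stand-in for any string metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def high(x: float) -> float:
    """Membership in the fuzzy set 'high similarity'.

    Assumed shape: 0 below 0.5, rising linearly to 1 at 0.9.
    """
    return min(1.0, max(0.0, (x - 0.5) / 0.4))


def low(x: float) -> float:
    """Membership in 'low similarity', taken as the complement of 'high'."""
    return 1.0 - high(x)


def duplicate_degree(rec_a: dict, rec_b: dict) -> float:
    """Return a degree in [0, 1] to which two records look like duplicates."""
    name = similarity(rec_a["name"], rec_b["name"])
    addr = similarity(rec_a["address"], rec_b["address"])

    # Hypothetical rule base: (firing strength, crisp consequent).
    # Fuzzy AND is modelled with min.
    rules = [
        (min(high(name), high(addr)), 1.0),  # both high  -> duplicate
        (min(high(name), low(addr)), 0.5),   # name only  -> possible duplicate
        (low(name), 0.0),                    # name low   -> not a duplicate
    ]

    # Weighted average of the consequents (zero-order Sugeno style).
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return 0.0
    return sum(strength * out for strength, out in rules) / total


if __name__ == "__main__":
    a = {"name": "John Smith", "address": "12 Main Street"}
    b = {"name": "Jon Smith", "address": "12 Main St."}
    print(duplicate_degree(a, b))
```

Because the rules are held in a plain list rather than in nested conditionals, changing the behaviour means editing data rather than control flow, which is one way to read the abstract's point about avoiding hard-coding. A caller would compare the returned degree against a threshold of its own choosing.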

Author information

Authors and Affiliations

Faculty of Computer Engineering and Information Technology, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Hamid Haidarian Shahri & Ahmad Abdollahzadeh Barforush

Editor information

Editors and Affiliations

University of Zaragoza, Ciudad Universitaria, Plaza San Francisco, 50009, Zaragoza
Fernando Galindo
Seikei University, Japan
Makoto Takizawa
Institute of Informatics in Business and Government, University of Linz, Altenbergerstr. 69, 4040, Linz, Austria
Roland Traunmüller

About this paper

Cite this paper

Shahri, H.H., Barforush, A.A. (2004). A Flexible Fuzzy Expert System for Fuzzy Duplicate Elimination in Data Cleaning. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2004. Lecture Notes in Computer Science, vol 3180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30075-5_16

DOI: https://doi.org/10.1007/978-3-540-30075-5_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22936-0
Online ISBN: 978-3-540-30075-5
eBook Packages: Springer Book Archive
