An Efficient Algorithm for Data Cleaning

Payal Pahwa, Rajiv Arora, Garima Thakur
Copyright: © 2011 | Volume: 1 | Issue: 4 | Pages: 16
ISSN: 2155-6393 | EISSN: 2155-6407 | EISBN13: 9781613508305 | DOI: 10.4018/ijkbo.2011100104
Cite Article

MLA

Pahwa, Payal, et al. "An Efficient Algorithm for Data Cleaning." IJKBO, vol. 1, no. 4, 2011, pp. 56-71. http://doi.org/10.4018/ijkbo.2011100104

APA

Pahwa, P., Arora, R., & Thakur, G. (2011). An Efficient Algorithm for Data Cleaning. International Journal of Knowledge-Based Organizations (IJKBO), 1(4), 56-71. http://doi.org/10.4018/ijkbo.2011100104

Chicago

Pahwa, Payal, Rajiv Arora, and Garima Thakur. "An Efficient Algorithm for Data Cleaning." International Journal of Knowledge-Based Organizations (IJKBO) 1, no. 4 (2011): 56-71. http://doi.org/10.4018/ijkbo.2011100104


Abstract

The quality of the real-world data fed into a data warehouse is a major concern today. Because the data comes from a variety of sources, it must be checked for errors and anomalies before it is loaded into the data warehouse. The source data may contain exact duplicate records or approximate duplicate records. The presence of incorrect or inconsistent data can significantly distort the results of analyses, often negating the potential benefits of information-driven approaches. This paper addresses the detection and correction of such duplicate records. It also analyzes data quality and the various factors that degrade it, briefly reviews existing work, and points out its major limitations. A new framework is then proposed that improves on the existing technique.
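The distinction between exact and approximate duplicates that the abstract draws can be illustrated with a minimal sketch. This is not the paper's algorithm: the concatenated-key comparison, the normalized edit-distance similarity, and the 0.85 threshold are all assumptions made for illustration.

```python
# Illustrative sketch (not the paper's algorithm): flagging exact and
# approximate duplicate records before loading them into a warehouse.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def find_duplicates(records, threshold=0.85):
    """Return (i, j, kind) index pairs judged to be duplicates.

    Each record's fields are normalized (stripped, lower-cased) and joined
    into one key; a pair is 'exact' if the keys match, 'approximate' if
    their similarity reaches the threshold (an assumed cutoff).
    """
    keys = [" ".join(str(f).strip().lower() for f in r) for r in records]
    pairs = []
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if keys[i] == keys[j]:
                pairs.append((i, j, "exact"))
            elif similarity(keys[i], keys[j]) >= threshold:
                pairs.append((i, j, "approximate"))
    return pairs

records = [
    ("John Smith", "12 Oak St"),
    ("john smith", "12 Oak St."),   # near-duplicate: trailing period
    ("Jane Doe", "9 Elm Ave"),
]
print(find_duplicates(records))  # → [(0, 1, 'approximate')]
```

The pairwise comparison is O(n²) in the number of records; real cleaning frameworks typically reduce this with blocking or the sorted-neighborhood method so that only nearby records are compared.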
