An Efficient Algorithm for Data Cleaning

Payal Pahwa, Rajiv Arora, Garima Thakur
Copyright: © 2011 | Volume: 1 | Issue: 4 | Pages: 16
ISSN: 2155-6393 | EISSN: 2155-6407 | EISBN13: 9781613508305 | DOI: 10.4018/ijkbo.2011100104
Cite Article

MLA

Pahwa, Payal, et al. "An Efficient Algorithm for Data Cleaning." IJKBO, vol. 1, no. 4, 2011, pp. 56-71. http://doi.org/10.4018/ijkbo.2011100104

APA

Pahwa, P., Arora, R., & Thakur, G. (2011). An Efficient Algorithm for Data Cleaning. International Journal of Knowledge-Based Organizations (IJKBO), 1(4), 56-71. http://doi.org/10.4018/ijkbo.2011100104

Chicago

Pahwa, Payal, Rajiv Arora, and Garima Thakur. "An Efficient Algorithm for Data Cleaning." International Journal of Knowledge-Based Organizations (IJKBO) 1, no. 4 (2011): 56-71. http://doi.org/10.4018/ijkbo.2011100104


Abstract

The quality of the real-world data fed into a data warehouse is a major concern today. Because the data comes from a variety of sources, it must be checked for errors and anomalies before it is loaded into the data warehouse. The source data may contain exact duplicate records or approximate duplicate records. The presence of incorrect or inconsistent data can significantly distort the results of analyses, often negating the potential benefits of information-driven approaches. This paper addresses the detection and correction of such duplicate records. It also analyzes data quality and the various factors that degrade it, briefly reviews existing work, and points out its major limitations. A new framework is then proposed that improves on the existing technique.
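The distinction between exact and approximate duplicates that the abstract draws can be illustrated with a minimal sketch. This is not the paper's algorithm: the concatenated-key comparison, the normalized edit-distance similarity, and the 0.85 threshold are all assumptions made for illustration.

```python
# Illustrative sketch (not the paper's algorithm): flagging exact and
# approximate duplicate records before loading them into a warehouse.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def find_duplicates(records, threshold=0.85):
    """Return (i, j, kind) index pairs judged to be duplicates.

    Each record's fields are normalized (stripped, lower-cased) and joined
    into one key; a pair is 'exact' if the keys match, 'approximate' if
    their similarity reaches the threshold (an assumed cutoff).
    """
    keys = [" ".join(str(f).strip().lower() for f in r) for r in records]
    pairs = []
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if keys[i] == keys[j]:
                pairs.append((i, j, "exact"))
            elif similarity(keys[i], keys[j]) >= threshold:
                pairs.append((i, j, "approximate"))
    return pairs

records = [
    ("John Smith", "12 Oak St"),
    ("john smith", "12 Oak St."),   # near-duplicate: trailing period
    ("Jane Doe", "9 Elm Ave"),
]
print(find_duplicates(records))  # → [(0, 1, 'approximate')]
```

The pairwise comparison is O(n²) in the number of records; real cleaning frameworks typically reduce this with blocking or the sorted-neighborhood method so that only nearby records are compared.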
