Data Cleaning

Ganti, Venkatesh

doi:10.1007/978-0-387-39940-9_592

Reference work entry

Definition

Data from external sources may not conform to the standards and requirements of the target data warehouse, both because the sources and the warehouse follow different conventions and because the data contains a variety of errors. Data must therefore be transformed and cleaned before it is loaded into the warehouse so that downstream analysis is reliable and accurate. Data cleaning is the process of standardizing data representation and eliminating errors in data. It usually involves one or more tasks, each of which addresses part of the overall problem and is important in its own right. Besides the tasks that transform and modify data, diagnosing the quality of data in a database is also important. This diagnosis, often called data profiling, can usually identify data quality issues and indicate whether the data cleaning process is meeting its goals.
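As a rough illustration of the two activities named in the definition, the following Python sketch standardizes the representation of a few records and then profiles one field before and after cleaning. It is not taken from the entry: the field names, the sample records, the `STATE_ALIASES` mapping, and the `standardize` and `profile` helpers are all hypothetical, and real data cleaning tasks are considerably more involved.

```python
from collections import Counter

# Hypothetical mapping from source conventions to the warehouse standard.
STATE_ALIASES = {"wa": "WA", "wash.": "WA", "washington": "WA"}


def standardize(record):
    """Return a copy of the record with name and state in a standard form."""
    # Collapse runs of whitespace and normalize capitalization.
    name = " ".join(record.get("name", "").split()).title()
    # Map known spellings of a state to one code; otherwise just uppercase.
    raw_state = record.get("state", "").strip().lower()
    state = STATE_ALIASES.get(raw_state, raw_state.upper())
    return {"name": name, "state": state}


def profile(records, field):
    """Summarize one field: count of missing values and distinct frequencies."""
    values = [r.get(field, "") for r in records]
    missing = sum(1 for v in values if not v)
    distinct = Counter(v for v in values if v)
    return {"missing": missing, "distinct": dict(distinct)}


# Made-up source records with inconsistent conventions and a missing value.
source = [
    {"name": "  alice   SMITH ", "state": "Washington"},
    {"name": "BOB JONES", "state": "wa"},
    {"name": "Carol Lee", "state": ""},
]

cleaned = [standardize(r) for r in source]
before = profile(source, "state")
after = profile(cleaned, "state")
```

In this sketch, profiling shows two distinct spellings of the same state before cleaning and a single code afterwards, while the missing value is still reported. That is the sense in which profiling can indicate both remaining quality issues and whether cleaning is meeting its goals.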

Historical Background

Many...

Author information

Authors and Affiliations

Microsoft Research, Redmond, WA, USA
Venkatesh Ganti

Editor information

Editors and Affiliations

College of Computing, Georgia Institute of Technology, 266 Ferst Drive, 30332-0765, Atlanta, GA, USA
LING LIU (Professor)
Database Research Group, David R. Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, N2L 3G1, Waterloo, ON, Canada
M. TAMER ÖZSU (Professor and Director, University Research Chair)

Cite this entry

Ganti, V. (2009). Data Cleaning. In: LIU, L., ÖZSU, M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9_592

DOI: https://doi.org/10.1007/978-0-387-39940-9_592
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-35544-3
Online ISBN: 978-0-387-39940-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering
