On solving the continuous data editing problem

https://doi.org/10.1016/0305-0548(96)81769-2

Abstract

The data editing problem is concerned with identifying the most likely source of errors in computerized databases. Given a record that is known to fail one or more logical consistency edits, the objective is to determine the minimum (possibly weighted) number of fields that could be changed in order to correct the record. While this problem can easily be formulated as a pure fixed-charge problem, it can be extremely difficult to solve under certain data conditions. In this paper we show how a number of structural characteristics in this problem can be exploited to dramatically reduce the computational time required to solve particularly difficult data editing problems.
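To make the problem statement concrete, the following is a minimal sketch of the task on a toy record. It is not the method developed in the paper: it simply enumerates subsets of fields and tests, by Fourier-Motzkin elimination, whether freeing those fields lets the record satisfy every edit. The assumptions here are that each edit is a linear inequality a·x <= b over continuous fields, and the names `feasible`, `min_weight_fields`, and the three-field income record are hypothetical illustrations.

```python
from fractions import Fraction
from itertools import combinations


def feasible(rows, n_free):
    """Fourier-Motzkin test: does some real x satisfy a.x <= b for every (a, b)?"""
    rows = [([Fraction(v) for v in a], Fraction(b)) for a, b in rows]
    for j in range(n_free):
        pos = [r for r in rows if r[0][j] > 0]
        neg = [r for r in rows if r[0][j] < 0]
        rows = [r for r in rows if r[0][j] == 0]
        for pa, pb in pos:
            for na, nb in neg:
                # Scale both rows so the x_j coefficients cancel, then add them.
                s, t = -na[j], pa[j]
                combined = [s * u + t * v for u, v in zip(pa, na)]
                rows.append((combined, s * pb + t * nb))
    # Every variable is eliminated; what remains are constraints 0 <= b.
    return all(b >= 0 for _, b in rows)


def min_weight_fields(record, edits, weights):
    """Return (total weight, field indices) of a cheapest set of fields to change."""
    n = len(record)
    best = None
    for k in range(n + 1):
        for free in combinations(range(n), k):
            cost = sum(weights[j] for j in free)
            if best is not None and cost >= best[0]:
                continue
            rows = []
            for a, b in edits:
                # Fields that are not freed keep their recorded values.
                fixed = sum(a[j] * record[j] for j in range(n) if j not in free)
                rows.append(([a[j] for j in free], b - fixed))
            if feasible(rows, k):
                best = (cost, free)
    return best


# Hypothetical record: (wages, other_income, total_income).
record = [50, 10, 100]
weights = [3, 1, 2]  # assumed reliability weights; higher = more trusted field
edits = [
    ([1, 1, -1], 0),   # wages + other_income <= total_income
    ([-1, -1, 1], 0),  # wages + other_income >= total_income
    ([-1, 0, 0], 0),   # wages >= 0
    ([0, -1, 0], 0),   # other_income >= 0
    ([0, 0, -1], 0),   # total_income >= 0
    ([0, 1, 0], 20),   # other_income <= 20
]

print(min_weight_fields(record, edits, weights))  # (2, (2,))
```

In this toy case the cheapest single field, other_income, cannot repair the record on its own because the balance edit would force it to 50, above its bound of 20, so the sketch reports total_income as the minimum-weight change. The enumeration grows exponentially with the number of fields, which is one way to see why harder instances call for the structured approach the abstract describes.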

Cited by (10)

  • A branch-and-cut algorithm for the continuous error localization problem in data cleaning

    2007, Computers and Operations Research
    Citation Excerpt:

    Once again, the master problem is enriched with cuts generated by the subproblem from a non-feasible solution of the master problem. However, in contrast to previous approaches [1,11,6,7] these new inequalities are not set-covering cuts. At first glance, using this alternative is not apparently an advantage since it usually requires the solving of a long sequence of smaller problems.

  • Discrete models for data imputation

    2004, Discrete Applied Mathematics

  • Handbook of Statistical Data Editing and Imputation

    2011, Handbook of Statistical Data Editing and Imputation

Cliff T. Ragsdale is an Assistant Professor of Management Science at Virginia Polytechnic Institute and State University. He received his B.A. and M.B.A. degrees from the University of Central Florida and holds a Ph.D. in Management Science and Information Technology from the University of Georgia. Dr Ragsdale's primary research interests are in the areas of applied statistics, optimization, and artificial intelligence. His research has appeared in Decision Sciences, Naval Research Logistics, Computers & Operations Research, OMEGA, Operational Research Letters, Financial Services Review, and other publications. He is also author of the book Spreadsheet Modeling and Decision Analysis: A Practical Introduction to Management Science, published recently by Course Technology, Inc.

Patrick G. McKeown is a Professor in the Department of Management at the University of Georgia. He received his Ph.D. from the University of North Carolina at Chapel Hill and his M.S. and B.S. degrees from the Georgia Institute of Technology. Dr McKeown's primary research interests are in the areas of linear and integer programming, and algorithm development. He has authored numerous books in the areas of management science, computer programming, and information systems and technology. His research has appeared in several journals including Operations Research, Management Science, Naval Research Logistics Quarterly, Computers & Operations Research, and the SIAM Journal of Scientific and Statistical Computing.
