Abstract
We consider data fusion in the absence of object identifiers. Simple examples are the fusion of partially overlapping customer address files extracted from autonomous sites, or an administrative record census. The first example is related to customer relationship management (CRM), while the second is a substitute for a regular census. This kind of data fusion raises problems of (schema) integration, resolution of semantic conflicts, and object identification when global identifiers are not locally available and heterogeneous, autonomous local databases have to be accessed. The problem is further complicated by errors such as input or loading errors, misspellings, missing values, and, of course, duplicate entries. We develop a unified framework for this kind of data fusion. We cover the feature selection problem and embed the data fusion problem into a supervised classification problem. For each pair of records we have to decide whether a definite decision on matching is possible and, if so, whether or not the two records refer to the same unit (customer, citizen, etc.). Candidate classifiers include likelihood ratio tests (record linkage), classification trees, nonlinear classification, and support vector machines. We illustrate our approach with a running example.
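The pairwise decision described in the abstract can be sketched as a likelihood ratio test in the style of record linkage. This is only an illustration under stated assumptions, not the chapter's own method or running example: the field names, the agreement probabilities in M_PROB and U_PROB, and the thresholds UPPER and LOWER are all hypothetical values chosen for the sketch, and missing values are simply counted as disagreements.

```python
import math

# Hypothetical agreement probabilities per field (illustrative values only):
# M_PROB[f] = P(field f agrees | both records refer to the same unit)
# U_PROB[f] = P(field f agrees | records refer to different units)
M_PROB = {"name": 0.95, "street": 0.90, "birth_year": 0.98}
U_PROB = {"name": 0.01, "street": 0.05, "birth_year": 0.02}

# Hypothetical decision thresholds on the log2 likelihood ratio.
UPPER = 8.0   # at or above: declare a match
LOWER = 0.0   # at or below: declare a non-match


def compare(rec_a, rec_b):
    """Return the agreement pattern of two records.

    A missing value (None or absent key) is treated as a disagreement.
    """
    return {
        field: rec_a.get(field) is not None and rec_a.get(field) == rec_b.get(field)
        for field in M_PROB
    }


def log_likelihood_ratio(pattern):
    """Sum per-field log2 weights, assuming fields are conditionally independent."""
    total = 0.0
    for field, agrees in pattern.items():
        m, u = M_PROB[field], U_PROB[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(rec_a, rec_b):
    """Three-way decision: 'match', 'non-match', or 'undecided'."""
    weight = log_likelihood_ratio(compare(rec_a, rec_b))
    if weight >= UPPER:
        return "match"
    if weight <= LOWER:
        return "non-match"
    return "undecided"


if __name__ == "__main__":
    a = {"name": "Meier", "street": "Hauptstr. 1", "birth_year": 1960}
    b = {"name": "Meier", "street": "Hauptstr. 1", "birth_year": None}
    print(classify(a, b))
```

The region between the two thresholds corresponds to pairs for which no definite decision is possible; in practice such pairs would be passed on for clerical review or handled by a different classifier.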
Part of this work was supported by the Berlin-Brandenburg Graduate School in Distributed Information Systems (DFG grant no. GRK 316).
© 2001 Springer-Verlag Wien
About this chapter
Cite this chapter
Lenz, HJ., Neiling, M. (2001). Classification and Fusion. In: Della Riccia, G., Lenz, HJ., Kruse, R. (eds) Data Fusion and Perception. International Centre for Mechanical Sciences, vol 431. Springer, Vienna. https://doi.org/10.1007/978-3-7091-2580-9_6
DOI: https://doi.org/10.1007/978-3-7091-2580-9_6
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-83683-5
Online ISBN: 978-3-7091-2580-9
eBook Packages: Springer Book Archive