Classification and Fusion

  • Chapter
Data Fusion and Perception

Part of the book series: International Centre for Mechanical Sciences (CISM, volume 431)

Abstract

We consider data fusion in the case of missing object identifiers. As simple examples, think of fusing partially overlapping customer address files extracted from autonomous sites, or of an administrative record census. The first example relates to customer relationship management (CRM), while the second is a substitute for a regular census. This kind of data fusion raises problems of (schema) integration, resolution of semantic conflicts, and object identification when global identifiers are not available locally and heterogeneous, autonomous local databases have to be accessed. The problem is complicated further by errors such as input or loading errors, misspellings, missing values, and, of course, duplicate entries. We develop a unified framework for this kind of data fusion. We cover the feature selection problem and embed the data fusion problem into a supervised classification problem. For each pair of records we have to decide whether a definite matching decision is possible and, if so, whether the two records refer to the same unit (customer, citizen, etc.) or not. Candidate classifiers include likelihood ratio tests (record linkage), classification trees, nonlinear classification, and support vector machines. We illustrate our approach with a running example.
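
The pairwise decision described in the abstract (match, non-match, or no definite decision) can be sketched with a likelihood ratio test in the style of classical record linkage. This is a minimal illustration under stated assumptions, not the chapter's own method: the field names, the agreement probabilities in `M_U`, the thresholds `lower` and `upper`, and the helper names `match_weight` and `classify_pair` are all hypothetical.

```python
import math

# Hypothetical per-field agreement probabilities (assumptions for illustration):
#   m = P(field agrees | both records refer to the same unit)
#   u = P(field agrees | records refer to different units)
M_U = {
    "surname": (0.95, 0.01),
    "street": (0.90, 0.05),
    "zip": (0.98, 0.10),
}


def match_weight(rec_a, rec_b):
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing value: the field contributes no evidence
        if a.strip().lower() == b.strip().lower():
            total += math.log2(m / u)  # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify_pair(rec_a, rec_b, lower=0.0, upper=6.0):
    """Three-way decision: 'match', 'non-match', or 'undecided'."""
    w = match_weight(rec_a, rec_b)
    if w >= upper:
        return "match"
    if w <= lower:
        return "non-match"
    return "undecided"  # no definite decision possible, e.g. clerical review


a = {"surname": "Meier", "street": "Hauptstr. 1", "zip": "10115"}
b = {"surname": "meier", "street": "Hauptstr. 1", "zip": "10115"}
c = {"surname": "Schulz", "street": "Parkweg 7", "zip": "80331"}
d = {"surname": "Meier", "street": "Hauptstrasse 1", "zip": None}

print(classify_pair(a, b))
print(classify_pair(a, c))
print(classify_pair(a, d))
```

The two thresholds leave a band of weights in which neither decision is taken. In practice the probabilities and thresholds would be estimated from labeled record pairs rather than fixed by hand as they are here.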

Part of this work was supported by the Berlin-Brandenburg Graduate School in Distributed Information Systems (DFG grant no. GRK 316).

Copyright information

© 2001 Springer-Verlag Wien

About this chapter

Cite this chapter

Lenz, HJ., Neiling, M. (2001). Classification and Fusion. In: Della Riccia, G., Lenz, HJ., Kruse, R. (eds) Data Fusion and Perception. International Centre for Mechanical Sciences, vol 431. Springer, Vienna. https://doi.org/10.1007/978-3-7091-2580-9_6

  • DOI: https://doi.org/10.1007/978-3-7091-2580-9_6

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-211-83683-5

  • Online ISBN: 978-3-7091-2580-9

  • eBook Packages: Springer Book Archive
