Evaluating and Improving Data Fusion Accuracy

Information Quality in Information Fusion and Decision Making

Part of the book series: Information Fusion and Data Science (IFDS)

Abstract

Information fusion is the process of combining different sources of information for use in a particular application. The production of almost every information product incorporates some level of data fusion. Poor implementation of data and information fusion affects many other key data processes, most notably data quality management, data governance, and data analytics. In this chapter we focus on a particular type of data fusion process called entity-based data fusion (EBDF) and on its use in high-risk applications where the accuracy of the fusion must be very high. One of the foremost examples is healthcare. Fusing information that belongs to different patients, or failing to bring together all of the information for the same patient, can have dire, even life-threatening, implications.


Author information

Corresponding author

Correspondence to John R. Talburt.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Talburt, J.R., Pullen, D., Penning, M. (2019). Evaluating and Improving Data Fusion Accuracy. In: Bossé, É., Rogova, G. (eds) Information Quality in Information Fusion and Decision Making. Information Fusion and Data Science. Springer, Cham. https://doi.org/10.1007/978-3-030-03643-0_14

  • DOI: https://doi.org/10.1007/978-3-030-03643-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03642-3

  • Online ISBN: 978-3-030-03643-0

  • eBook Packages: Computer Science, Computer Science (R0)
