Abstract
Information fusion is the process of combining different sources of information for use in a particular application. The production of almost every information product involves some level of data fusion. Poorly implemented data and information fusion can degrade many other key data processes, most notably data quality management, data governance, and data analytics. In this chapter, we focus on a particular type of data fusion process called entity-based data fusion (EBDF) and on its use in high-risk applications where fusion accuracy must be very high. Healthcare is one of the foremost examples: fusing information that belongs to different patients, or failing to bring together all of the information for the same patient, can both have dire, even life-threatening, consequences.
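The two failure modes named above can be made concrete by comparing a fusion result with known ground truth, one pair of records at a time. The following is a minimal Python sketch, assuming that each record has a known true patient identifier and that the fusion process assigns each record a cluster label. The function name `pair_errors` and the record, patient, and cluster identifiers are hypothetical illustrations, not the chapter's method.

```python
from itertools import combinations


def pair_errors(truth, predicted):
    """Count pairwise fusion errors (hypothetical helper for illustration).

    truth maps each record ID to its true patient ID; predicted maps each
    record ID to the cluster assigned by the fusion process.
    Returns (false_merges, missed_merges, total_pairs).
    """
    false_merges = 0   # records of different patients fused into one cluster
    missed_merges = 0  # records of the same patient left in separate clusters
    total_pairs = 0
    for a, b in combinations(sorted(truth), 2):
        total_pairs += 1
        same_patient = truth[a] == truth[b]
        same_cluster = predicted[a] == predicted[b]
        if same_cluster and not same_patient:
            false_merges += 1
        elif same_patient and not same_cluster:
            missed_merges += 1
    return false_merges, missed_merges, total_pairs


# Hypothetical example: r1 and r2 belong to patient P1 but were split apart,
# while r1 and r3 belong to different patients but were fused together.
truth = {"r1": "P1", "r2": "P1", "r3": "P2", "r4": "P3"}
predicted = {"r1": "C1", "r2": "C2", "r3": "C1", "r4": "C3"}
print(pair_errors(truth, predicted))  # (1, 1, 6)
```

Counting the two error types separately, rather than folding them into a single score, reflects the point that each kind of mistake carries its own risk in a clinical setting.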
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Talburt, J.R., Pullen, D., Penning, M. (2019). Evaluating and Improving Data Fusion Accuracy. In: Bossé, É., Rogova, G. (eds) Information Quality in Information Fusion and Decision Making. Information Fusion and Data Science. Springer, Cham. https://doi.org/10.1007/978-3-030-03643-0_14
DOI: https://doi.org/10.1007/978-3-030-03643-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03642-3
Online ISBN: 978-3-030-03643-0
eBook Packages: Computer Science, Computer Science (R0)