Evaluating and Improving Data Fusion Accuracy

Talburt, John R.; Pullen, Daniel; Penning, Melody

doi:10.1007/978-3-030-03643-0_14

Part of the book series: Information Fusion and Data Science (IFDS)

Abstract

Information fusion is the process of combining different sources of information for use in a particular application. The production of almost every information product incorporates some level of data fusion. Poor implementation of data and information fusion will have an impact on many other key data processes, most particularly data quality management, data governance, and data analytics. In this chapter we focus on a particular type of data fusion process called entity-based data fusion (EBDF) and on the application of EBDF in high-risk applications where accuracy of the fusion must be very high. One of the foremost examples is in healthcare. Fusing information belonging to different patients or failing to bring together all of the information for the same patient can both have dire, even life-threatening, implications.

Author information

Authors and Affiliations

University of Arkansas at Little Rock, Little Rock, AR, USA
John R. Talburt & Daniel Pullen
University of Arkansas for Medical Sciences, Little Rock, AR, USA
Melody Penning

Corresponding author

Correspondence to John R. Talburt.

Editor information

Editors and Affiliations

IMT-Atlantique, Brest, France
Éloi Bossé
The State University of New York at Buffalo, Buffalo, NY, USA
Galina L. Rogova

About this chapter

Cite this chapter

Talburt, J.R., Pullen, D., Penning, M. (2019). Evaluating and Improving Data Fusion Accuracy. In: Bossé, É., Rogova, G. (eds) Information Quality in Information Fusion and Decision Making. Information Fusion and Data Science. Springer, Cham. https://doi.org/10.1007/978-3-030-03643-0_14

DOI: https://doi.org/10.1007/978-3-030-03643-0_14
Published: 02 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03642-3
Online ISBN: 978-3-030-03643-0
eBook Packages: Computer Science, Computer Science (R0)
