Abstract
The Defense POW/MIA Accounting Agency (DPAA) works to locate, recover, and identify the more than 81,000 US service members still missing from past conflicts. To fulfill this mission, the DPAA must integrate large volumes of information from historical records, genealogy records, anthropological data, archaeological data, odontology data, and DNA. Previously, a machine learning record-linkage application was developed to integrate DNA Family Reference Sample (FRS) data systems with the DPAA's master data. That application linked large record systems with high accuracy and precision. Here, this work is extended to further optimize two components: the blocking strategy used during record linkage and the record-match alpha-level threshold of the Bayesian classifier. Optimizing the blocking strategy improved application run-time per record by 20%. After the alpha-level threshold was optimized, the application linked 89.6% of the out-group records to DPAA master data with 99.6% accuracy. The pipeline's improved run-time efficiency and match rate will benefit both the DPAA's FRS import process and the linking of other large data sources that support the DPAA mission.
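The abstract names two tuning knobs: the blocking strategy and the alpha-level match threshold. The following minimal sketch illustrates both ideas on toy data. It is not the authors' pipeline. The records, the blocking key (surname initial plus birth year), the posterior probabilities, and the cutoff rule (a pair counts as a match when its probability is at least alpha) are all hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (record_id, surname, birth_year).
# Field names and values are illustrative only, not DPAA or FRS data.
frs_records = [("F1", "SMITH", 1921), ("F2", "SMYTH", 1921), ("F3", "JONES", 1918)]
master_records = [
    ("M1", "SMITH", 1921),
    ("M2", "JONES", 1918),
    ("M3", "BROWN", 1920),
    ("M4", "SMITH", 1925),
]


def blocking_key(record):
    """Hypothetical blocking key: surname initial plus birth year."""
    _, surname, birth_year = record
    return (surname[0], birth_year)


def candidate_pairs(left, right, key):
    """Return only the cross-file pairs whose blocking keys agree."""
    index = defaultdict(list)
    for rrec in right:
        index[key(rrec)].append(rrec)
    # Each left record is compared only with right records in its own block.
    return [(lrec[0], rrec[0]) for lrec in left for rrec in index.get(key(lrec), [])]


def classify(match_probs, alpha):
    """Assumed cutoff rule: label a pair a match if P(match) >= alpha."""
    return {pair: p >= alpha for pair, p in match_probs.items()}


pairs = candidate_pairs(frs_records, master_records, blocking_key)
print(len(pairs), "of", len(frs_records) * len(master_records), "pairs compared")

# Hypothetical posterior match probabilities from a trained classifier.
probs = {("F1", "M1"): 0.98, ("F2", "M1"): 0.71, ("F3", "M2"): 0.95}
for alpha in (0.5, 0.9):
    labels = classify(probs, alpha)
    print(alpha, sum(labels.values()), "matches")
```

On this toy data, blocking reduces the 12 possible cross-file comparisons to 3. Raising alpha from 0.5 to 0.9 drops the lower-confidence F2/M1 pair. This shows, in miniature, the trade-off between match rate and accuracy that tuning the threshold must balance.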
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Electronic supplementary material accompanies the online version of this article.
About this article
Cite this article
Warnke-Sommer, J.D., Damann, F.E. An improved machine learning application for the integration of record systems for missing US service members. Int J Data Sci Anal 11, 57–68 (2021). https://doi.org/10.1007/s41060-020-00236-y
DOI: https://doi.org/10.1007/s41060-020-00236-y