An improved machine learning application for the integration of record systems for missing US service members

  • Regular Paper
International Journal of Data Science and Analytics

Abstract

The Defense POW/MIA Accounting Agency (DPAA) continues to locate, recover, and identify over 81,000 missing US service members from past conflicts. To fulfill this mission, massive amounts of information must be integrated from historical records, genealogy records, anthropological data, archaeological data, odontology data, and DNA. Previously, a machine learning record-linkage application was developed to integrate DNA Family Reference Samples (FRS) data systems with the DPAA’s master data. That application was shown to link large record systems with a high level of accuracy and precision. Here, this work is extended to further optimize both the blocking strategy used during record linkage and the record-match alpha-level threshold for the Bayesian classifier. Optimizing the blocking strategy improved application run-time per record by 20%. After record-match alpha-level optimization, the application linked 89.6% of the record out-group to DPAA master data at an accuracy of 99.6%. The improved run-time efficiency and match rate of the record-linkage pipeline will benefit not only the DPAA’s FRS import process but also the linking of other big data sources supporting the DPAA mission.
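
The two components tuned in this work, blocking and a thresholded Bayesian match decision, can be illustrated with a minimal sketch. It is not the authors' pipeline: the records, field names, blocking key, agreement probabilities, prior, and cutoff values are all hypothetical, and a plain posterior-probability cutoff stands in for the paper's alpha-level threshold.

```python
from collections import defaultdict

# Hypothetical toy records; field names are illustrative, not the DPAA schema.
frs = [
    {"id": "F1", "surname": "Smith", "given": "John", "year": 1921},
    {"id": "F2", "surname": "Garcia", "given": "Luis", "year": 1918},
]
master = [
    {"id": "M1", "surname": "Smith", "given": "John", "year": 1921},
    {"id": "M2", "surname": "Smyth", "given": "Joan", "year": 1930},
    {"id": "M3", "surname": "Garcia", "given": "Luis", "year": 1919},
]

# Assumed probabilities that a field agrees given a true match (M)
# or a true non-match (U), plus an assumed prior match probability.
M = {"surname": 0.95, "given": 0.90, "year": 0.85}
U = {"surname": 0.01, "given": 0.05, "year": 0.02}
PRIOR = 0.1


def block_key(rec):
    """Hypothetical blocking key: first letter of the surname."""
    return rec["surname"][0].upper()


def candidate_pairs(left, right):
    """Yield only pairs that share a blocking key, not the full cross product."""
    index = defaultdict(list)
    for r in right:
        index[block_key(r)].append(r)
    for l in left:
        for r in index[block_key(l)]:
            yield l, r


def match_probability(a, b):
    """Naive Bayes posterior that a and b refer to the same person."""
    p_match, p_non = PRIOR, 1.0 - PRIOR
    for field in M:
        agree = a[field] == b[field]
        p_match *= M[field] if agree else 1.0 - M[field]
        p_non *= U[field] if agree else 1.0 - U[field]
    return p_match / (p_match + p_non)


def link(left, right, threshold=0.95):
    """Return id pairs whose posterior match probability meets the threshold."""
    return [
        (l["id"], r["id"])
        for l, r in candidate_pairs(left, right)
        if match_probability(l, r) >= threshold
    ]


if __name__ == "__main__":
    print(len(list(candidate_pairs(frs, master))), "candidate pairs instead of",
          len(frs) * len(master))
    print(link(frs, master, threshold=0.95))
    print(link(frs, master, threshold=0.99))
```

In this toy setup, blocking cuts the comparisons from six pairs to three, which is the kind of saving that reduces run-time per record. Raising the cutoff from 0.95 to 0.99 drops the pair whose birth years disagree, which shows how the threshold trades match rate against accuracy.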


Author information

Correspondence to Julia D. Warnke-Sommer.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary material 1 (pdf 1601 KB)

About this article

Cite this article

Warnke-Sommer, J.D., Damann, F.E. An improved machine learning application for the integration of record systems for missing US service members. Int J Data Sci Anal 11, 57–68 (2021). https://doi.org/10.1007/s41060-020-00236-y

  • DOI: https://doi.org/10.1007/s41060-020-00236-y
