Abstract
Enriching ontologies can measurably enhance research in digital curation. We support this claim by using an enriched ontology to address a well-known, challenging problem: record linkage of historical records for intergenerational family reconstitution. An enriched ontology enables extraction of birth, death, and marriage records via linguistic grounding, curation of record-comprising information with pragmatic constraints and cultural normatives, and record linkage by evidential reasoning. The result is an automatic and highly accurate reconstruction of family trees. Empirical evidence shows that conceptual modeling theory can be applied to important real-world problems and yield excellent results.
Notes
1. In [8], 880 personas of 9,279 were determined to have matches. From this training set, weights were estimated (e.g., \(4.6_{0908}\) for Birth Year, \(4.8_{9474}\) for Father’s Surname, \(0.0_{0176}\) for Birth Town). Lawson et al. argue that these weights should be universal, depending only on the chosen set of attributes. The technique for computing the weights is described by White [14].
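To make the role of such weights concrete, the following is a minimal sketch of weight-based persona matching, not the method of [8] or [14]. It assumes that the subscripted digits are trailing decimals (so \(4.6_{0908}\) is read as 4.60908), that a pair's score is the sum of the weights of the attributes on which the two personas agree, and that a fixed cutoff decides a match. The names `WEIGHTS`, `match_score`, and `is_match`, the attribute keys, the sample personas, and the threshold of 9.0 are all hypothetical.

```python
# Attribute weights quoted in the note. Reading the subscripted digits as
# trailing decimals (e.g. 4.6_{0908} -> 4.60908) is an assumption.
WEIGHTS = {
    "birth_year": 4.60908,
    "father_surname": 4.89474,
    "birth_town": 0.00176,
}


def match_score(a, b, weights=WEIGHTS):
    """Sum the weights of the attributes on which two personas agree.

    Attributes missing from either persona contribute nothing; this is a
    simplification for illustration and is not taken from the source.
    """
    score = 0.0
    for attr, weight in weights.items():
        value_a, value_b = a.get(attr), b.get(attr)
        if value_a is not None and value_a == value_b:
            score += weight
    return score


def is_match(a, b, threshold=9.0):
    """Declare a match when the score reaches a hypothetical cutoff."""
    return match_score(a, b) >= threshold


# Invented sample personas, for illustration only.
p1 = {"birth_year": 1801, "father_surname": "Meyer", "birth_town": "Town A"}
p2 = {"birth_year": 1801, "father_surname": "Meyer", "birth_town": "Town B"}
p3 = {"birth_year": 1750, "father_surname": "Brandt", "birth_town": "Town A"}

print(is_match(p1, p2))  # agree on birth year and father's surname
print(is_match(p1, p3))  # agree on birth town only
```

Under these assumptions, agreement on Birth Year and Father’s Surname dominates the score, whereas agreement on Birth Town alone contributes almost nothing. A full probabilistic linkage model would typically also penalize disagreement, which this sketch omits.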
References
1. Abramitzky, R., Mill, R., Perez, S.: Linking individuals across historical sources: a fully automated approach (2018). Working Paper No. 1031
2. Bailey, M.J., Cole, C., Henderson, M., Massey, C.: How well do automated linking methods perform? Lessons from US historical data. J. Econ. Lit. 58(4), 997–1044 (2020). https://doi.org/10.1257/jel.20191526. https://www.aeaweb.org/articles?id=10.1257/jel.20191526
3. Embley, D., Liddle, S., Park, J.: Increasing the quality of extracted information by reading between the lines. In: Comyn-Wattiau, I., du Mouza, C., Prat, N. (eds.) Ingénierie et management des systèmes d’information–Mélanges en l’honneur de Jacky Akoka. Éditions Cépaduès, Toulouse (2016)
4. Embley, D., Nagy, G.: Green interaction for extracting family information from OCR’d books. In: Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, DAS 2018, pp. 127–132. IEEE Computer Society, Vienna, March 2018
5. Feigenbaum, J.: A machine learning approach to census record linking (2016). http://scholar.harvard.edu/files/jfeigenbaum/files/feigenbaumcensuslink
6. Friedrichs, E., Pech, A.: Familienbuch des Kirchspiels Flögeln: bestehend aus den Dörfern Flögeln und Fickmühlen; vom Beginn der Kirchenbücher 1700 bis 1900. Deutsche Ortssippenbücher. Reihe A, E. Friedrichs, Bremerhaven (2000)
7. Grant, F.: Index to The Register of Marriages and Baptisms in the Parish of Kilbarchan, 1649–1772. J. Skinner & Company LTD., Edinburgh (1912)
8. Lawson, J., White, D., Price, B., Yamagata, R.: Probabilistic record linkage for genealogical research. Brigham Young Univ. Stud. 41(2), 161–174 (2002)
9. Miller Funeral Home Records, 1917–1950, Greenville, Ohio. Darke County Ohio Genealogical Society, Greenville, Ohio (1990)
10. Nagy, G.: Green information extraction from family books. SN Comput. Sci. 1(23), 1–23 (2019). https://doi.org/10.1007/s42979-019-0024-x
11. Newcombe, H., Kennedy, J., Axford, S., James, A.: Automatic linkage of vital records. Science 130, 954–959 (1959)
12. Packer, T.L., Embley, D.W.: Cost effective ontology population with data from lists in OCRed historical documents. In: Frinken, V., Barrett, B., Manmatha, R., Märgner, V. (eds.) HIP2013 Proceedings, pp. 44–52. ACM (2013)
13. Vanderpoel, G.: The Ely Ancestry: Lineage of RICHARD ELY of Plymouth, England. The Calumet Press, New York (1902)
14. White, D.: A review of the statistics of record linkage for genealogical research. In: Record Linkage Techniques–1997: Proceedings of an International Workshop and Exposition, pp. 362–373. National Academy Press, Washington DC, USA (1999)
15. Wilkinson, M.D., Dumontier, M., et al.: The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 1–9 (2016)
16. Woodfield, S.N., Seeger, S., Litster, S., Liddle, S.W., Grace, B., Embley, D.W.: Ontological deep data cleaning. In: Trujillo, J.C., et al. (eds.) ER 2018. LNCS, vol. 11157, pp. 100–108. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00847-5_9
Acknowledgements
We thank Emeritus Professor George Nagy, Rensselaer Polytechnic Institute, for the development of GreenQQ and gratefully acknowledge the work of Gary James (Jim) Norris, who created a complete extraction ground truth for Flögeln and developed GreenQQ templates to attain near 100% extraction accuracy.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Embley, D.W., Liddle, S.W., Lonsdale, D.W., Woodfield, S.N. (2021). Inter-Generational Family Reconstitution with Enriched Ontologies. In: Reinhartz-Berger, I., Sadiq, S. (eds) Advances in Conceptual Modeling. ER 2021. Lecture Notes in Computer Science, vol 13012. Springer, Cham. https://doi.org/10.1007/978-3-030-88358-4_6
DOI: https://doi.org/10.1007/978-3-030-88358-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88357-7
Online ISBN: 978-3-030-88358-4
eBook Packages: Computer Science (R0)