Reconciling and Using Historical Person Registers as Linked Open Data in the AcademySampo Portal and Data Service

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12922)

Abstract

This paper presents a method for automatically extracting and reassembling a genealogical network from a biographical register of historical people. The method is applied to a dataset of short textual biographies of all 28 000 Finnish and Swedish academic people educated in Finland in 1640–1899. The aim is to connect and disambiguate the relatives mentioned in the biographies in order to build a continuous genealogical network, which can be used in Digital Humanities for data and network analysis of historical academic people and their lives. An artificial neural network approach is presented for solving a supervised learning task: disambiguating the relatives mentioned in the register descriptions using basic biographical information enhanced with an ontology of vocations and additional, occasionally sparse genealogical information. Evaluation results of the record linkage are promising and provide novel insights into the problem of reconciling historical person registers. The outcome of the work is used in practice as part of the AcademySampo portal and linked open data service, a new member of the Sampo series of cultural heritage applications for Digital Humanities.

Notes

  1. The portal and its linked open data service, including a SPARQL endpoint, were released on February 5, 2021. More information about AcademySampo can be found on the project homepage: https://seco.cs.aalto.fi/projects/yo-matrikkelit/.

  2. Cf. the project homepage https://iisg.amsterdam/en/hsn/projects/links and research papers at https://iisg.amsterdam/en/hsn/projects/links/links-publications.

  3. http://www.sixdegreesoffrancisbacon.com.

  4. https://ylioppilasmatrikkeli.helsinki.fi.

  5. https://ylioppilasmatrikkeli.helsinki.fi/1853-1899.

  6. https://en.wikipedia.org/wiki/University_of_Helsinki.

  7. https://en.wikipedia.org/wiki/Royal_Academy_of_Turku.

  8. This statistical result was obtained after we used the reconciled data in AcademySampo for data analysis.

  9. https://ylioppilasmatrikkeli.helsinki.fi/henkilo.php?id=14689.

  10. https://keras.io/api/layers/reshaping_layers/flatten/.

  11. https://keras.io/api/layers/core_layers/dense/.

  12. https://keras.io/api/layers/merging_layers/concatenate/.

  13. https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly.

  14. https://api.triplydb.com/s/IE4w29n0T.

  15. https://www.ldf.fi/dataset/yoma.

  16. https://jena.apache.org/documentation/fuseki2/.

  17. https://varnish-cache.org.

  18. https://www.docker.com.

  19. https://yasgui.triply.cc.

  20. https://colab.research.google.com/notebooks/intro.ipynb.

  21. https://jupyter.org.

Acknowledgements

Thanks to Yrjö Kotivuori and Veli-Matti Autio for their seminal work in creating the original databases used in our work, and for making the data openly available. Discussions with Heikki Rantala, Esko Ikkala, Mikko Koho, and Jouni Tuominen are acknowledged. This work is part of the EU project InTaVia: In/Tangible European Heritage (https://intavia.eu/), and is related to the EU COST action Nexus Linguarum (https://nexuslinguarum.eu/the-action) on linguistic data science. CSC – IT Center for Science provided computational resources for the work.

Corresponding author

Correspondence to Petri Leskinen.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Leskinen, P., Hyvönen, E. (2021). Reconciling and Using Historical Person Registers as Linked Open Data in the AcademySampo Portal and Data Service. In: Hotho, A., et al. The Semantic Web – ISWC 2021. ISWC 2021. Lecture Notes in Computer Science, vol 12922. Springer, Cham. https://doi.org/10.1007/978-3-030-88361-4_42

  • DOI: https://doi.org/10.1007/978-3-030-88361-4_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88360-7

  • Online ISBN: 978-3-030-88361-4

  • eBook Packages: Computer Science, Computer Science (R0)
