A linked open data architecture for the historical archives of the Getulio Vargas Foundation

International Journal on Digital Libraries

Abstract

This paper presents an architecture for maintaining historical archives based on Linked Open Data technologies and on an open source, distributed development model and tools. The proposed architecture is being implemented for the archives of the Centro de Pesquisa e Documentação de História Contemporânea do Brasil (Center for Research and Documentation of Brazilian Contemporary History) of the Fundação Getulio Vargas (Getulio Vargas Foundation). We discuss the benefits of this initiative, suggest ways of implementing it, and describe the preliminary milestones already achieved. We also present some possibilities, both in progress and planned, for extending the accessibility and usefulness of the archives' data using semantic web technologies, natural language processing, image analysis tools, and audio–textual alignment.

Notes

  1. http://www.biografischportaal.nl/en.

  2. An acronym describing information systems that typically implement the search, create, read, update, and delete operations, http://goo.gl/33piYJ.

  3. http://en.wikipedia.org/wiki/Learning_object.

  4. http://en.wikipedia.org/wiki/Massive_open_online_course.

  5. http://www.dspace.org/.

  6. http://www.fedora-commons.org.

  7. https://en.wikipedia.org/wiki/Revision_control.

  8. Stored as text files, the DHBB entries take up less than 50% of the space currently used by the same entries saved as HTML in the database.

  9. http://git-scm.com.

  10. http://lucene.apache.org/solr/.

  11. In this application we used Jekyll, http://jekyllrb.com, but any other static site generator could be used.

  12. https://github.com/arademaker/openWordnet-PT.

  13. http://logics.emap.fgv.br/wn/.

  14. http://picasa.google.com.

  15. http://en.wikipedia.org/wiki/Regular_expression.

  16. http://linkeddata.org.


Author information

Corresponding author

Correspondence to Alexandre Rademaker.

About this article

Cite this article

Rademaker, A., Oliveira, D.A.B., de Paiva, V. et al. A linked open data architecture for the historical archives of the Getulio Vargas Foundation. Int J Digit Libr 15, 153–167 (2015). https://doi.org/10.1007/s00799-015-0147-1

  • DOI: https://doi.org/10.1007/s00799-015-0147-1
