Abstract
This paper presents an architecture for maintaining historical archives based on Linked Open Data technologies and on the open source distributed development model and its tools. The proposed architecture is being implemented for the archives of the Centro de Pesquisa e Documentação de História Contemporânea do Brasil (Center for Research and Documentation of Brazilian Contemporary History) of the Fundação Getulio Vargas (Getulio Vargas Foundation). We discuss the benefits of this initiative, suggest ways of implementing it, and describe the preliminary milestones already achieved. We also present in-progress and planned work on extending the accessibility and usefulness of the archival data using semantic web technologies, natural language processing, image analysis tools, and audio–textual alignment.
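The abstract names Linked Open Data as the publishing layer. As a minimal sketch only, assuming a hypothetical base URI (`http://example.org/archive/person/`) and a record holding just an identifier, a name and a language tag, one archive entry could be exposed as RDF in the N-Triples syntax, typed with the FOAF vocabulary. The vocabularies and URIs the project actually uses are not specified here.

```python
# Sketch only: the base URI and the record layout are hypothetical;
# rdf:type and the FOAF terms are the real vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
BASE = "http://example.org/archive/person/"  # hypothetical namespace


def escape_literal(text: str) -> str:
    """Escape the characters N-Triples forbids unescaped inside a quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record: dict) -> str:
    """Emit an rdf:type triple and a language-tagged foaf:name triple for one record."""
    subject = f"<{BASE}{record['id']}>"
    name = escape_literal(record["name"])
    lines = [
        f"{subject} <{RDF_TYPE}> <{FOAF}Person> .",
        f'{subject} <{FOAF}name> "{name}"@{record["lang"]} .',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical record; not taken from the CPDOC archives.
print(record_to_ntriples({"id": "p0001", "name": "Getúlio Vargas", "lang": "pt"}), end="")
```

Giving each entry a stable URI is what lets other datasets link to it; the two triples here are deliberately the smallest useful description.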
Notes
This acronym describes information systems that usually implement the search, create, read, update and delete operations; see http://goo.gl/33piYJ.
Stored as text files, all DHBB entries use less than 50 % of the space they currently occupy as HTML in the database.
In this application we used Jekyll, http://jekyllrb.com, but any other static site generator could be used.
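The first note lists five operations. As an illustration only, assuming entries are plain strings keyed by a hypothetical identifier and held in memory (the system's actual storage is not described here), those operations map onto a small interface like this:

```python
from dataclasses import dataclass, field


@dataclass
class EntryStore:
    """Hypothetical in-memory store of text entries keyed by an identifier."""

    entries: dict[str, str] = field(default_factory=dict)

    def search(self, term: str) -> list[str]:
        """Return the sorted ids of entries whose text contains term (case-insensitive)."""
        needle = term.lower()
        return sorted(key for key, text in self.entries.items() if needle in text.lower())

    def create(self, key: str, text: str) -> None:
        """Add a new entry; refuse to overwrite an existing one."""
        if key in self.entries:
            raise KeyError(f"entry {key!r} already exists")
        self.entries[key] = text

    def read(self, key: str) -> str:
        """Return an entry's text; raises KeyError if it does not exist."""
        return self.entries[key]

    def update(self, key: str, text: str) -> None:
        """Replace an existing entry's text; refuse to create a new one."""
        if key not in self.entries:
            raise KeyError(f"entry {key!r} does not exist")
        self.entries[key] = text

    def delete(self, key: str) -> None:
        """Remove an entry; raises KeyError if it does not exist."""
        del self.entries[key]


store = EntryStore()
store.create("entry-1", "Hypothetical biography of a senator")
store.update("entry-1", "Hypothetical biography of a senator and minister")
print(store.search("minister"))
```

Keeping `create` and `update` separate turns an accidental overwrite into an explicit error; that is one reasonable design choice, not the only one.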
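The notes describe keeping entries as text files and publishing them with a static site generator (Jekyll in this application). The following is a minimal Python sketch of that idea, not Jekyll itself. It assumes each hypothetical `*.md` file starts with a `---`-delimited header of flat `key: value` pairs (a small subset of the YAML front matter Jekyll reads), treats the body as plain-text paragraphs rather than rendering Markdown, and writes one HTML page per entry.

```python
import html
import tempfile
from pathlib import Path


def parse_entry(source: str) -> tuple[dict, str]:
    """Split a '---'-delimited header of flat 'key: value' lines from the body.

    This handles only a tiny subset of YAML front matter; it is not a YAML parser.
    """
    meta: dict = {}
    lines = source.splitlines()
    if lines and lines[0].strip() == "---":
        stripped = [line.strip() for line in lines]
        end = stripped.index("---", 1)  # ValueError if the header is never closed
        for line in lines[1:end]:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
        lines = lines[end + 1:]
    return meta, "\n".join(lines).strip()


def render_page(meta: dict, body: str) -> str:
    """Wrap blank-line-separated paragraphs in escaped HTML (no Markdown rendering)."""
    title = html.escape(meta.get("title", "Untitled"))
    paragraphs = "\n".join(
        f"<p>{html.escape(p.strip())}</p>" for p in body.split("\n\n") if p.strip()
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n{paragraphs}\n</body></html>\n"
    )


def build_site(src_dir: Path, out_dir: Path) -> list[Path]:
    """Render every src_dir/*.md file to out_dir/<name>.html and return the written paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.md")):
        meta, body = parse_entry(src.read_text(encoding="utf-8"))
        target = out_dir / f"{src.stem}.html"
        target.write_text(render_page(meta, body), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Build a one-page site from a hypothetical entry in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "entries")
        src.mkdir()
        (src / "example.md").write_text(
            "---\ntitle: Example entry\n---\nFirst paragraph.\n\nSecond paragraph.\n",
            encoding="utf-8",
        )
        for page in build_site(src, Path(tmp, "site")):
            print(page.name)
```

Plain-text sources can be versioned and reviewed with ordinary distributed development tools, while the generated HTML can be served without a database.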
About this article
Cite this article
Rademaker, A., Oliveira, D.A.B., de Paiva, V. et al. A linked open data architecture for the historical archives of the Getulio Vargas Foundation. Int J Digit Libr 15, 153–167 (2015). https://doi.org/10.1007/s00799-015-0147-1
DOI: https://doi.org/10.1007/s00799-015-0147-1