Construction of Semantic Data Models

Perez-Arriaga, Martha O.; Estrada, Trilce; Abad-Mota, Soraya

doi:10.1007/978-3-319-94809-6_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 814)

Included in the following conference series:

International Conference on Data Management Technologies and Applications

Abstract

The production of scientific publications has increased by 8–9% each year over the past six decades [1]. To conduct state-of-the-art research, scientists and scholars must extract relevant information from a large volume of documents. Analyzing scientific documents is further complicated by the variability of publishing standards, formats, and domains. Novel methods are needed to analyze publications and find concrete information in them rapidly. In this work, we present a conceptual design for systematically building semantic data models from relevant elements, including context, metadata, and tables, that appear in publications from any domain. To enrich the models and to provide semantic interoperability among documents, we use general-purpose ontologies and a vocabulary to organize their information. The resulting models allow us to synthesize, explore, and exploit information promptly.
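
As a rough illustration of what such a model might look like, the sketch below assembles a JSON-LD style record for one publication from its metadata, context terms, and tables. It is not the authors' implementation: the function name `build_semantic_model`, the input field names (`caption`, `entities`), the sample keywords, and the sample table are all hypothetical, and the choice of schema.org as the vocabulary and a DBpedia URI as the ontology link is an assumption based on the resources listed in the references.

```python
import json


def build_semantic_model(title, authors, keywords, tables):
    """Assemble a JSON-LD style dictionary for one publication.

    Hypothetical helper: metadata go in "name" and "author", context terms
    in "keywords", and each table becomes a part of the article whose
    "about" entries point at ontology URIs.
    """
    return {
        "@context": "http://schema.org/",
        "@type": "ScholarlyArticle",
        "name": title,
        "author": [{"@type": "Person", "name": a} for a in authors],
        "keywords": keywords,
        "hasPart": [
            {
                "@type": "Table",
                "name": t["caption"],
                # Link table content to ontology concepts by URI.
                "about": [{"@id": uri} for uri in t["entities"]],
            }
            for t in tables
        ],
    }


model = build_semantic_model(
    title="Construction of Semantic Data Models",
    authors=["Martha O. Perez-Arriaga", "Trilce Estrada", "Soraya Abad-Mota"],
    keywords=["semantic data model", "table extraction"],  # hypothetical context terms
    tables=[
        {
            "caption": "Hypothetical table caption",  # placeholder, not from the paper
            "entities": ["http://dbpedia.org/resource/Selenium"],
        }
    ],
)

print(json.dumps(model, indent=2))
```

Because every record shares one vocabulary and refers to concepts by URI, records built from different documents could be compared or merged on those URIs, which is one simple reading of the semantic interoperability the abstract describes.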

References

1. Bornmann, L., Mutz, R.: Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. J. Assoc. Inf. Sci. Technol. 66(11), 2215–2222 (2015)
2. Peckham, J., Maryanski, F.: Semantic data models. ACM Comput. Surv. (CSUR) 20(3), 153–189 (1988)
3. Prlić, A., Martinez, M.A., Dimitropoulos, D., Beran, B., Yukich, B.T., Rose, P.W., Bourne, P.E., Fink, J.L.: Integration of open access literature into the RCSB Protein Data Bank using BioLit. BMC Bioinformatics 11, 1–5 (2010)
4. Comeau, D.C., Islamaj Doğan, R., Ciccarese, P., Cohen, K.B., Krallinger, M., Leitner, F., Lu, Z., Peng, Y., Rinaldi, F., Torii, M., Valencia, A.: BioC: a minimalist approach to interoperability for biomedical text processing. Database, bat064 (2013)
5. Ware, M., Mabe, M.: The STM report: an overview of scientific and scholarly journal publishing (2015)
6. The Semantic Web Science Association. http://swsa.semanticweb.org/
7. Peroni, S.: Semantic Web Technologies and Legal Scholarly Publishing. LGTS, vol. 15. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-04777-5
8. Ouksel, A.M., Sheth, A.: Semantic interoperability in global information systems. ACM Sigmod Rec. 28(1), 5–12 (1999)
9. Perez-Arriaga, M.O., Estrada, T., Abad-Mota, S.: Table interpretation and extraction of semantic relationships to synthesize digital documents. In: Proceedings of the 6th International Conference on Data Science, Technology and Applications, pp. 223–232 (2017)
10. Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka Jr., E.R., Mitchell, T.M.: Toward an architecture for never-ending language learning. AAAI 5, 1306–1313 (2010)
11. Nakashole, N., Weikum, G., Suchanek, F.: PATTY: a taxonomy of relational patterns with semantic types. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1135–1145. Association for Computational Linguistics (2012)
12. Yates, A., Cafarella, M., Banko, M., Etzioni, O., Broadhead, M., Soderland, S.: TextRunner: open information extraction on the web. In: Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, pp. 25–26. Association for Computational Linguistics (2007)
13. Etzioni, O., Banko, M., Soderland, S., Weld, D.S.: Open information extraction from the web. Commun. ACM 51(12), 68–74 (2008)
14. Etzioni, O., Fader, A., Christensen, J., Soderland, S., Mausam, M.: Open information extraction: the second generation. IJCAI 11, 3–10 (2011)
15. Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1535–1545. Association for Computational Linguistics (2011)
16. Hull, R., King, R.: Semantic database modeling: survey, applications, and research issues. ACM Comput. Surv. (CSUR) 19(3), 201–260 (1987)
17. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - a crystallization point for the Web of Data. Web Semant. Sci. Serv. Agents World Wide Web 7(3), 154–165 (2009)
18. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)
19. Dumontier, M., Baker, C.J., Baran, J., Callahan, A., Chepelev, L., Cruz-Toledo, J., Del Rio, N.R., Duck, G., Furlong, L.I., Keath, N., Klassen, D.: The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. J. Biomed. Semant. 5(1), 1–11 (2014)
20. Data Model - schema.org. http://schema.org/docs/datamodel.html
21. Nenkova, A., McKeown, K.: Automatic summarization. Found. Trends® Inf. Retrieval 5(2–3), 103–233 (2011)
22. Teufel, S., Moens, M.: Summarizing scientific articles: experiments with relevance and rhetorical status. Comput. Linguist. 28(4), 409–445 (2002)
23. Allahyari, M., Pouriyeh, S., Assefi, M., Safaei, S., Trippe, E.D., Gutierrez, J.B., Kochut, K.: Text Summarization Techniques: A Brief Survey. arXiv preprint arXiv:1707.02268, pp. 1–9 (2017)
24. Baralis, E., Cagliero, L., Jabeen, S., Fiori, A.: Multi-document summarization exploiting frequent itemsets. In: Proceedings of the 27th Annual ACM Symposium on Applied Computing, pp. 782–786. ACM (2012)
25. National Information Standards Organization Press: Understanding metadata. National Information Standards, vol. 20 (2004)
26. Perez-Arriaga, M.O., Wilson, S., Williams, K.P., Schoeniger, J., Waymire, R.L., Powell, A.J.: Omics Metadata Management Software (OMMS). Bioinformation 11(4), 165–172 (2015). https://doi.org/10.6026/97320630011165
27. Shinyama, Y.: PDFMiner: Python PDF parser and analyzer (2015). Accessed 11 June 2015
28. Statistics - En.wikipedia.org. https://en.wikipedia.org/wiki/Wikipedia:Statistics
29. Kim, S., Han, K., Kim, S.Y., Liu, Y.: Scientific table type classification in digital library. In: Proceedings of the 2012 ACM Symposium on Document Engineering, pp. 133–136. ACM (2012)
30. Berglund, A., Boag, S., Chamberlin, D., Fernández, M.F., Kay, M., Robie, J., Simon, J.: XML path language (XPath). World Wide Web Consortium (W3C) (2003)
31. Perez-Arriaga, M.O., Estrada, T., Abad-Mota, S.: TAO: system for table detection and extraction from PDF documents. In: The 29th Florida Artificial Intelligence Research Society Conference, FLAIRS 2016, pp. 591–596. AAAI (2016)
32. Loria, S., Keen, P., Honnibal, M., Yankovsky, R., Karesh, D., Dempsey, E.: TextBlob: simplified text processing (2014)
33. Microsoft Cognitive Services. https://azure.microsoft.com/en-us/services/cognitive-services/bing-web-search-api
34. Zukas, A., Price, R.J.: Document categorization using latent semantic indexing. In: Proceedings 2003 Symposium on Document Image Understanding Technology, UMD, pp. 1–10 (2003)
35. Dahchour, M., Pirotte, A., Zimányi, E.: Generic relationships in information modeling. In: Spaccapietra, S. (ed.) Journal on Data Semantics IV. LNCS, vol. 3730, pp. 1–34. Springer, Heidelberg (2005). https://doi.org/10.1007/11603412_1
36. Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the COLING/ACL on Interactive Presentation Sessions, pp. 69–72. Association for Computational Linguistics (2006)
37. World Wide Web Consortium. JSON-LD 1.0: a JSON-based serialization for linked data (2014)
38. JSON-LD Playground. http://json-ld.org/playground
39. Hook, V., Bark, S., Gupta, N., Lortie, M., Lu, W.D., Bandeira, N., Funkelstein, L., Wegrzyn, J., O'Connor, D.T.: Neuropeptidomic components generated by proteomic functions in secretory vesicles for cell-cell communication. AAPS J. 12(4), 635–645 (2010)
40. Elmasri, R., Navathe, S.B.: Fundamentals of Database Systems. Pearson, Boston (2015)
41. Perez-Arriaga, M.O.: Automated Development of Semantic Data Models Using Scientific Publications. University of New Mexico, USA (2018)
42. Sivertsen, T., Øvernes, G., Østerås, O., Nymoen, U., Lunder, T.: Plasma vitamin E and blood selenium concentrations in Norwegian dairy cows: regional differences and relations to feeding and health. Acta Veterinaria Scandinavica 46(4), 177 (2005)
43. Sogstad, A.M., Fjeldaas, T., Østerås, O.: Lameness and claw lesions of the Norwegian red dairy cattle housed in free stalls in relation to environment, parity and stage of lactation. Acta Veterinaria Scandinavica 46(4), 203 (2005)
44. DBpedia. http://dbpedia.org

Author information

Authors and Affiliations

University of New Mexico, Albuquerque, NM, 87131, USA
Martha O. Perez-Arriaga, Trilce Estrada & Soraya Abad-Mota

Corresponding author

Correspondence to Martha O. Perez-Arriaga.

Editor information

Editors and Affiliations

INSTICC, Polytechnic Institute of Setúbal, Setúbal, Portugal
Joaquim Filipe
University of Coimbra, Coimbra, Portugal
Jorge Bernardino
RWTH Aachen University, Aachen, Germany
Christoph Quix

Cite this paper

Perez-Arriaga, M.O., Estrada, T., Abad-Mota, S. (2018). Construction of Semantic Data Models. In: Filipe, J., Bernardino, J., Quix, C. (eds) Data Management Technologies and Applications. DATA 2017. Communications in Computer and Information Science, vol 814. Springer, Cham. https://doi.org/10.1007/978-3-319-94809-6_3

DOI: https://doi.org/10.1007/978-3-319-94809-6_3
Published: 30 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94808-9
Online ISBN: 978-3-319-94809-6
eBook Packages: Computer Science, Computer Science (R0)
