
Big Research Data Integration

  • Conference paper
Information Search, Integration, and Personalization (ISIP 2018)

Abstract

The paper presents a vision of a new paradigm of data integration in the scientific world, where data integration is instrumental to the exploratory studies carried out by research teams. It briefly overviews the technological challenges that must be faced to carry out the traditional approach to data integration successfully. Three important application scenarios are then described in terms of the main characteristics that heavily influence the data integration process. The first scenario is characterized by the need for large enterprises to combine information from a variety of heterogeneous data sets that were developed autonomously and are managed and maintained independently of one another within the enterprise. The second scenario is characterized by the need for many organizations to combine information from a large number of dynamically created data sets distributed worldwide and available on the Web. The third scenario is characterized by the need for scientists and researchers to connect each other's research data, as new insight is revealed by the connections between diverse research data sets. The paper highlights that the characteristics of the second and third scenarios make the traditional approach to data integration, i.e., the design of a global schema and of mappings between the local schemata and the global schema, unfeasible. The focus of the paper is the data integration problem in the context of the third scenario. A new paradigm of data integration is proposed, based on the emerging new empiricist scientific method, i.e., data-driven research, and on the new data-seeking paradigm, i.e., data exploration. Finally, a generic scientific application scenario is presented to better illustrate the new data integration paradigm, and a concise list of the actions required to carry out the new paradigm of big research data integration successfully is described.
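As an illustration of the traditional approach that the abstract contrasts against, here is a minimal sketch of a global schema with hand-written mappings from two local schemata. It assumes global-as-view (GAV) mappings, where each global record is defined as a view over a source record. The `Researcher` schema, the two sources and their field names, and the `query_global` function are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Global schema: one agreed-upon record shape (hypothetical fields).
@dataclass(frozen=True)
class Researcher:
    name: str
    affiliation: str


# Two autonomous local sources, each with its own schema (hypothetical data).
SOURCE_A = [{"full_name": "Ada Rossi", "institute": "Lab A"}]
SOURCE_B = [{"surname": "Bianchi", "given": "Luca", "org": "Univ. B"}]

SOURCES: Dict[str, List[dict]] = {"A": SOURCE_A, "B": SOURCE_B}

# Global-as-view mappings: each one translates a local record into the global schema.
Mapping = Callable[[dict], Researcher]

MAPPINGS: Dict[str, Mapping] = {
    "A": lambda r: Researcher(name=r["full_name"], affiliation=r["institute"]),
    "B": lambda r: Researcher(name=f'{r["given"]} {r["surname"]}', affiliation=r["org"]),
}


def query_global(predicate: Callable[[Researcher], bool]) -> List[Researcher]:
    """Answer a query over the global schema by unfolding it over every mapped source."""
    results: List[Researcher] = []
    for source_id, rows in SOURCES.items():
        mapping = MAPPINGS[source_id]
        # Translate each local row into the global schema, then filter.
        results.extend(rec for rec in map(mapping, rows) if predicate(rec))
    return results


if __name__ == "__main__":
    for researcher in query_global(lambda rec: True):
        print(researcher)
```

Each new source requires another hand-written entry in `MAPPINGS` (and possibly a change to the global schema), which hints at why the abstract argues that this design effort becomes unfeasible when data sets are numerous, dynamically created, and autonomously maintained.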



Author information

Corresponding author

Correspondence to Valentina Bartalesi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Bartalesi, V., Meghini, C., Thanos, C. (2019). Big Research Data Integration. In: Kotzinos, D., Laurent, D., Spyratos, N., Tanaka, Y., Taniguchi, Ri. (eds) Information Search, Integration, and Personalization. ISIP 2018. Communications in Computer and Information Science, vol 1040. Springer, Cham. https://doi.org/10.1007/978-3-030-30284-9_2

  • DOI: https://doi.org/10.1007/978-3-030-30284-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30283-2

  • Online ISBN: 978-3-030-30284-9

  • eBook Packages: Computer Science (R0)
