
Semantic-based Big Data integration framework using scalable distributed ontology matching strategy

Distributed and Parallel Databases

Abstract

Nowadays, Big Data management has become a key basis for innovation, productivity growth, and competition. Jointly exploiting data of this magnitude is essential for discovering valuable insights and supporting decision making in domains of major interest. Despite the complexity of Big Data environments, users usually seek a unified and appropriate view of this huge and heterogeneous data, from which reliable and consistent knowledge can be extracted. Big Data integration mechanisms are therefore needed to provide a uniform query interface, mediate across large datasets, and give data scientists a consistent integrated view suitable for analytical exploitation. To this end, this paper presents a semantic-based Big Data integration framework that relies on large-scale ontology matching and probabilistic-logical assessment strategies. The framework applies optimization mechanisms and leverages parallel-computing paradigms (Hadoop and MapReduce) on commodity computational resources to address the challenges of Big Data efficiently. Several experiments were conducted, and their results demonstrate the efficiency of the framework in terms of accuracy, performance, and scalability.
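The abstract states that ontology matching is distributed with Hadoop and MapReduce but does not describe the algorithm. The following is a minimal, single-process Python sketch of the general idea, assuming a simple label-similarity matcher (the standard library's `difflib.SequenceMatcher`) and a hypothetical first-character blocking key. It only mimics the map, shuffle, and reduce phases; it is not the authors' partitioning strategy or their probabilistic-logical assessment, and the names `normalize`, `map_phase`, `shuffle`, `reduce_phase`, and `match` are illustrative.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import product


def normalize(label):
    # Split camelCase and separators, then lower-case: "hasTitle" -> "has title".
    label = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    return re.sub(r"[_\-]+", " ", label).lower().strip()


def map_phase(side, concepts):
    # Emit (blocking_key, record) pairs. The key (first character of the
    # normalized label) is a HYPOTHETICAL blocking rule chosen for illustration.
    for cid, label in concepts.items():
        norm = normalize(label)
        yield norm[:1], (side, cid, norm)


def shuffle(pairs):
    # Group mapper output by key, as the MapReduce runtime would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(records, threshold):
    # Compare only source/target concepts that landed in the same block.
    src = [r for r in records if r[0] == "src"]
    tgt = [r for r in records if r[0] == "tgt"]
    for (_, s_id, s_lab), (_, t_id, t_lab) in product(src, tgt):
        score = SequenceMatcher(None, s_lab, t_lab).ratio()
        if score >= threshold:
            yield s_id, t_id, score


def match(source, target, threshold=0.8):
    # Run map -> shuffle -> reduce, then keep the best candidate per source concept.
    mapped = list(map_phase("src", source)) + list(map_phase("tgt", target))
    best = {}
    for records in shuffle(mapped).values():
        for s_id, t_id, score in reduce_phase(records, threshold):
            if s_id not in best or score > best[s_id][1]:
                best[s_id] = (t_id, score)
    return best
```

For example, matching `{"s1": "Author", "s2": "hasTitle", "s3": "Publisher"}` against `{"t1": "author", "t2": "has_title", "t3": "Journal"}` aligns `s1` with `t1` and `s2` with `t2`, while `s3` has no candidate in its block. Blocking keeps the quadratic comparison step local to each reducer, which is what allows it to be spread across a cluster; the trade-off is that true correspondences whose labels fall into different blocks (e.g. "Writer" and "Author") are never compared.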

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18

Notes

  1. http://www.w3.org/TR/owl-features/.

  2. https://www.w3.org/.

  3. https://jena.apache.org/.

  4. http://oaei.ontologymatching.org.

Author information

Corresponding author

Correspondence to Imadeddine Mountasser.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mountasser, I., Ouhbi, B., Hdioud, F. et al. Semantic-based Big Data integration framework using scalable distributed ontology matching strategy. Distrib Parallel Databases 39, 891–937 (2021). https://doi.org/10.1007/s10619-021-07321-6

  • DOI: https://doi.org/10.1007/s10619-021-07321-6
