DOI: 10.1145/3152494.3152516
research-article

HDSanalytics: a data analytics framework for heterogeneous data sources

Published: 11 January 2018

ABSTRACT

This paper presents HDSAnalytics, a data analytics framework for heterogeneous data sources. The framework uses data from a variety of sources that differ in format and volume and that may hold structured, semi-structured or unstructured data. Integrating data from such sources into a single unified data source can lose information because of the semantic, syntactic and schematic differences among them. Semantic heterogeneity arises when similar data is present in different forms in different data sources. Schematic heterogeneity arises from differences in the formats or schemas in which the data is stored, and syntactic heterogeneity from differences in the way the data is accessed and retrieved. Hence, the need to access, retrieve and use information from different data sources poses challenges such as (1) mapping similar attributes among different data sources, (2) retrieving the specific attributes from different data sources that are relevant to a user's query, and (3) retrieving data from different data sources in the different formats requested by different components of the system. The proposed HDSAnalytics framework design helps analytic models use heterogeneous data sources "as-is", without integrating them into a single data source, thereby addressing the challenges listed above. Our prototype of the framework, evaluated using data from the Bangalore Metropolitan Transport Corporation (BMTC), India, demonstrates how bus fleet operations can be analyzed, diagnosed and explored to improve bus fleet schedules and reduce operating costs, and it provides detailed insight into those operations. The prototype scales and works efficiently as the number of heterogeneous data sources increases.
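The three challenges named in the abstract can be illustrated with a minimal Python sketch. This is not the HDSAnalytics implementation, which the abstract does not describe in detail. The two in-memory sources, the sample values, the unified attribute names, the ATTRIBUTE_MAP table and the query function are all hypothetical and chosen only for illustration. The sketch reads each source in its own format at query time, maps source-specific attribute names to shared ones, and returns only the requested attributes in the requested output format.

```python
import csv
import io
import json

# Hypothetical sources with made-up values: a CSV export and a JSON feed
# that describe the same routes under different attribute names.
CSV_SOURCE = "route_no,trip_km\nR1,21.5\nR2,34.0\n"
JSON_SOURCE = '[{"route": "R1", "revenue": 1200}, {"route": "R2", "revenue": 2100}]'

# Hypothetical mapping from unified attribute names to each source's own names.
ATTRIBUTE_MAP = {
    "csv": {"route_id": "route_no", "distance_km": "trip_km"},
    "json": {"route_id": "route", "revenue": "revenue"},
}


def read_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


# Each source keeps its own reader; nothing is copied into a unified store.
READERS = {"csv": (read_csv, CSV_SOURCE), "json": (read_json, JSON_SOURCE)}


def query(attributes, fmt="records"):
    """Return the requested unified attributes, joined on route_id."""
    merged = {}
    for name, (reader, raw) in READERS.items():
        mapping = ATTRIBUTE_MAP[name]
        # Ask a source only for the attributes it can actually provide.
        wanted = [a for a in attributes if a in mapping]
        if not wanted:
            continue
        for row in reader(raw):
            key = row[mapping["route_id"]]
            record = merged.setdefault(key, {"route_id": key})
            for attr in wanted:
                record[attr] = row[mapping[attr]]
    records = [merged[k] for k in sorted(merged)]
    # The caller chooses the output format it needs.
    return json.dumps(records) if fmt == "json" else records
```

Calling query(["distance_km", "revenue"]) in this sketch returns one record per route that combines values from both sources, while the sources themselves stay untouched in their original formats. Values are passed through as each source delivers them, so the CSV distances remain strings.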


    Published in

      CODS-COMAD '18: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
      January 2018
      379 pages
      ISBN:9781450363419
      DOI:10.1145/3152494

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      CODS-COMAD '18 Paper Acceptance Rate: 50 of 150 submissions, 33%
      Overall Acceptance Rate: 197 of 680 submissions, 29%
