ABSTRACT
This paper presents HDSAnalytics, a data analytics framework for heterogeneous data sources. The framework uses data from a variety of sources that differ in format and volume and that may hold structured, semi-structured, or unstructured data. Integrating such sources into a single unified data source may lose information because of the semantic, syntactic, and schematic differences among them. Semantic heterogeneity arises when similar data appears in different forms in different data sources. Schematic heterogeneity arises from differences in the format or schema in which the data is stored, and syntactic heterogeneity from differences in the way the data is accessed and retrieved. Hence, the need to access, retrieve, and use information from different data sources poses challenges such as (1) mapping similar attributes among different data sources, (2) retrieving the specific attributes from different data sources that are relevant to a user's query, and (3) retrieving data from different data sources in the different formats requested by different components of the system. The proposed HDSAnalytics framework design helps analytic models use heterogeneous data sources "as-is", without integrating them into a single data source, thereby overcoming the challenges above. Our prototype of the framework, evaluated using data from the Bangalore Metropolitan Transport Corporation (BMTC), India, demonstrates how bus fleet operations can be analyzed, diagnosed, and explored to improve bus fleet schedules and reduce operating costs, and it provides detailed insight into those operations. The prototype scales and continues to work efficiently as the number of heterogeneous data sources increases.
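As a rough illustration of using sources "as-is", the sketch below resolves unified attribute names against each source's own field names at query time, so that a CSV source and a JSON source are read in their native formats and never merged. It is not the HDSAnalytics design. The source names (`schedule`, `trips`), the sample records, `ATTRIBUTE_MAP`, `read_source`, and `query` are all hypothetical and chosen only for this example.

```python
import csv
import io
import json

# Hypothetical in-memory sources standing in for real files or databases.
SCHEDULE_CSV = "route_no,depot\nR1,D1\nR2,D2\n"
TRIPS_JSON = '[{"route": "R1", "km": 21.5}, {"route": "R2", "km": 42.0}]'

# Hypothetical attribute map: unified name -> field name in each source.
ATTRIBUTE_MAP = {
    "route_id": {"schedule": "route_no", "trips": "route"},
    "depot": {"schedule": "depot"},
    "distance_km": {"trips": "km"},
}


def read_source(name):
    """Read one source in its native format and return a list of dicts."""
    if name == "schedule":
        return list(csv.DictReader(io.StringIO(SCHEDULE_CSV)))
    if name == "trips":
        return json.loads(TRIPS_JSON)
    raise KeyError(f"unknown source: {name}")


def query(attributes):
    """Fetch the requested unified attributes from each source that has them."""
    results = {}
    for source in ("schedule", "trips"):
        # Keep only the requested attributes this source can supply.
        fields = {
            unified: ATTRIBUTE_MAP[unified][source]
            for unified in attributes
            if source in ATTRIBUTE_MAP[unified]
        }
        if fields:
            results[source] = [
                {unified: row[field] for unified, field in fields.items()}
                for row in read_source(source)
            ]
    return results
```

Calling `query(["route_id", "distance_km"])` returns one result list per source, keyed by source name, with only the attributes that source can provide. This touches the first two challenges (attribute mapping and attribute selection); serving results in different output formats is left out of the sketch.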