ABSTRACT
Many real-world applications must cope with diversity in data formats and semantics, and handling heterogeneous data effectively remains a major challenge. With the rise of knowledge graphs, more and more applications are built on graph-like data models, which offer flexible schemas and convenient support for relationship queries. We propose a graph-based system for unifying heterogeneous data, which helps to (1) transform data from many other formats into graphs and, conversely, from graphs into other formats; (2) integrate graph data based on HAO intelligence, achieving schema integration and entity consolidation; and (3) explore data at different levels by querying the integrated graphs. In this paper, we introduce the overall system architecture, explain the implementation in detail, and demonstrate its usage in two practical scenarios.
HAO Unity: A Graph-based System for Unifying Heterogeneous Data