ABSTRACT
Many data quality issues have been studied for relational data, including data consistency, deduplication, accuracy, and completeness. In this paper, we focus on discovering abnormal data in RDF graphs. As the amount of RDF data increases, data quality is becoming an important issue for the usability of RDF repositories. Although association rules have been used to find abnormal data in RDF graphs, existing solutions ignore the latent semantics of connected structures in these graphs. To detect latent dependencies in RDF graphs, we first define the Graph-based Conditional Functional Dependency (GCFD), which represents the attribute-value and semantic-structure dependencies of RDF data in a uniform manner. We then propose an efficient framework and novel pruning rules to discover GCFDs in large RDF graphs. Extensive experiments on several real-life RDF repositories confirm the superiority of our solution.
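To make the notion of a conditional dependency over RDF data concrete, the following is a minimal sketch of checking one such dependency on a toy set of triples. It is not the GCFD definition or the discovery framework of this paper: the triples, the predicate names, the `violations` helper, and the majority-vote heuristic for flagging suspects are all illustrative assumptions, and the sketch covers only attribute values, not graph structure.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); all data here is hypothetical.
TRIPLES = [
    ("alice", "type", "Person"),
    ("alice", "bornIn", "Paris"),
    ("alice", "nationality", "France"),
    ("bob", "type", "Person"),
    ("bob", "bornIn", "Paris"),
    ("bob", "nationality", "France"),
    ("carol", "type", "Person"),
    ("carol", "bornIn", "Paris"),
    ("carol", "nationality", "Germany"),
    ("acme", "type", "Company"),
    ("acme", "bornIn", "Paris"),
    ("acme", "nationality", "Spain"),
]


def violations(triples, condition, lhs, rhs):
    """Return subjects whose rhs value deviates from the majority for their lhs value.

    condition: (predicate, value) pair a subject must carry to be in scope.
    lhs, rhs: predicates; within scope, the lhs value should determine the rhs value.
    """
    # Build a predicate -> object map per subject (assumes single-valued predicates).
    props = defaultdict(dict)
    for s, p, o in triples:
        props[s][p] = o

    # Group in-scope subjects by lhs value, then by rhs value.
    cond_pred, cond_val = condition
    groups = defaultdict(lambda: defaultdict(list))
    for s, attrs in props.items():
        if attrs.get(cond_pred) != cond_val:
            continue
        if lhs in attrs and rhs in attrs:
            groups[attrs[lhs]][attrs[rhs]].append(s)

    suspects = []
    for rhs_values in groups.values():
        if len(rhs_values) < 2:
            continue  # the dependency holds for this lhs value
        # Assumed heuristic: treat the most frequent rhs value as the expected one.
        majority = max(rhs_values, key=lambda v: len(rhs_values[v]))
        for value, subjects in rhs_values.items():
            if value != majority:
                suspects.extend(subjects)
    return sorted(suspects)


print(violations(TRIPLES, ("type", "Person"), "bornIn", "nationality"))
```

In this toy data, the dependency "for subjects of type Person, bornIn determines nationality" is violated only by `carol`. The `acme` subject is out of scope because it fails the condition, which is what distinguishes a conditional dependency from a plain functional one.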
Index Terms
- Using Conditional Functional Dependency to Discover Abnormal Data in RDF Graphs