ABSTRACT
Finding similar entities is a fundamental problem in graph data analysis. Similarity search algorithms usually leverage the structural properties of the database to quantify the degree of similarity between entities. However, the same information can be represented in different structures, and the structural properties observed over one representation may not hold for the alternatives. As a result, these algorithms are effective on some representations and ineffective on others. We define the property of representation independence for similarity search algorithms as their robustness against transformations that modify the structure of databases but preserve the information content. We introduce a widespread group of such transformations called relationship reorganizing. We propose an algorithm called R-PathSim, which is provably robust under relationship reorganizing. Our empirical results show that existing algorithms other than R-PathSim are highly sensitive to the data representation, while R-PathSim is as efficient and effective as the other algorithms.
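To make concrete what "leveraging structural properties" means, the following is a minimal sketch of the original PathSim measure of Sun et al. (not R-PathSim, whose details the abstract does not give). It assumes a symmetric author-venue-author meta-path, for which PathSim is commonly defined as s(x, y) = 2·M[x][y] / (M[x][x] + M[y][y]), where M counts meta-path instances. The author names, venue counts, and function names are hypothetical, chosen only for illustration.

```python
def path_count(weights, x, y):
    """Count author-venue-author path instances between x and y.

    `weights[a][v]` is the number of links between author a and venue v
    (hypothetical toy data, not taken from the paper).
    """
    shared = set(weights[x]) & set(weights[y])
    return sum(weights[x][v] * weights[y][v] for v in shared)


def path_sim(weights, x, y):
    """PathSim over a symmetric meta-path: 2*M[x][y] / (M[x][x] + M[y][y])."""
    denom = path_count(weights, x, x) + path_count(weights, y, y)
    if denom == 0:
        return 0.0  # neither entity has any path instance
    return 2.0 * path_count(weights, x, y) / denom


# Hypothetical author -> venue publication counts.
weights = {
    "alice": {"KDD": 2, "VLDB": 1},
    "bob": {"KDD": 1, "VLDB": 1},
    "carol": {"ICDM": 3},
}

print(path_sim(weights, "alice", "bob"))
print(path_sim(weights, "alice", "carol"))
```

Because the score depends on how many path instances the chosen meta-path yields, restructuring the same information (for example, routing the author-venue link through an intermediate paper node) can change the counts and therefore the ranking, which is the kind of sensitivity the abstract describes.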
Index Terms
- Towards Representation Independent Similarity Search Over Graph Databases