Abstract:
The discounted hitting time (DHT), a random-walk similarity measure for pairs of graph nodes, is useful in various applications, including link prediction, collaborative recommendation, and reputation ranking. We examine a novel query on DHT scores, called the multi-way join (or n-way join). Given a graph and n sets of nodes, the n-way join retrieves the k n-tuples with the highest scores, according to some aggregation function of DHT values. This query enables analysis and prediction of complex relationships among n sets of nodes. Since an n-way join is expensive to compute, we develop the Partial Join (PJ) algorithm. This solution decomposes an n-way join into a number of top-m 2-way joins and combines their results to construct the answer to the n-way join. Since PJ may require the computation of top-(m+1) 2-way joins, we study an incremental solution, which allows the top-(m+1) 2-way join to be derived quickly from previously computed top-m 2-way join results. We further examine fast processing and pruning algorithms for 2-way joins. An extensive evaluation on three real datasets shows that PJ accurately evaluates n-way joins and is four orders of magnitude faster than basic solutions.
Date of Conference: 31 March 2014 - 04 April 2014
Date Added to IEEE Xplore: 19 May 2014
Electronic ISBN: 978-1-4799-2555-1