Abstract
Measuring distance, or some other form of proximity, between objects is a standard data mining tool. Connection subgraphs were recently proposed as a way to demonstrate proximity between nodes in networks. We propose a new way of measuring and extracting proximity in networks called “cycle-free effective conductance” (CFEC). Importantly, the measured proximity is accompanied by a proximity subgraph, which allows the measured values to be assessed and understood. Our proximity calculation handles more than two endpoints and directed edges, is statistically well behaved, and produces an effectiveness score for the computed subgraphs. We provide an efficient algorithm to measure and extract proximity. We also report experimental results and show examples for four large network datasets: a telecommunications calling graph, the IMDB actors graph, an academic coauthorship network, and a movie recommendation system.
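For background on the name, the following minimal sketch computes the classical effective conductance between two nodes of an undirected weighted graph, treating edge weights as electrical conductances. It is not the CFEC measure itself, which the abstract does not define, and it does not handle directed edges or more than two endpoints. The function name `effective_conductance`, its parameters, and the toy graph are assumptions made for illustration only.

```python
def effective_conductance(n, edges, s, t):
    """Classical effective conductance between nodes s and t.

    Illustrative assumption, not the paper's CFEC: nodes are 0..n-1 and
    edges is a list of (u, v, weight) with weight acting as conductance.
    """
    if s == t:
        raise ValueError("s and t must be distinct")

    # Build the weighted graph Laplacian.
    lap = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        lap[u][u] += w
        lap[v][v] += w
        lap[u][v] -= w
        lap[v][u] -= w

    # Ground node t (drop its row and column) and inject one unit of
    # current at s; the augmented column holds the right-hand side.
    keep = [i for i in range(n) if i != t]
    m = len(keep)
    aug = [[lap[i][j] for j in keep] + [1.0 if i == s else 0.0] for i in keep]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("graph must be connected")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, m + 1):
                    aug[r][c] -= factor * aug[col][c]

    # The voltage at s under unit current equals the effective resistance.
    k = keep.index(s)
    resistance = aug[k][m] / aug[k][k]
    return 1.0 / resistance


# Toy graph: a direct edge 0-1 in parallel with the two-hop path 0-2-1.
triangle = [(0, 1, 1.0), (0, 2, 1.0), (2, 1, 1.0)]
print(effective_conductance(3, triangle, 0, 1))  # approximately 1.5
```

In the toy graph, the direct edge contributes conductance 1 and the two-hop path contributes 1/2 (two unit conductances in series), so the parallel total is 3/2.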