Measuring and extracting proximity graphs in networks

Published: 01 December 2007

Abstract

Measuring distance or some other form of proximity between objects is a standard data mining tool. Connection subgraphs were recently proposed as a way to demonstrate proximity between nodes in networks. We propose a new way of measuring and extracting proximity in networks called “cycle-free effective conductance” (CFEC). Importantly, the measured proximity is accompanied by a proximity subgraph, which allows the measured values to be assessed and understood. Our proximity calculation handles more than two endpoints and directed edges, is statistically well behaved, and produces an effectiveness score for the computed subgraphs. We provide an efficient algorithm to measure and extract proximity. We also report experimental results and show examples for four large network datasets: a telecommunications calling graph, the IMDB actors graph, an academic coauthorship network, and a movie recommendation system.
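
The abstract names CFEC but does not define it, so the following is only an illustrative sketch of what a cycle-free conductance measure could look like. It assumes, and this is an assumption rather than something stated above, that proximity between two nodes is the degree of the source multiplied by the total random-walk probability of all cycle-free (simple) paths to the target. The names `simple_paths`, `path_probability`, and `cycle_free_conductance` are hypothetical, and the brute-force path enumeration is exponential, so it stands in for, and is not, the efficient algorithm the abstract mentions.

```python
from typing import Dict, Iterator, List

# Weighted graph as an adjacency map: node -> {neighbor: edge weight}.
Graph = Dict[str, Dict[str, float]]


def simple_paths(graph: Graph, source: str, target: str) -> Iterator[List[str]]:
    """Yield every cycle-free path from source to target (iterative DFS)."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        for neighbor in graph[node]:
            if neighbor not in path:  # never revisit a node: keeps paths simple
                stack.append((neighbor, path + [neighbor]))


def path_probability(graph: Graph, path: List[str]) -> float:
    """Probability that a random walk follows `path` step by step."""
    prob = 1.0
    for u, v in zip(path, path[1:]):
        degree = sum(graph[u].values())  # weighted out-degree of u
        prob *= graph[u][v] / degree
    return prob


def cycle_free_conductance(graph: Graph, s: str, t: str) -> float:
    """ASSUMED definition: deg(s) times the summed probability of simple paths."""
    degree_s = sum(graph[s].values())
    total = sum(path_probability(graph, p) for p in simple_paths(graph, s, t))
    return degree_s * total


# Toy undirected graph with unit weights: two parallel two-hop routes s-a-t and s-b-t.
toy = {
    "s": {"a": 1.0, "b": 1.0},
    "a": {"s": 1.0, "t": 1.0},
    "b": {"s": 1.0, "t": 1.0},
    "t": {"a": 1.0, "b": 1.0},
}
print(cycle_free_conductance(toy, "s", "t"))
```

In the toy graph each route has probability 1/2 × 1/2 = 1/4, so under the assumed definition the two routes together give 2 × (1/4 + 1/4) = 1. Because the adjacency map stores each direction separately, the same sketch accepts directed edges simply by omitting the reverse entries.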

