A Survey of Graph Mining for Web Applications

Managing and Mining Graph Data

Part of the book series: Advances in Database Systems ((ADBS,volume 40))

Abstract

Graph structures provide a general framework for modeling entities and their relationships, and they are routinely used to describe a wide variety of data, such as the Internet, the web, social networks, metabolic networks, protein-interaction networks, food webs, citation networks, and many more. In recent years, a growing body of literature has studied properties, models, and algorithms for graph data. In this chapter we provide a brief overview of graph-mining algorithms for web and social-media applications. We review a wide range of algorithms, such as those for estimating the reputation and popularity of items in a network, mining query logs, and performing query recommendations. The main goal of the chapter is to give the reader an understanding of how structural graph-mining algorithms can be exploited in the context of web applications, highlighting both the challenges and the power of graph mining for web and social-media applications.

Author information

Corresponding author

Correspondence to Debora Donato.

Copyright information

© 2010 Springer-Verlag US

About this chapter

Cite this chapter

Donato, D., Gionis, A. (2010). A Survey of Graph Mining for Web Applications. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_15

  • DOI: https://doi.org/10.1007/978-1-4419-6045-0_15

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-6044-3

  • Online ISBN: 978-1-4419-6045-0

  • eBook Packages: Computer Science (R0)
