A Survey of Graph Mining for Web Applications

Donato, Debora; Gionis, Aristides

doi:10.1007/978-1-4419-6045-0_15

Part of the book series: Advances in Database Systems (ADBS, volume 40)

Abstract

Graph structures provide a general framework for modeling entities and their relationships, and they are routinely used to describe a wide variety of data such as the Internet, the web, social networks, metabolic networks, protein-interaction networks, food webs, citation networks, and many more. In recent years, a growing body of literature has studied properties, models, and algorithms for graph data. In this chapter we provide a brief overview of graph-mining algorithms for web and social-media applications. We review a wide range of algorithms, such as those for estimating the reputation and popularity of items in a network, mining query logs, and performing query recommendations. The main goal of the chapter is to give the reader an understanding of how structural graph-mining algorithms can be exploited in the context of web applications. In doing so, the chapter highlights both the challenges and the power of graph mining in web and social-media applications.

Author information

Authors and Affiliations

Yahoo Research, Avd Diagonal 177, Barcelona, Spain
Debora Donato & Aristides Gionis

Corresponding author

Correspondence to Debora Donato.

Editor information

Editors and Affiliations

Thomas J. Watson Research Center, IBM, Skyline Drive 19, Hawthorne, 10532, U.S.A.
Charu C. Aggarwal
Microsoft Research Asia, Zhichun Road 49, Beijing, 100080, People's Republic of China
Haixun Wang

About this chapter

Cite this chapter

Donato, D., Gionis, A. (2010). A Survey of Graph Mining for Web Applications. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_15

DOI: https://doi.org/10.1007/978-1-4419-6045-0_15
Published: 18 January 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer Science, Computer Science (R0)
