Abstract
Graph structures provide a general framework for modeling entities and their relationships, and they are routinely used to describe a wide variety of data such as the Internet, the web, social networks, metabolic networks, protein-interaction networks, food webs, citation networks, and many more. In recent years, a growing body of literature has studied properties, models, and algorithms for graph data. In this chapter we provide a brief overview of graph-mining algorithms for web and social-media applications. We review a wide range of algorithms, such as those for estimating the reputation and popularity of items in a network, mining query logs, and performing query recommendations. The main goal of the chapter is to give the reader an understanding of how structural graph-mining algorithms can be exploited in the context of web applications, highlighting both the challenges and the power of graph mining for web and social-media applications.
© 2010 Springer-Verlag US
About this chapter
Cite this chapter
Donato, D., Gionis, A. (2010). A Survey of Graph Mining for Web Applications. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_15
DOI: https://doi.org/10.1007/978-1-4419-6045-0_15
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer Science (R0)