Abstract
Searching for information is one of the most common tasks that users of any computer system perform, whether on a local computer, in a shared database, or on the Internet. The growth of the Internet and the World Wide Web, the access to an immense amount of data, and the ability of millions of users to freely publish their own content have made the search problem more central than ever before. Compared to traditional information-retrieval systems, many of the emerging information systems of interest, including peer-to-peer networks, blogs, and social networks, exhibit a number of characteristics that make the search problem considerably more challenging. We survey algorithms for searching for information in systems characterized by such features: the data are linked in an underlying graph structure, they are distributed and highly dynamic, and they contain social information, tagging capabilities, and more. We call such algorithms next-generation search algorithms.
Notes
1. We remind the reader that the Kendall's τ distance between two rankings $r_1$ and $r_2$ on $n$ items is defined to be the fraction of item pairs $(i, j)$ for which the two rankings disagree:
$\mathit{KDist} = \frac{\sum_{i,j} K_{\{i,j\}}(r_1, r_2)}{n(n-1)/2}$,
where $K_{\{i,j\}}(r_1, r_2)$ is equal to 1 if $i$ and $j$ are in different order in $r_1$ and $r_2$, and 0 otherwise.
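The definition in this note can be illustrated with a minimal Python sketch. It assumes that each ranking is given as a list of the same distinct items ordered from first to last; the function name `kendall_tau_distance` and this input format are illustrative choices and do not come from the chapter.

```python
from itertools import combinations


def kendall_tau_distance(r1, r2):
    """Fraction of item pairs that r1 and r2 place in opposite order.

    Assumption (not from the chapter): r1 and r2 are sequences listing
    the same n distinct items, ordered from first to last position.
    """
    n = len(r1)
    if len(r2) != n or len(set(r1)) != n or set(r1) != set(r2):
        raise ValueError("rankings must list the same distinct items")
    if n < 2:
        return 0.0  # no pairs to compare

    # Position of every item in each ranking.
    pos1 = {item: k for k, item in enumerate(r1)}
    pos2 = {item: k for k, item in enumerate(r2)}

    # K_{i,j} = 1 when the pair (i, j) is ordered differently in r1 and r2.
    disagreements = sum(
        1
        for i, j in combinations(r1, 2)
        if (pos1[i] - pos1[j]) * (pos2[i] - pos2[j]) < 0
    )
    return disagreements / (n * (n - 1) / 2)


if __name__ == "__main__":
    print(kendall_tau_distance(["a", "b", "c"], ["a", "c", "b"]))
```

Under these assumptions, identical rankings give a distance of 0 and fully reversed rankings give 1. The sketch checks every pair explicitly, so it takes time quadratic in $n$; it is meant to mirror the formula, not to be efficient.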
Copyright information
© 2010 Springer-Verlag London Limited
About this chapter
Cite this chapter
Donato, D., Gionis, A. (2010). Next Generation Search. In: Cormode, G., Thottan, M. (eds) Algorithms for Next Generation Networks. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-84882-765-3_16
DOI: https://doi.org/10.1007/978-1-84882-765-3_16
Publisher Name: Springer, London
Print ISBN: 978-1-84882-764-6
Online ISBN: 978-1-84882-765-3
eBook Packages: Computer Science (R0)