Next Generation Search

  • Chapter
Algorithms for Next Generation Networks

Part of the book series: Computer Communications and Networks (CCN)

Abstract

Searching for information is one of the most common tasks that users of any computer system perform, whether on a local computer, in a shared database, or on the Internet. The growth of the Internet and the World Wide Web, access to an immense amount of data, and the ability of millions of users to freely publish their own content have made the search problem more central than ever before. Compared to traditional information-retrieval systems, many emerging information systems of interest, including peer-to-peer networks, blogs, and social networks, exhibit characteristics that make the search problem considerably more challenging. We survey algorithms for searching information in systems characterized by several such features: the data are linked in an underlying graph structure, they are distributed and highly dynamic, and they carry social information, tagging capabilities, and more. We call such algorithms next-generation search algorithms.

Notes

  1. We remind the reader that the Kendall’s τ distance between two rankings $r_1$ and $r_2$ on $n$ items is defined to be the fraction of item pairs $(i, j)$ for which the two rankings disagree:

    $\mathit{KDist} = \frac{\sum_{i,j} K_{\{i,j\}}(r_1, r_2)}{n(n-1)/2}$

    where $K_{\{i,j\}}(r_1, r_2)$ is equal to 1 if $i$ and $j$ are in different order in $r_1$ and $r_2$, and 0 otherwise.
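
The definition in this note can be sketched in a few lines of Python. This is a minimal illustration, not code from the chapter: the function name `kendall_tau_distance` is hypothetical, each ranking is assumed to be a list of the same distinct items ordered from best to worst, and the sum is taken over unordered pairs so that it matches the $n(n-1)/2$ denominator.

```python
from itertools import combinations


def kendall_tau_distance(r1, r2):
    """Fraction of item pairs that two rankings order differently.

    Assumption: r1 and r2 list the same distinct items, best first.
    """
    if len(set(r1)) != len(r1) or len(set(r2)) != len(r2):
        raise ValueError("rankings must not contain duplicate items")
    if set(r1) != set(r2):
        raise ValueError("rankings must cover the same items")

    n = len(r1)
    if n < 2:
        return 0.0  # no pairs to disagree on

    # Position of every item in each ranking.
    pos1 = {item: k for k, item in enumerate(r1)}
    pos2 = {item: k for k, item in enumerate(r2)}

    # K_{i,j} = 1 when the relative order of i and j differs between rankings.
    discordant = sum(
        1
        for i, j in combinations(r1, 2)
        if (pos1[i] - pos1[j]) * (pos2[i] - pos2[j]) < 0
    )
    return discordant / (n * (n - 1) / 2)
```

Identical rankings give a distance of 0 and fully reversed rankings give 1. The pairwise loop takes quadratic time in $n$, which is adequate for short result lists; an inversion count based on merge sort would be the usual choice for long rankings.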

Author information

Corresponding author

Correspondence to Debora Donato.

Copyright information

© 2010 Springer-Verlag London Limited

About this chapter

Cite this chapter

Donato, D., Gionis, A. (2010). Next Generation Search. In: Cormode, G., Thottan, M. (eds) Algorithms for Next Generation Networks. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-84882-765-3_16

  • DOI: https://doi.org/10.1007/978-1-84882-765-3_16

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84882-764-6

  • Online ISBN: 978-1-84882-765-3

  • eBook Packages: Computer Science (R0)
