Core algorithms in the CLEVER system

Published: 01 May 2006

Abstract

This article describes the CLEVER search system developed at the IBM Almaden Research Center. We present a detailed and unified exposition of the various algorithmic components that make up the system, and then present results from two user studies.

Published In

ACM Transactions on Internet Technology, Volume 6, Issue 2
May 2006
105 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/1149121

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Anchortext
  2. Web search
  3. hyperlinks
  4. linear algebra
  5. link analysis
