DOI: 10.1145/1066157.1066220
Article

Page quality: in search of an unbiased web ranking

Published: 14 June 2005

Abstract

In a number of recent studies [4, 8], researchers have found that because search engines repeatedly return currently popular pages at the top of search results, popular pages tend to get even more popular, while unpopular pages are ignored by the average user. This "rich-get-richer" phenomenon is particularly problematic for new, high-quality pages because they may never get a chance to attract users' attention, which decreases the overall quality of search results in the long run. In this paper, we propose a new ranking function, called page quality, that can alleviate the problem of popularity-based ranking. We first present a formal framework for studying search engine bias by discussing what an "ideal" way to measure the intrinsic quality of a page would be. We then compare how PageRank, the current ranking metric used by major search engines, differs from this ideal quality metric. This framework helps us investigate search engine bias in more concrete terms and provides a clear understanding of why PageRank is effective in many cases and exactly when it is problematic. We then propose a practical way to estimate intrinsic page quality that avoids the inherent bias of PageRank. We derive our proposed quality estimator through a careful analysis of a reasonable web user model, and we present experimental results that show the potential of the estimator. We believe that our quality estimator has the potential to alleviate the rich-get-richer phenomenon and help new, high-quality pages get the attention that they deserve.
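For readers unfamiliar with the metric the abstract compares against, the following is a minimal sketch of PageRank computed by power iteration. It is the common textbook formulation, not the paper's quality estimator. The damping factor of 0.85, the fixed iteration count, the function name `pagerank`, and the toy graph are all illustrative assumptions that do not come from the paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    `links` maps each page to the list of pages it links to; every
    link target must also appear as a key. All parameter values are
    illustrative assumptions, not taken from the paper.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives its share of the random jump.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "new" links out, but no page links to it yet.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "new": ["a"],
}
ranks = pagerank(toy_web)
```

In this toy graph the page with no incoming links ends up with only the random-jump share of the score, whatever its content. That is one simple way to see why a purely link-based popularity score can leave a new page at the bottom of a ranking.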

References

[1]
S. Abiteboul, M. Preda, and G. Cobena. Adaptive on-line page importance computation. In Proceedings of the International World-Wide Web Conference, May 2003.
[2]
D. Achlioptas, A. Fiat, A. R. Karlin, and F. McSherry. Web search via hub synthesis. In IEEE Symposium on Foundations of Computer Science, pages 500--509, 2001.
[3]
R. Albert, A.-L. Barabasi, and H. Jeong. Diameter of the World Wide Web. Nature, 401(6749):130--131, September 1999.
[4]
R. Baeza-Yates, F. Saint-Jean, and C. Castillo. Web dynamics, age and page quality. In Proceedings of SPIRE 2002, September 2002.
[5]
A. Balmin, V. Hristidis, and Y. Papakonstantinou. ObjectRank: authority-based keyword search in databases. In Proceedings of the International Conference on Very Large Databases (VLDB), August 2004.
[6]
A.-L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509--512, October 1999.
[7]
A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In Proceedings of the International World-Wide Web Conference, May 2000.
[8]
J. Cho and S. Roy. Impact of search engines on page popularity. In Proceedings of the International World-Wide Web Conference, May 2004.
[9]
J. Cho, S. Roy, and R. E. Adams. Page quality: In search of an unbiased web ranking. Technical report, UCLA Computer Science, 2005.
[10]
N. Fuhr. Probabilistic models in information retrieval. The Computer Journal, 35(3):243--255, 1992.
[11]
F. Geerts, H. Mannila, and E. Terzi. Relational link-based ranking. In Proceedings of the International Conference on Very Large Databases (VLDB), August 2004.
[12]
R. Goldman, N. Shivakumar, S. Venkatasubramanian, and H. Garcia-Molina. Proximity search in databases. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 26--37, 1998.
[13]
S. P. Harter. Variations in relevance assessments and the measurement of retrieval effectiveness. Journal of the American Society for Information Science, 47(1):37--49, December 1996.
[14]
T. H. Haveliwala. Topic-sensitive pagerank. In Proceedings of the International World-Wide Web Conference, May 2002.
[15]
S. Kamvar, T. Haveliwala, and G. Golub. Adaptive methods for the computation of pagerank. In Proceedings of the International Conference on the Numerical Solution of Markov Chains, September 2003.
[16]
S. Kamvar, T. Haveliwala, C. Manning, and G. Golub. Extrapolation methods for accelerating pagerank computations. In Proceedings of the International World-Wide Web Conference, May 2003.
[17]
J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, September 1999.
[18]
S. Mizzaro. Measuring the agreement among relevance judges. In Proceedings of MIRA Conference, April 1999.
[19]
Nielsen NetRatings. http://www.nielsen-netratings.com/.
[20]
NPD search and portal site study. Available at http://www.npd.com/press/releases/press_000919.htm.
[21]
S. Olsen. Does search engine's power threaten web's independence? Available at http://news.com.com/2009-1023-963618.html, October 2002.
[22]
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford University Database Group, 1998. Available at http://dbpubs.stanford.edu:8090/pub/1999-66.
[23]
D. M. Pennock, G. W. Flake, S. Lawrence, E. J. Glover, and C. L. Giles. Winners don't take all: Characterizing the competition for links on the web. Proceedings of the National Academy of Sciences, 99(8):5207--5211, 2002.
[24]
S. E. Robertson and K. Sparck-Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3):129--146, 1975.
[25]
G. Salton. The SMART Retrieval System -- Experiments in Automatic Document Processing. Prentice Hall Inc., 1971.
[26]
G. Salton and M. J. McGill. Introduction to modern information retrieval. McGraw-Hill, 1983.
[27]
J. A. Tomlin. A new paradigm for ranking pages on the world wide web. In Proceedings of the International World-Wide Web Conference, May 2003.
[28]
TREC: Text retrieval conference. http://trec.nist.gov.
[29]
A. C. Tsoi, G. Morini, F. Scarselli, M. Hagenbuchner, and M. Maggini. Adaptive ranking of web pages. In Proceedings of the International World-Wide Web Conference, May 2003.
[30]
F. Verhulst. Nonlinear Differential Equations and Dynamical Systems. Springer Verlag, 2nd edition, 1997.
[31]
Y. Wang and D. DeWitt. Computing pagerank in a distributed internet search system. In Proceedings of the International Conference on Very Large Databases (VLDB), August 2004.
[32]
S. Wartick. Boolean operations. Information Retrieval: Data Structures and Algorithms, pages 264--292, 1992.


    Information

    Published In

    SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
    June 2005
    990 pages
    ISBN:1595930604
    DOI:10.1145/1066157
    Conference Chair: Fatma Ozcan
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 June 2005

    Qualifiers

    • Article

    Conference

    SIGMOD/PODS05

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 28
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 25 Feb 2025

    Citations

    Cited By

    • (2023) Equitable Top-k Results for Long Tail Data. Proceedings of the ACM on Management of Data, 1:4, pages 1-24. DOI: 10.1145/3626727. Online publication date: 12-Dec-2023
    • (2023) Layph: Making Change Propagation Constraint in Incremental Graph Processing by Layering Graph. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 2766-2779. DOI: 10.1109/ICDE55515.2023.00212. Online publication date: Apr-2023
    • (2023) Building the Groundwork for a Natural Search, to Make Accurate and Trustworthy Filtered Searches: The Case of a New Educational Platform with a Global Heat Map to Geolocate Innovations in Renewable Energy. Intelligent Human Computer Interaction, pages 239-250. DOI: 10.1007/978-3-031-27199-1_25. Online publication date: 11-Apr-2023
    • (2021) Content and link-structure perspective of ranking webpages. Computer Science Review, 40:C. DOI: 10.1016/j.cosrev.2021.100397. Online publication date: 1-May-2021
    • (2021) Managing bias and unfairness in data for decision support: a survey of machine learning and data engineering approaches to identify and mitigate bias and unfairness within data management and analytics systems. The VLDB Journal, 30:5, pages 739-768. DOI: 10.1007/s00778-021-00671-8. Online publication date: 5-May-2021
    • (2020) Rider-Rank Algorithm-Based Feature Extraction for Re-ranking the Webpages in the Search Engine. The Computer Journal. DOI: 10.1093/comjnl/bxaa032. Online publication date: 12-Jun-2020
    • (2019) Enabling Ontology-Based Search: A Case Study in the Bioinformatics Domain. 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), pages 227-234. DOI: 10.1109/BIBE.2019.00048. Online publication date: Oct-2019
    • (2019) Information credibility evaluation in online professional social network using tree augmented naïve Bayes classifier. Electronic Commerce Research. DOI: 10.1007/s10660-019-09387-y. Online publication date: 15-Nov-2019
    • (2018) Introduction. Web Searching and Mining, pages 1-27. DOI: 10.1007/978-981-13-3053-7_1. Online publication date: 13-Dec-2018
    • (2018) Web Documents Prioritization Using Iterative Improvement. Smart and Innovative Trends in Next Generation Computing Technologies, pages 459-473. DOI: 10.1007/978-981-10-8657-1_35. Online publication date: 9-Jun-2018
