DOI: 10.1145/2600428.2609577
research-article

Multidimensional relevance modeling via psychometrics and crowdsourcing

Published: 03 July 2014

Abstract

While many multidimensional models of relevance have been posited, prior studies have been largely exploratory rather than confirmatory. Lacking a methodological framework to quantify the relationships among factors or measure model fit to observed data, many past models could not be empirically tested or falsified. To enable more positivist experimentation, Xu and Chen [76] proposed a psychometric framework for multidimensional relevance modeling. However, we show that their framework exhibits several methodological limitations that could call into question the validity of findings drawn from it. In this work, we identify and address these limitations, scale their methodology via crowdsourcing, and describe quality control methods from psychometrics that stand to benefit crowdsourcing IR studies in general. The methodology we describe for relevance judging is expected to benefit both human-centered and systems-centered IR.
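To make "measure model fit to observed data" concrete, the sketch below computes two widely used structural equation modeling fit indices, RMSEA and CFI, and compares them with the cutoffs suggested by Hu and Bentler [37]. It is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not say which fit indices are reported, the chi-square statistics are assumed to come from some SEM fitting tool, and the function names `rmsea`, `cfi`, `meets_hu_bentler` and the example numbers are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (the N - 1 variant).

    chi2: chi-square statistic of the fitted model
    df:   degrees of freedom of the fitted model
    n:    number of respondents (e.g., completed questionnaires)
    """
    if df <= 0 or n <= 1:
        raise ValueError("need df > 0 and n > 1")
    # Misfit in excess of what the model's df alone would produce,
    # scaled per degree of freedom and per observation.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2: float, df: int, chi2_null: float, df_null: int) -> float:
    """Comparative fit index against the independence (null) model."""
    model_misfit = max(chi2 - df, 0.0)
    # Denominator is max(chi2_null - df_null, chi2 - df, 0).
    null_misfit = max(chi2_null - df_null, model_misfit)
    if null_misfit == 0.0:
        return 1.0
    return 1.0 - model_misfit / null_misfit


def meets_hu_bentler(rmsea_value: float, cfi_value: float) -> bool:
    """Rough check of two suggested cutoffs: RMSEA <= .06 and CFI >= .95."""
    return rmsea_value <= 0.06 and cfi_value >= 0.95


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not results from the paper.
    r = rmsea(chi2=120.0, df=50, n=500)
    c = cfi(chi2=120.0, df=50, chi2_null=2000.0, df_null=66)
    print(f"RMSEA={r:.3f} CFI={c:.3f} acceptable={meets_hu_bentler(r, c)}")
```

With the hypothetical values above, the script prints `RMSEA=0.053 CFI=0.964 acceptable=True`. Treating the cutoffs as hard thresholds is a simplification: Hu and Bentler present them as guidelines, and model fit is normally judged from several indices together.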

References

[1]
Alonso, O. 2013. Implementing crowdsourcing-based relevance experimentation: an industrial perspective. Information Retrieval. 16, 2, 101--120.
[2]
Anderson, J.C. and Gerbing, D.W. 1988. Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin. 103, 3, 411--423.
[3]
Bailey, P. et al. 2008. Relevance assessment: are judges exchangeable and does it matter. SIGIR'08, 667--674.
[4]
Balatsoukas, P. and Ruthven, I. 2012. An eye-tracking approach to the analysis of relevance judgments on the Web: The case of Google search engine. JASIST. 63, 9, 1728--1746.
[5]
Bookstein, A. 1979. Relevance. JASIS. 30, 5, 269--273.
[6]
Barry, C.L. 1994. User-defined relevance criteria: An exploratory study. JASIS. 45, 3, 149--159.
[7]
Barry, C.L. and Schamber, L. 1998. Users' criteria for relevance evaluation: A cross-situational comparison. IP & M. 34, 2--3, 219--236.
[8]
Bateman, J. 1998. Changes in Relevance Criteria: A Longitudinal Study. Proceedings of the ASIS Annual Meeting. 35, 23--32.
[9]
Behrend, T.S. et al. 2011. The viability of crowdsourcing for survey research. Behavior research methods. 43, 3, 800--813.
[10]
Blanco, R. et al. 2011. Repeatable and Reliable Search System Evaluation Using Crowdsourcing. Proceedings of SIGIR'2011 (New York, NY, USA), 923--932.
[11]
Borlund, P. 2003. The concept of relevance in IR. JASIST. 54, 10, 913--925.
[12]
Boyce, B. 1982. Beyond topicality: A two stage view of relevance and the retrieval process. IP & M. 18, 3, 105--109.
[13]
Bradford, S.C. 1934. Sources of information on specific subjects. Engineering: An Illustrated Weekly Journal (London). 137, 26, 85--86.
[14]
Browne, M.W. 2000. Psychometrics. Journal of the American Statistical Association. 95, 450, 661--665.
[15]
Cacioppo, J.T. and Petty, R.E. 1984. The Elaboration Likelihood Model of Persuasion. Advances in Consumer Research. 11, 1, 673--675.
[16]
Chouldechova, A. and Mease, D. 2013. Differences in Search Engine Evaluations Between Query Owners and Non-owners. Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, 103--112.
[17]
Cognitive Interviewing. http://www.uk.sagepub.com/textbooks/Book225856?prodId=Book225856. Accessed: 2014-01-24.
[18]
Cohen, J. 1988. Statistical power analysis for the behavioral sciences. L. Erlbaum Associates.
[19]
Cool, C. et al. 1993. Characteristics of Texts affecting relevance judgements. Proceedings of the 14th National Online Meeting, 77--84.
[20]
Da Costa Pereira, C. et al. 2012. Multidimensional relevance: Prioritized aggregation in a personalized Information Retrieval setting. IP & M. 48, 2, 340--357.
[21]
Cuadra, C.A. and Katter, R.V. 1967. Opening the Black Box of "Relevance." Journal of Documentation. 23, 4, 291--303.
[22]
Dwyer, J. 2002. Communication in Business: Strategies and Skills. Prentice Hall.
[23]
Eickhoff, C. et al. 2013. Copulas for Information Retrieval. Proceedings of SIGIR'2013 (New York, NY, USA), 663--672.
[24]
Eickhoff, C. and de Vries, A.P. 2013. Increasing cheat robustness of crowdsourcing tasks. Information Retrieval. 16, 2, 121--137.
[25]
Franklin, S.B. et al. 1995. Parallel Analysis: a method for determining significant principal components. Journal of Vegetation Science. 6, 1, 99--106.
[26]
Furr, M. 2011. Scale Construction and Psychometrics for Social and Personality Psychology. SAGE.
[27]
Goldberg, L.R. and Kilkowski, J.M. 1985. The prediction of semantic consistency in self-descriptions: characteristics of persons and of terms that affect the consistency of responses to synonym and antonym pairs. Journal of personality and social psychology. 48, 1, 82--98.
[28]
Green, R. 1995. Topical relevance relationships. I. Why topic matching fails. JASIS. 46, 9, 646--653.
[29]
Greisdorf, H. 2003. Relevance thresholds: a multi-stage predictive model of how users evaluate information. IP & M, 403--423.
[30]
Grice, H.P. 1989. Studies in the way of words. Harvard University Press.
[31]
Gwizdka, J. 2014. News Stories Relevance Effects on Eye-movements. Proceedings of the Symposium on Eye Tracking Research and Applications, 283--286.
[32]
Harter, S.P. 1992. Psychological relevance and information science. JASIS. 43, 9, 602--615.
[33]
Hatcher, L. 2013. Advanced statistics in research: reading, understanding, and writing up data analysis results. ShadowFinch Media, LLC.
[34]
Hjørland, B. and Christensen, F.S. 2002. Work tasks and socio-cognitive relevance: A specific example. JASIST. 53, 11, 960--965.
[35]
Hosseini, M. et al. 2012. On Aggregating Labels from Multiple Crowd Workers to Infer Relevance of Documents. Advances in Information Retrieval. R. Baeza-Yates et al., eds. Springer Berlin Heidelberg. 182--194.
[36]
Hox, J.J. and Bechger, T.M. 2007. An introduction to structural equation modeling.
[37]
Hu, L. and Bentler, P.M. 1999. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal. 6, 1, 1--55.
[38]
Huang, X. and Soergel, D. 2013. Relevance: An improved framework for explicating the notion. JASIST. 64, 1, 18--35.
[39]
Johnson, J.R. et al. 1981. Characteristics of Errors in Accounts Receivable and Inventory Audits. The Accounting Review. 56, 2, 270--293.
[40]
Kazai, G. et al. 2012. An Analysis of Systematic Judging Errors in Information Retrieval. Proceedings of CIKM'2012 (New York, NY, USA), 105--114.
[41]
Kazai, G. et al. 2011. Crowdsourcing for Book Search Evaluation: Impact of Hit Design on Comparative System Ranking. Proceedings of SIGIR'2011 (New York, NY, USA), 205--214.
[42]
Kittur, A. et al. 2008. Crowdsourcing User Studies with Mechanical Turk. Proceedings of SIGCHI'2008 (New York, NY, USA), 453--456.
[43]
Lancaster, F.W. 1968. Information retrieval systems: characteristics, testing, and evaluation. Wiley.
[44]
Lesk, M.E. and Salton, G. 1968. Relevance assessments and retrieval system evaluation. Information Storage and Retrieval. 4, 4, 343--359.
[45]
Levitin, A. and Redman, T. 1995. Quality dimensions of a conceptual view. IP & M. 31, 1, 81--88.
[46]
Little, G. 2009. TurKit: Tools for iterative tasks on mechanical turk. IEEE Symposium on Visual Languages and Human-Centric Computing, 252--253.
[47]
Liu, T.-Y. 2009. Learning to Rank for Information Retrieval. Foundations and Trends in Information Retrieval. 3, 3, 225--331.
[48]
Bentler, P.M. and Bonett, D.G. 1980. Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin. 88, 3, 588--606.
[49]
Maron, M.E. 1977. On indexing, retrieval and the meaning of about. JASIS. 28, 1, 38--43.
[50]
Marshall, C.C. and Shipman, F.M. 2013. Experiences Surveying the Crowd: Reflections on Methods, Participation, and Reliability. Proceedings of the 5th Annual ACM Web Science Conference, 234--243.
[51]
Mizzaro, S. 1997. Relevance: The whole history. JASIS. 48, 9, 810--832.
[52]
Moshfeghi, Y. et al. 2013. Understanding Relevance: An fMRI Study. Advances in Information Retrieval. P. Serdyukov et al., eds. Springer Berlin Heidelberg. 14--25.
[53]
Mueller, R.O. and Hancock, G.R. 2008. Best practices in structural equation modeling. Best practices in quantitative methods. 488--508.
[54]
Murphy, K.P. 2012. Machine Learning: A Probabilistic Perspective. MIT Press.
[55]
Osterlind, S.J. Modern Measurement: Theory, Principles, and Applications of Mental Appraisal, 2/E. Pearson. http://www.pearsonhighered.com/educator/product/Modern-Measurement-Theory-Principles-and-Applications-of-Mental-Appraisal/9780137010257.page. Accessed: 2014-01-24.
[56]
Principles and Practice of Structural Equation Modeling: Third Edition. http://www.guilford.com/cgi-bin/cartscript.cgi?page=pr/kline.htm&dir=research/res_quant. Accessed: 2014-01-24.
[57]
Proceedings of the International Conference on Scientific Information -- Two Volumes. http://books.nap.edu/openbook.php?record_id=10866&page=687. Accessed: 2014-01-26.
[58]
Rees, A.M. and Schultz, D.G. 1967. A Field Experimental Approach to the Study of Relevance Assessments in Relation to Document Searching. Final Report to the National Science Foundation. Volume I.
[59]
Relevance as process: judgements in the context of scholarly research. http://www.informationr.net/ir/10-2/paper226. Accessed: 2014-01-24.
[60]
Sanderson, M. 2010. Test Collection Based Evaluation of Information Retrieval Systems. Foundations and Trends in Information Retrieval. 4, 4, 247--375.
[61]
Saracevic, T. 2007. Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: nature and manifestations of relevance. JASIST. 58, 13, 1915--1933.
[62]
Saracevic, T. 2007. Relevance: A review of the literature and a framework for thinking on the notion in information science. Part III: Behavior and effects of relevance. JASIST. 58, 13, 2126--2144.
[63]
Schamber, L. 1994. Relevance and Information Behavior. Annual Review of Information Science and Technology (ARIST). 29, 3--48.
[64]
Scheines, R. et al. 1999. Bayesian estimation and testing of structural equation models. Psychometrika. 64, 1, 37--52.
[65]
Tabachnick, B.G. and Fidell, L.S. 2012. Using Multivariate Statistics. Pearson Education, Limited.
[66]
Tang, R. and Solomon, P. 1998. Toward an understanding of the dynamics of relevance judgment: An analysis of one person's search behavior. IP & M. 34, 2--3, 237--256.
[67]
Taylor, A.R. et al. 2007. Relationships between categories of relevance criteria and stage in task completion. IP & M. 43, 4, 1071--1084.
[68]
The Social Construction of Meaning: An Alternative Perspective on Information Sharing. 2003. http://pubsonline.informs.org/doi/abs/10.1287/isre.14.1.87.14765. Accessed: 2014-01-24.
[69]
Tsikrika, T. and Lalmas, M. 2007. Combining Evidence for Relevance Criteria: A Framework and Experiments in Web Retrieval. Advances in Information Retrieval. G. Amati et al., eds. Springer Berlin Heidelberg. 481--493.
[70]
Vakkari, P. and Hakala, N. 2000. Changes in relevance criteria and problem stages in task performance. Journal of Documentation. 56, 5, 540--562.
[71]
Voorhees, E.M. 1998. Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness. Proceedings of SIGIR'1998 (New York, NY, USA), 315--323.
[72]
Wilson, D. and Sperber, D. 2002. Relevance Theory. Handbook of Pragmatics. G. Ward and L. Horn, eds. Blackwell.
[73]
De Winter, J.C.F. and Dodou, D. 2012. Factor recovery by principal axis factoring and maximum likelihood factor analysis as a function of factor pattern and sample size. Journal of Applied Statistics. 39, 4, 695--710.
[74]
Worthington, R.L. and Whittaker, T.A. 2006. Scale Development Research: A Content Analysis and Recommendations for Best Practices. The Counseling Psychologist. 34, 6, 806--838.
[75]
Wright, S. Correlation and causation.
[76]
Xu, Y. (Calvin) and Chen, Z. 2006. Relevance judgment: What do information users consider beyond topicality? JASIST. 57, 7, 961--973.
[77]
Zuccon, G. et al. 2013. Crowdsourcing interactions: using crowdsourcing for evaluating interactive information retrieval systems. Information Retrieval. 16, 2, 267--305.

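Cited By

  • (2024) The State of Pilot Study Reporting in Crowdsourcing: A Reflection on Best Practices and Guidelines. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1, 1--45. DOI: 10.1145/3641023. Online publication date: 26-Apr-2024.
  • (2024) Evaluating Search System Explainability with Psychometrics and Crowdsourcing. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1051--1061. DOI: 10.1145/3626772.3657796. Online publication date: 10-Jul-2024.
  • (2023) Quantifying and Advancing Information Retrieval System Explainability. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3487. DOI: 10.1145/3539618.3591792. Online publication date: 19-Jul-2023.
  • (2023) Predicting Crowd Workers Performance: An Information Quality Case. Web Engineering, 75--90. DOI: 10.1007/978-3-031-34444-2_6. Online publication date: 16-Jun-2023.
  • (2022) The many dimensions of truthfulness. Information Processing and Management: an International Journal 58, 6. DOI: 10.1016/j.ipm.2021.102710. Online publication date: 22-Apr-2022.
  • (2022) On the effect of relevance scales in crowdsourcing relevance assessments for Information Retrieval evaluation. Information Processing and Management: an International Journal 58, 6. DOI: 10.1016/j.ipm.2021.102688. Online publication date: 22-Apr-2022.
  • (2021) One-Way Delay Measurement From Traditional Networks to SDN. ACM Computing Surveys 54, 7, 1--35. DOI: 10.1145/3466167. Online publication date: 18-Jul-2021.
  • (2021) Machine Learning–based Cyber Attacks Targeting on Controlled Information. ACM Computing Surveys 54, 7, 1--36. DOI: 10.1145/3465171. Online publication date: 18-Jul-2021.
  • (2021) A Survey of Smart Contract Formal Specification and Verification. ACM Computing Surveys 54, 7, 1--38. DOI: 10.1145/3464421. Online publication date: 18-Jul-2021.
  • (2021) Benchmarking Quantum Computers and the Impact of Quantum Noise. ACM Computing Surveys 54, 7, 1--35. DOI: 10.1145/3464420. Online publication date: 18-Jul-2021.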


    Information

    Published In

    SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
    July 2014
    1330 pages
    ISBN:9781450322577
    DOI:10.1145/2600428
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. crowdsourcing
    2. psychometrics
    3. relevance judgment


    Conference

    SIGIR '14

    Acceptance Rates

    SIGIR '14 Paper Acceptance Rate 82 of 387 submissions, 21%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%


    Article Metrics

    • Downloads (last 12 months): 18
    • Downloads (last 6 weeks): 8
    Reflects downloads up to 03 Mar 2025
