ABSTRACT
This poster examines the main assumptions of classical probabilistic models in IR through a visual data analysis approach. Starting from the problem of classifying documents into relevant and non-relevant classes, we derive exactly the same formula for the relevance weight of the Binary Independence Model, but with more degrees of interaction. With this approach, new factors can be taken into account to obtain a different ranking of the documents.
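For readers unfamiliar with the relevance weight mentioned above, the following is a minimal sketch of the classical Binary Independence Model term weight in its usual Robertson and Sparck Jones form. It is not the extended formulation derived in the poster. The function names, the `stats` dictionary layout, and the smoothing constant `k = 0.5` are assumptions made for illustration.

```python
import math


def rsj_weight(r, R, n, N, k=0.5):
    """Classical relevance weight of a single term (illustrative sketch).

    r: number of relevant documents containing the term
    R: number of relevant documents
    n: number of documents containing the term
    N: total number of documents
    k: additive smoothing constant (0.5 is a common choice, assumed here)
    """
    # Odds of the term occurring in a relevant document.
    odds_relevant = (r + k) / (R - r + k)
    # Odds of the term occurring in a non-relevant document.
    odds_non_relevant = (n - r + k) / (N - n - R + r + k)
    return math.log(odds_relevant / odds_non_relevant)


def bim_score(doc_terms, query_terms, stats, R, N):
    """Sum the weights of the query terms that occur in the document.

    stats maps each term to a pair (r, n); this layout is hypothetical.
    """
    score = 0.0
    for term in query_terms:
        if term in doc_terms and term in stats:
            r, n = stats[term]
            score += rsj_weight(r, R, n, N)
    return score
```

Under the term independence assumption, the document score is the sum of the weights of the matching query terms, so a term that is equally frequent in relevant and non-relevant documents contributes a weight of zero.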
Index Terms
A Visual Analysis of the Effects of Assumptions of Classical Probabilistic Models