Abstract
Web search engines play a significant role in answering users' information needs over the vast amount of data available on the internet. Although evaluating these systems is essential to their improvement, no comprehensive, unbiased, low-cost, and reusable method exists for this purpose. Previous works used small, limited query sets for their evaluation process, which restricts the assessment domain. Moreover, these methods rely mainly on human evaluators for manual assessment of search engines, which makes the results subjective to the evaluators' opinions and prone to error. Repeating such an evaluation is also problematic, since it requires the same level of human effort as the first round. Another drawback of existing evaluations is that they score a search result based solely on its position in the retrieved list of relevant pages; consequently, they evaluate only the ranker component of a web search engine, leaving all other components unevaluated. In this research, we propose an automatic approach to web search engine evaluation that can run with a query set many times larger than those used in manual evaluations. The automatic nature of our proposed method makes repeating the evaluation inexpensive in terms of human effort. Moreover, the approach is component based: we define separate evaluation tasks for different components of a web search engine. For each component, queries are designed specifically to assess the functionality of that component alone, and the retrieved results are scored accordingly. For example, to assess the spell-correction component, the input query contains a typo, and only occurrences of the correctly spelled form in the results are scored positively. Experimental results from applying thousands of queries to two Persian and two language-independent web search engines show that none of the selected engines dominates the other three across all components; instead, each has its own strengths and weaknesses, which this evaluation highlights.
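To make the component-based scoring concrete, the following is a minimal sketch of how the spell-correction task described above could be scored. The function name, regex matching, and averaging are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import re

def score_spell_correction(results, corrected):
    """Score a result list for a spell-correction query (hypothetical sketch).

    A result counts positively only if it contains the corrected form of
    the misspelled query term; occurrences of the typo earn no credit.
    """
    score = 0
    pattern = r"\b" + re.escape(corrected.lower()) + r"\b"
    for snippet in results:
        if re.search(pattern, snippet.lower()):
            score += 1  # correctly spelled form found: positive score
    return score / len(results) if results else 0.0

# Example: the query contained the typo "recieve"; only results that
# contain the corrected form "receive" are rewarded.
results = ["You will receive a reply soon.", "How to recieve mail"]
print(score_spell_correction(results, "receive"))  # 0.5
```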









Notes
Information Retrieval.
Text REtrieval Conference.
In this system, the universal set contains webpages related to a query. These pages are used to elicit a value interval for each query feature, from which the relevancy score of a webpage can be calculated with respect to each of its features.
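A hypothetical illustration of this interval idea follows; the actual features and aggregation used by the system are not specified here, so both functions below are assumptions.

```python
def feature_intervals(u_set_features):
    """Elicit a [min, max] interval per feature from the U-set pages,
    given their feature vectors as a list of dicts (assumed input)."""
    intervals = {}
    for features in u_set_features:
        for name, value in features.items():
            lo, hi = intervals.get(name, (value, value))
            intervals[name] = (min(lo, value), max(hi, value))
    return intervals

def relevancy_score(page_features, intervals):
    """Score a page by the fraction of its features falling inside the
    elicited intervals (a simple assumed aggregation)."""
    hits = sum(lo <= page_features.get(name, lo - 1) <= hi
               for name, (lo, hi) in intervals.items())
    return hits / len(intervals) if intervals else 0.0
```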
The U set.
P_5 denotes precision after the first five documents have been retrieved (i.e., P@5).
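As a reminder, the standard definition of this metric (not specific to this paper) is

```latex
P@5 = \frac{1}{5} \sum_{i=1}^{5} \mathrm{rel}_i, \qquad \mathrm{rel}_i \in \{0, 1\},
```

where rel_i indicates whether the i-th retrieved document is judged relevant.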
If an irrelevant page is ranked higher than the relevant results of a query, the search engine receives a score of −1 for that query.
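A minimal sketch of this penalty rule, assuming one known relevant page per navigational query; only the −1 penalty is stated in the text, so the +1 credit in the positive case is an assumption.

```python
def navigational_query_score(ranked_urls, relevant_url):
    """Return -1 if an irrelevant page outranks the known relevant page
    of a navigational query, otherwise a positive score (assumed +1)."""
    if not ranked_urls or ranked_urls[0] != relevant_url:
        return -1  # an irrelevant page precedes the relevant one, or it is missing
    return 1
```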
Acknowledgements
We thank all the group members for their support and cooperation. This project was supported by the Iran Telecommunication Research Center under project ID 901952720.
Cite this article
Alashti, A.H., Rezaei, A.A., Elahi, A. et al. Parsisanj: an automatic component-based approach toward search engine evaluation. J Supercomput 78, 10690–10711 (2022). https://doi.org/10.1007/s11227-022-04306-9