Abstract
Data fusion has been widely used in information retrieval for various tasks. Two factors have been found to affect fusion performance significantly: the performance of the component systems and the dissimilarity among them. Data fusion methods can therefore be classified into four categories according to which of these factors they consider, and methods in different categories suit different situations. In this work, we consider data fusion methods with performance weighting and propose two such methods. Both assign a weight to each retrieval system based on its performance as measured by P@10, whereas previous studies used MAP values for such performance weighting. Our experiment shows that the proposed methods are slightly more effective than the baseline methods in the same category. Some analysis is also done to justify the rationale for the proposed weighting scheme. Because P@10 requires much less human judgment effort than system-oriented measures such as MAP, the proposed methods are more widely applicable than the other methods involved.
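The idea of performance weighting with P@10 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact weighting function or the fusion formula, so this sketch assumes that each system's weight is simply its mean P@10 over a set of training queries, and that fusion is a weighted linear combination of min-max normalized scores. All function names and data layouts here are hypothetical.

```python
from collections import defaultdict


def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def mean_p10_weight(training_rankings, training_qrels, k=10):
    """Assumed weighting: a system's mean P@k over the training queries.

    training_rankings: {query_id: [doc_id, ...]} for one system
    training_qrels:    {query_id: set of relevant doc_ids}
    """
    values = [
        precision_at_k(ranking, training_qrels.get(qid, set()), k)
        for qid, ranking in training_rankings.items()
    ]
    return sum(values) / len(values)


def min_max_normalize(scores):
    """Scale one system's scores for a query into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_linear_fusion(runs, weights):
    """Fuse per-system scores for one query with a weighted sum.

    runs:    {system_name: {doc_id: score}}
    weights: {system_name: weight}
    Returns doc_ids sorted by fused score (ties broken by doc_id).
    """
    fused = defaultdict(float)
    for name, scores in runs.items():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weights[name] * s
    return sorted(fused, key=lambda d: (-fused[d], d))
```

Only the top 10 documents per system and query need relevance judgments to compute these weights, whereas MAP needs judgments deep into each ranking; this is the cost difference the abstract points to.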
Notes
1. TREC (Text REtrieval Conference) is an annual event on information retrieval evaluation. Its website is located at https://trec.nist.gov/.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, Q., Huang, Y., Wu, S. (2022). Inexpensive and Effective Data Fusion Methods with Performance Weights. In: Pardede, E., Delir Haghighi, P., Khalil, I., Kotsis, G. (eds) Information Integration and Web Intelligence. iiWAS 2022. Lecture Notes in Computer Science, vol 13635. Springer, Cham. https://doi.org/10.1007/978-3-031-21047-1_30
DOI: https://doi.org/10.1007/978-3-031-21047-1_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21046-4
Online ISBN: 978-3-031-21047-1
eBook Packages: Computer Science, Computer Science (R0)