Applying Co-training to Clickthrough Data for Search Engine Adaptation

Tan, Qingzhao; Chai, Xiaoyong; Ng, Wilfred; Lee, Dik-Lun

doi:10.1007/978-3-540-24571-1_48

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2973)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

The information on the World Wide Web is growing without bound, and users may have very diverse preferences for the pages they seek through a search engine. It is therefore challenging to adapt a search engine to the needs of a particular community of users who share similar interests. In this paper, we propose a new algorithm, Ranking SVM in a Co-training Framework (RSCF). The RSCF algorithm takes as input clickthrough data, that is, the items in the search results that a user has clicked on, and generates adaptive rankers as output. By analyzing the clickthrough data, RSCF first divides the data into a labelled data set, which contains the items that have already been scanned, and an unlabelled data set, which contains the items that have not yet been scanned. The labelled data is then augmented with unlabelled data to obtain a larger data set for training the rankers. We demonstrate that the RSCF algorithm produces better ranking results than the standard Ranking SVM algorithm. Based on RSCF, we develop a metasearch engine that comprises MSNSearch, Wisenut, and Overture, and carry out an online experiment to show that our metasearch engine outperforms Google.
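
The flow described in the abstract (split clickthrough data into scanned and unscanned items, then let rankers enlarge the training set with unlabelled data) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The top-down scanning assumption, the rule that a clicked item is preferred over skipped items ranked above it, the two feature views, the toy data, and all function names (`split_scanned`, `preference_pairs`, `train_ranker`, `co_train`) are assumptions made for illustration. A linear pairwise ranker trained by subgradient descent on a hinge loss stands in for Ranking SVM.

```python
from itertools import combinations


def split_scanned(results, clicked):
    """Split a ranked list into scanned (labelled) and unscanned (unlabelled) items.

    Assumption: the user reads top-down and stops at the last clicked item.
    """
    if not clicked:
        return [], list(results)
    last = max(i for i, r in enumerate(results) if r in clicked)
    return results[: last + 1], results[last + 1:]


def preference_pairs(scanned, clicked):
    """Assumed heuristic: a clicked item beats every skipped item ranked above it."""
    pairs = []
    for i, item in enumerate(scanned):
        if item in clicked:
            pairs += [(item, above) for above in scanned[:i] if above not in clicked]
    return pairs


def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ranker(pairs, feats, dim, epochs=50, lr=0.1, reg=0.01):
    """Linear pairwise ranker (stand-in for Ranking SVM): hinge loss on feature differences."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [a - b for a, b in zip(feats[better], feats[worse])]
            violated = score(w, diff) < 1  # margin not yet satisfied
            for k in range(dim):
                w[k] -= lr * (reg * w[k] - (diff[k] if violated else 0.0))
    return w


def co_train(labelled_pairs, unlabelled, view_a, view_b, dim_a, dim_b, rounds=3, top_k=2):
    """Each round, each ranker labels its most confident unlabelled pairs for the other ranker."""
    pairs_a, pairs_b = list(labelled_pairs), list(labelled_pairs)
    candidates = list(combinations(unlabelled, 2))
    for _ in range(rounds):
        w_a = train_ranker(pairs_a, view_a, dim_a)
        w_b = train_ranker(pairs_b, view_b, dim_b)
        for w, view, target in ((w_a, view_a, pairs_b), (w_b, view_b, pairs_a)):
            # Confidence = absolute score gap between the two items of a pair.
            ranked = sorted(
                candidates,
                key=lambda p: -abs(score(w, view[p[0]]) - score(w, view[p[1]])),
            )
            for x, y in ranked[:top_k]:
                pair = (x, y) if score(w, view[x]) >= score(w, view[y]) else (y, x)
                if pair not in target and pair[::-1] not in target:
                    target.append(pair)
    return train_ranker(pairs_a, view_a, dim_a), train_ranker(pairs_b, view_b, dim_b)


# Toy data (hypothetical): five results, the user clicked only the third one.
results = ["d1", "d2", "d3", "d4", "d5"]
clicked = {"d3"}
view_a = {"d1": [1.0, 0.0], "d2": [0.8, 0.1], "d3": [0.2, 0.9], "d4": [0.1, 0.8], "d5": [0.9, 0.2]}
view_b = {"d1": [0.0, 1.0], "d2": [0.1, 0.9], "d3": [1.0, 0.0], "d4": [0.9, 0.2], "d5": [0.1, 0.8]}

scanned, unscanned = split_scanned(results, clicked)
pairs = preference_pairs(scanned, clicked)
w_a, w_b = co_train(pairs, unscanned, view_a, view_b, dim_a=2, dim_b=2)
reranked = sorted(results, key=lambda d: -score(w_a, view_a[d]))
```

In this sketch the two views are arbitrary feature vectors. How the feature views are actually chosen and how confident predictions are selected would follow the full paper, which the abstract does not detail.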

This work is supported in part by grants from the Research Grant Council of Hong Kong, Grant Nos. HKUST6079/01E, DAG01/02.EG05, and HKUST6185/02E.

Author information

Authors and Affiliations

Department of Computer Science, The Hong Kong University of Science and Technology
Qingzhao Tan, Xiaoyong Chai, Wilfred Ng & Dik-Lun Lee

Editor information

Editors and Affiliations

Department of Computer Science, KAIST, 373-1 Guseong-dong Yuseong-gu, 305-701, Daejeon, Korea
YoonJoon Lee
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
Computer Science Department and Advanced Information Technology Research Center (AITrc), KAIST, Korea
Kyu-Young Whang
Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, 305-701, Daejeon, Republic of Korea
Doheon Lee

About this paper

Cite this paper

Tan, Q., Chai, X., Ng, W., Lee, DL. (2004). Applying Co-training to Clickthrough Data for Search Engine Adaptation. In: Lee, Y., Li, J., Whang, KY., Lee, D. (eds) Database Systems for Advanced Applications. DASFAA 2004. Lecture Notes in Computer Science, vol 2973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24571-1_48

DOI: https://doi.org/10.1007/978-3-540-24571-1_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21047-4
Online ISBN: 978-3-540-24571-1
eBook Packages: Springer Book Archive
