ABSTRACT
Many web applications, such as ad-matching systems, vertical search engines, and page categorization systems, need to identify the pages on the Web that belong to a particular type or class. The sheer number and diversity of Web pages, however, make it hard to obtain a good sample of the class of interest. In this paper, we describe a successfully deployed, end-to-end system that starts from a biased training sample and uses several state-of-the-art machine learning algorithms working in tandem, including a powerful active learning component, to build an accurate classification system. We evaluate the system on traffic from a real-world ad-matching platform and show that it achieves high categorization effectiveness while significantly reducing editorial effort and labeling time.
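To make the active learning idea concrete, here is a minimal sketch of pool-based uncertainty (margin) sampling, one common active learning strategy. The abstract does not say which selection criterion, classifier, or features the deployed system uses, so this is not the authors' method. It assumes a plain logistic regression model, toy 2-D feature vectors in place of page features, and a hypothetical `true_label` oracle standing in for human editors. All function names (`train_logreg`, `select_most_uncertain`, `active_learning`, `true_label`) are illustrative and hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X: List[Vector], y: List[int], epochs: int = 200,
                 lr: float = 0.5, l2: float = 1e-3) -> Tuple[Vector, float]:
    """Batch gradient descent for L2-regularized logistic regression (labels 0/1)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [l2 * wj for wj in w], 0.0  # regularizer gradient; bias is not penalized
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi  # derivative of log-loss w.r.t. the score
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def select_most_uncertain(w: Vector, b: float, pool: List[Vector], k: int) -> List[int]:
    """Indices of the k pool items whose scores lie closest to the decision boundary."""
    order = sorted(range(len(pool)), key=lambda i: abs(dot(w, pool[i]) + b))
    return order[:k]


def active_learning(X_seed: List[Vector], y_seed: List[int], pool: List[Vector],
                    oracle: Callable[[Vector], int], rounds: int,
                    batch: int) -> Tuple[Tuple[Vector, float], int]:
    """Retrain, query the `batch` least certain pool items, add their labels, repeat."""
    X, y, unlabeled = list(X_seed), list(y_seed), list(pool)  # copies; inputs untouched
    for _ in range(rounds):
        if not unlabeled:
            break
        w, b = train_logreg(X, y)
        picked = select_most_uncertain(w, b, unlabeled, batch)
        # Pop from the highest index down so the remaining indices stay valid.
        for i in sorted(picked, reverse=True):
            x = unlabeled.pop(i)
            X.append(x)
            y.append(oracle(x))  # stands in for an editorial judgment
    return train_logreg(X, y), len(X)


def true_label(x: Vector) -> int:
    # Hypothetical ground truth for the toy data: positive iff x0 + x1 > 1.
    return int(x[0] + x[1] > 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [[rng.random(), rng.random()] for _ in range(500)]
    (w, b), n_labels = active_learning([[0.1, 0.1], [0.9, 0.9]], [0, 1],
                                       pool, true_label, rounds=5, batch=4)
    test = [[rng.random(), rng.random()] for _ in range(1000)]
    acc = sum((dot(w, x) + b > 0) == bool(true_label(x)) for x in test) / len(test)
    print(f"labels used: {n_labels}, held-out accuracy: {acc:.3f}")
```

Querying only the items the current model scores closest to its boundary spends each label where the model is least certain; this is the usual intuition for why active learning can reduce labeling effort. A production system would also have to handle class imbalance and the bias of the seed sample, which this toy sketch ignores.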
Index Terms
- A large-scale active learning system for topical categorization on the web
Recommendations
Multiple-Instance Active Learning for Image Categorization
MMM '09: Proceedings of the 15th International Multimedia Modeling Conference on Advances in Multimedia Modeling
Both multiple-instance learning and active learning are widely employed in image categorization, but generally they are applied separately. This paper studies the integration of these two methods. Different from typical active learning approaches, the ...
Combining active learning and semi-supervised for improving learning performance
ISABEL '11: Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies
In many learning tasks, there are abundant unlabeled samples but the number of labeled training samples is limited, because labeling the samples requires the efforts of human annotators and expertise. There are three major techniques for labeling the ...
Large-scale text categorization by batch mode active learning
WWW '06: Proceedings of the 15th international conference on World Wide Web
Large-scale text categorization is an important research topic for Web data mining. One of the challenges in large-scale text categorization is how to reduce the human efforts in labeling text documents for building reliable classification models. In ...