Abstract
This paper describes an application of support vector machines (SVMs) to interactive document retrieval using active learning. Several studies have applied classification learning methods such as SVMs to relevance feedback and obtained successful results. However, they did not fully exploit the characteristics of the example distribution in document retrieval. We propose a heuristic that biases which documents are shown to the user for judgement according to the distribution of examples in document retrieval. The heuristic selects the examples shown to the user from the neighborhood of the positive support vectors, which improves learning efficiency. We implemented an SVM-based interactive document retrieval system using the proposed heuristic and compared it with conventional systems such as a Rocchio-based system and an SVM-based system without the heuristic. In systematic experiments on large data sets containing over 500,000 newspaper articles, we confirmed that our system outperformed the others.
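The selection step described in the abstract might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes a linear SVM whose weight vector `w` and bias `b` have already been trained on the user's judgements so far, and it approximates "neighbors of positive support vectors" by closeness to the positive margin f(x) = +1, where positive support vectors lie in the separable case. The function names, the document ids, and the toy vectors are all hypothetical. A conventional uncertainty-based selector (documents nearest the hyperplane f(x) = 0) is included for contrast.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def decision_value(w: Vector, b: float, x: Vector) -> float:
    """Linear SVM decision value f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_near_positive_margin(
    w: Vector, b: float, unlabeled: Dict[str, Vector], k: int
) -> List[str]:
    """Pick the k unlabeled documents whose decision value is closest to +1.

    Assumption: closeness to the positive margin stands in for being a
    neighbor of a positive support vector.
    """
    scored = sorted(
        (abs(decision_value(w, b, x) - 1.0), doc_id)
        for doc_id, x in unlabeled.items()
    )
    return [doc_id for _, doc_id in scored[:k]]


def select_near_hyperplane(
    w: Vector, b: float, unlabeled: Dict[str, Vector], k: int
) -> List[str]:
    """Baseline for contrast: the k documents closest to f(x) = 0."""
    scored = sorted(
        (abs(decision_value(w, b, x)), doc_id)
        for doc_id, x in unlabeled.items()
    )
    return [doc_id for _, doc_id in scored[:k]]


# Toy two-dimensional document vectors (hypothetical data).
w, b = (1.0, -1.0), 0.0
unlabeled = {
    "d1": (0.1, 0.0),  # f = 0.1
    "d2": (1.0, 0.1),  # f = 0.9
    "d3": (0.0, 1.0),  # f = -1.0
    "d4": (3.0, 0.0),  # f = 3.0
    "d5": (1.2, 0.0),  # f = 1.2
}

biased = select_near_positive_margin(w, b, unlabeled, k=2)
baseline = select_near_hyperplane(w, b, unlabeled, k=2)
print(biased)    # documents nearest the positive margin
print(baseline)  # documents nearest the separating hyperplane
```

In an interactive loop, the selected documents would be shown to the user, their relevance judgements added to the training set, and the SVM retrained before the next round.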
About this article
Cite this article
Onoda, T., Murata, H. & Yamada, S. SVM-based Interactive Document Retrieval with Active Learning. New Gener. Comput. 26, 49–61 (2007). https://doi.org/10.1007/s00354-007-0034-4
DOI: https://doi.org/10.1007/s00354-007-0034-4