SVM-based Interactive Document Retrieval with Active Learning

New Generation Computing

Abstract

This paper describes an application of support vector machines (SVMs) to interactive document retrieval using active learning. Previous work has applied classification learning methods such as SVMs to relevance feedback with successful results. However, that work did not fully exploit the characteristics of the example distribution in document retrieval. We propose a heuristic that biases which documents are shown to the user for judgement according to the distribution of examples in document retrieval. The heuristic selects the examples to show the user from the neighborhood of the positive support vectors, and it improves learning efficiency. We implemented an SVM-based interactive document retrieval system using the proposed heuristic and compared it with conventional systems such as a Rocchio-based system and an SVM-based system without the heuristic. We conducted systematic experiments on large data sets containing over 500,000 newspaper articles and confirmed that our system outperformed the others.
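
The selection step described in the abstract, choosing documents to show the user from the neighborhood of the positive support vectors, can be illustrated with a minimal sketch. The abstract does not state the exact selection rule, so everything below is an assumption made for illustration: documents are dense term-weight vectors, "neighborhood" is measured by cosine similarity to the nearest positive support vector, and the positive support vectors are taken as given from an already trained SVM. The names `cosine` and `select_near_positive_svs` are hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_near_positive_svs(unlabeled, positive_svs, k):
    """Return ids of the k unlabeled documents closest to any positive support vector.

    unlabeled:    dict mapping document id -> vector
    positive_svs: list of vectors; assumed to be the support vectors labeled
                  relevant by an SVM trained on the user's judgements so far
    k:            number of documents to show the user in the next round
    """
    scored = []
    for doc_id, vec in unlabeled.items():
        # Similarity to the nearest positive support vector (0.0 if there are none).
        best = max((cosine(vec, sv) for sv in positive_svs), default=0.0)
        scored.append((best, doc_id))
    # Highest similarity first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Toy data (illustrative only, not from the paper's experiments).
unlabeled_docs = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.9, 0.1, 0.0],
    "d3": [0.0, 1.0, 0.0],
    "d4": [0.0, 0.0, 1.0],
}
positive_support_vectors = [[1.0, 0.2, 0.0]]

to_show = select_near_positive_svs(unlabeled_docs, positive_support_vectors, k=2)
print(to_show)
```

In an interactive loop, the user would judge the returned documents, the SVM would be retrained on the enlarged labeled set, and the selection would be repeated with the new positive support vectors. By contrast, plain margin-based active learning would pick the documents closest to the decision boundary regardless of which side the nearby support vectors lie on.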

Author information

Corresponding author

Correspondence to Takashi Onoda.

About this article

Cite this article

Onoda, T., Murata, H. & Yamada, S. SVM-based Interactive Document Retrieval with Active Learning. New Gener. Comput. 26, 49–61 (2007). https://doi.org/10.1007/s00354-007-0034-4

  • DOI: https://doi.org/10.1007/s00354-007-0034-4
