DOI: 10.1145/1281192.1281258
Article

Practical learning from one-sided feedback

Published: 12 August 2007

ABSTRACT

In many data mining applications, online labeling feedback is only available for examples which were predicted to belong to the positive class. Such applications include spam filtering in the case where users never check emails marked "spam", document retrieval where users cannot give relevance feedback on unretrieved documents, and online advertising where user behavior cannot be observed for unshown advertisements. One-sided feedback can cripple the performance of classical mistake-driven online learners such as Perceptron. Previous work under the Apple Tasting framework showed how to transform standard online learners into successful learners from one-sided feedback. However, we find in practice that this transformation may request more labels than necessary to achieve strong performance. In this paper, we employ two active learning methods which reduce the number of labels requested in practice. One method is the use of Label Efficient active learning. The other method, somewhat surprisingly, is the use of margin-based learners without modification, which we show combines implicit active learning and a greedy strategy for managing the exploration-exploitation tradeoff. Experimental results show that these methods can be significantly more effective in practice than those using the Apple Tasting transformation, even on minority class problems.
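
The one-sided feedback setting can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `one_sided_perceptron`, the `margin` and `lr` parameters, and the toy stream are assumptions made for illustration only. The sketch shows a linear online learner that observes the true label only when it predicts the positive class, which is how a mistake-driven learner can get stuck after a false positive.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def one_sided_perceptron(stream, dim, margin=0.0, lr=1.0):
    """Online linear learner that sees a label only when it predicts positive.

    stream: iterable of (x, y) pairs with y in {-1, +1}. The label y is
            hidden from the learner whenever the prediction is negative.
    margin: hypothetical knob. 0.0 gives a mistake-driven Perceptron-style
            update; a positive value also updates on labeled examples whose
            score falls inside the margin.
    Returns (weights, labels_observed, mistakes). The mistake count is kept
    by the simulation only; the learner itself never sees hidden labels.
    """
    w = [0.0] * dim
    labels_observed = 0
    mistakes = 0
    for x, y in stream:
        score = dot(w, x)
        predicted_positive = score >= 0.0
        if (1 if predicted_positive else -1) != y:
            mistakes += 1
        if not predicted_positive:
            continue  # negative prediction: no feedback, no update possible
        labels_observed += 1
        if y * score <= margin:
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w, labels_observed, mistakes


# Toy stream (assumed data): one negative example, then three positives
# with the same feature vector.
toy_stream = [((1.0, 1.0), -1)] + [((1.0, 1.0), 1)] * 3
weights, labels_observed, mistakes = one_sided_perceptron(toy_stream, dim=2)
print(weights, labels_observed, mistakes)  # [-1.0, -1.0] 1 4
```

In this toy run the first false positive pushes the weights negative. Every later positive example is then predicted negative, so its label is never revealed and the learner has no opportunity to correct itself. Some form of exploration, such as the Apple Tasting transformation or the active learning methods named in the abstract, is needed to escape this state.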


      Published in

        KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2007
        1080 pages
        ISBN: 9781595936097
        DOI: 10.1145/1281192

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • Article

        Acceptance Rates

        KDD '07 Paper Acceptance Rate: 111 of 573 submissions, 19%
        Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
