
Efficient online learning for multitask feature selection

Published: 02 August 2013

Abstract

Learning explanatory features across multiple related tasks, or MultiTask Feature Selection (MTFS), is an important problem in data mining, machine learning, and bioinformatics. Previous MTFS methods rely on batch-mode training, which makes them inefficient when data arrive sequentially or when the training set is too large to be loaded into memory at once. To tackle these problems, we propose a novel online learning framework for solving the MTFS problem. A main advantage of the online algorithm is its efficiency in both time complexity and memory cost. The weights of the MTFS models at each iteration can be updated by closed-form solutions based on the average of previous subgradients. This yields worst-case time complexity and memory cost per iteration both on the order of O(d × Q), where d is the number of feature dimensions and Q is the number of tasks. Moreover, we provide a theoretical analysis of the average regret of the online learning algorithms, which also guarantees their convergence rate. Finally, we conduct detailed experiments to show the characteristics and merits of the online learning algorithms in solving several MTFS problems.
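Since the closed-form update is only described verbally here, the following NumPy sketch illustrates, under assumptions, how an update based on the average of previous subgradients can be carried out with a row-wise (ℓ2,1-style) shrinkage. The squared loss, the function names, and the parameters lam and gamma are illustrative choices, not the paper's exact formulation.

```python
import numpy as np


def da_mtfs_step(G_bar, t, lam=0.1, gamma=1.0):
    """One closed-form weight update in the spirit of dual-averaging MTFS.

    Sketch only: rows of the d x Q weight matrix are shrunk jointly
    (an l2,1-style step) based on the running average of subgradients,
    so a feature is dropped for all Q tasks at once. `lam` (sparsity
    weight) and `gamma` (proximal scaling) are illustrative names.
    """
    row_norms = np.linalg.norm(G_bar, axis=1, keepdims=True)            # (d, 1)
    shrink = np.maximum(0.0, 1.0 - lam / np.maximum(row_norms, 1e-12))  # (d, 1)
    return -(np.sqrt(t) / gamma) * shrink * G_bar                       # (d, Q)


def run_online_mtfs(stream, d, Q, lam=0.1, gamma=1.0):
    """Drive the update over a stream of (task, x, y) triples with a squared loss."""
    W = np.zeros((d, Q))
    G_bar = np.zeros((d, Q))
    for t, (q, x, y) in enumerate(stream, start=1):
        G = np.zeros((d, Q))
        G[:, q] = (W[:, q] @ x - y) * x   # subgradient: only task q's column is nonzero
        G_bar += (G - G_bar) / t          # running average of previous subgradients
        W = da_mtfs_step(G_bar, t, lam, gamma)
    return W
```

In this sketch, each iteration only touches the d × Q averaged-subgradient matrix (a row-norm computation, an elementwise shrinkage, and a rescaling), which is where a per-iteration O(d × Q) time and memory bound comes from.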

References

  1. Aaker, D. A., Kumar, V., and Day, G. S. 2006. Marketing Research 9th Ed. Wiley.
  2. Ando, R. K. and Zhang, T. 2005. A framework for learning predictive structures from multiple tasks and unlabeled data. J. Mach. Learn. Res. 6, 1817--1853.
  3. Argyriou, A., Evgeniou, T., and Pontil, M. 2006. Multi-task feature learning. Adv. Neural Inf. Process. Syst. 19, 41--48.
  4. Argyriou, A., Evgeniou, T., and Pontil, M. 2008. Convex multi-task feature learning. Mach. Learn. 73, 3, 243--272.
  5. Bai, J., Zhou, K., Xue, G.-R., Zha, H., Sun, G., Tseng, B. L., Zheng, Z., and Chang, Y. 2009. Multi-task learning for learning to rank in web search. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM'09). 1549--1552.
  6. Bakker, B. and Heskes, T. 2003. Task clustering and gating for Bayesian multitask learning. J. Mach. Learn. Res. 4, 83--99.
  7. Balakrishnan, S. and Madigan, D. 2008. Algorithms for sparse linear classifiers in the massive data setting. J. Mach. Learn. Res. 9, 313--337.
  8. Ben-David, S. and Borbely, R. S. 2008. A notion of task relatedness yielding provable multiple-task learning guarantees. Mach. Learn. 73, 3, 273--287.
  9. Ben-David, S. and Schuller, R. 2003. Exploiting task relatedness for multiple task learning. In Proceedings of the 16th Annual Conference on Learning Theory (COLT'03). 567--580.
  10. Boyd, S. and Vandenberghe, L. 2004. Convex Optimization. Cambridge University Press.
  11. Caruana, R. 1997. Multitask learning. Mach. Learn. 28, 1, 41--75.
  12. Chen, J., Liu, J., and Ye, J. 2010. Learning incoherent sparse and low-rank patterns from multiple tasks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'10). 1179--1188.
  13. Chen, J., Liu, J., and Ye, J. 2012. Learning incoherent sparse and low-rank patterns from multiple tasks. ACM Trans. Knowl. Discov. Data 5, 4.
  14. Chen, J., Tang, L., Liu, J., and Ye, J. 2009a. A convex formulation for learning shared structures from multiple tasks. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML'09). 18.
  15. Chen, X., Pan, W., Kwok, J. T., and Carbonell, J. G. 2009b. Accelerated gradient method for multi-task sparse learning problem. In Proceedings of the 9th IEEE International Conference on Data Mining (ICDM'09). 746--751.
  16. Dekel, O., Long, P. M., and Singer, Y. 2006. Online multitask learning. In Proceedings of the 19th Annual Conference on Learning Theory (COLT'06). 453--467.
  17. Dhillon, P. S., Foster, D. P., and Ungar, L. H. 2011. Minimum description length penalization for group and multi-task sparse learning. J. Mach. Learn. Res. 12, 525--564.
  18. Dhillon, P. S., Tomasik, B., Foster, D. P., and Ungar, L. H. 2009. Multi-task feature selection using the multiple inclusion criterion (MIC). In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD'09). 276--289.
  19. Duchi, J. and Singer, Y. 2009. Efficient learning using forward-backward splitting. J. Mach. Learn. Res. 10, 2873--2898.
  20. Evgeniou, T., Micchelli, C. A., and Pontil, M. 2005. Learning multiple tasks with kernel methods. J. Mach. Learn. Res. 6, 615--637.
  21. Evgeniou, T. and Pontil, M. 2004. Regularized multi-task learning. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). 109--117.
  22. Friedman, J., Hastie, T., and Tibshirani, R. 2010. A note on the group lasso and a sparse group lasso. http://arxiv.org/pdf/1001.0736.pdf.
  23. Han, J. and Kamber, M. 2006. Data Mining: Concepts and Techniques 2nd Ed. Morgan Kaufmann, San Francisco.
  24. Han, Y., Wu, F., Jia, J., Zhuang, Y., and Yu, B. 2010. Multi-task sparse discriminant analysis (MtSDA) with overlapping categories. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI'10).
  25. Hazan, E., Agarwal, A., and Kale, S. 2007. Logarithmic regret algorithms for online convex optimization. Mach. Learn. 69, 2--3, 169--192.
  26. Hu, C., Kwok, J., and Pan, W. 2009. Accelerated gradient methods for stochastic optimization and online learning. Adv. Neural Inf. Process. Syst. 22, 781--789.
  27. Jebara, T. 2004. Multi-task feature and kernel selection for SVMs. In Proceedings of the 21st International Conference on Machine Learning (ICML'04).
  28. Jebara, T. 2011. Multitask sparsity via maximum entropy discrimination. J. Mach. Learn. Res. 12, 75--110.
  29. Langford, J., Li, L., and Zhang, T. 2009. Sparse online learning via truncated gradient. J. Mach. Learn. Res. 10, 777--801.
  30. Lenk, P. J., Desarbo, W. S., Green, P. E., and Young, M. R. 1996. Hierarchical Bayes conjoint analysis: Recovery of partworth heterogeneity from reduced experimental designs. Market. Sci. 15, 2, 173--191.
  31. Ling, G., Yang, H., King, I., and Lyu, M. R. 2012. Online learning for collaborative filtering. In Proceedings of the IEEE World Congress on Computational Intelligence (WCCI'12). 1--8.
  32. Liu, H., Palatucci, M., and Zhang, J. 2009a. Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML'09). 649--656.
  33. Liu, J., Chen, J., and Ye, J. 2009b. Large-scale sparse logistic regression. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09). 547--556.
  34. Liu, J., Ji, S., and Ye, J. 2009c. Multi-task feature learning via efficient ℓ2,1-norm minimization. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI'09).
  35. Liu, J., Ji, S., and Ye, J. 2009d. SLEP: Sparse learning with efficient projections. http://www.public.asu.edu/ye02/Software/SLEP.
  36. Nesterov, Y. 2009. Primal-dual subgradient methods for convex problems. Math. Program. 120, 1, 221--259.
  37. Obozinski, G., Taskar, B., and Jordan, M. I. 2009. Joint covariate selection and joint subspace selection for multiple classification problems. Statist. Comput. 20, 2, 231--252.
  38. Pong, T. K., Tseng, P., Ji, S., and Ye, J. 2010. Trace norm regularization: Reformulations, algorithms, and multi-task learning. SIAM J. Optim. 20, 6, 3465--3489.
  39. Quattoni, A., Carreras, X., Collins, M., and Darrell, T. 2009. An efficient projection for ℓ1,∞ regularization. In Proceedings of the 26th International Conference on Machine Learning (ICML'09). 857--864.
  40. Shalev-Shwartz, S. and Singer, Y. 2007. A primal-dual perspective of online learning algorithms. Mach. Learn. 69, 2--3, 115--142.
  41. Shalev-Shwartz, S., Singer, Y., and Srebro, N. 2007. Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the 24th International Conference on Machine Learning (ICML'07). 807--814.
  42. Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. B 58, 1, 267--288.
  43. Vapnik, V. 1999. The Nature of Statistical Learning Theory 2nd Ed. Springer, New York.
  44. Xiao, L. 2010. Dual averaging method for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11, 2543--2596.
  45. Xu, Z., Jin, R., Yang, H., King, I., and Lyu, M. R. 2010. Simple and efficient multiple kernel learning by group lasso. In Proceedings of the 27th International Conference on Machine Learning (ICML'10). 1175--1182.
  46. Yang, H., King, I., and Lyu, M. R. 2010a. Multi-task learning for one-class classification. In Proceedings of the International Joint Conference on Neural Networks (IJCNN'10). 1--8.
  47. Yang, H., King, I., and Lyu, M. R. 2010b. Online learning for multi-task feature selection. In Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM'10). 1693--1696.
  48. Yang, H., King, I., and Lyu, M. R. 2011a. Sparse Learning under Regularization Framework 1st Ed. Lambert Academic Publishing.
  49. Yang, H., Xu, Z., King, I., and Lyu, M. R. 2010c. Online learning for group lasso. In Proceedings of the 27th International Conference on Machine Learning (ICML'10). 1191--1198.
  50. Yang, H., Xu, Z., Ye, J., King, I., and Lyu, M. R. 2011b. Efficient sparse generalized multiple kernel learning. IEEE Trans. Neural Netw. 22, 3, 433--446.
  51. Zhang, Y. 2010. Multi-task active learning with output constraints. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI'10).
  52. Zhang, Y., Yeung, D.-Y., and Xu, Q. 2010. Probabilistic multi-task feature selection. Adv. Neural Inf. Process. Syst. 23, 2559--2567.
  53. Zhao, P., Hoi, S. C. H., and Jin, R. 2011a. Double updating online learning. J. Mach. Learn. Res. 12, 1587--1615.
  54. Zhao, P., Hoi, S. C. H., Jin, R., and Yang, T. 2011b. Online AUC maximization. In Proceedings of the 28th International Conference on Machine Learning (ICML'11). 233--240.
  55. Zhou, Y., Jin, R., and Hoi, S. C. 2010. Exclusive lasso for multi-task feature selection. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS'10).
  56. Zou, H. and Hastie, T. 2005. Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. B 67, 301--320.

      Reviews

      Anca Doloc-Mihu

      Algorithms for multitask feature selection (MTFS) aim to efficiently learn explanatory features from multiple related tasks simultaneously. These algorithms extract shared information across the tasks and determine the relative feature weights, which are systematically adjusted as more information comes in. While existing MTFS algorithms use batch-mode training (learning the weights all at once), this paper proposes a new online learning method, dual-averaging MTFS (DA-MTFS), that learns the weights sequentially. The paper asserts that this algorithm outperforms standard MTFS methods in both time complexity and memory cost. The DA-MTFS algorithm performs three steps at each iteration: compute the subgradient of the loss function with respect to the weights, update the average of the subgradients, and update the weights. The weights are updated sequentially via closed-form solutions based on the average of previous subgradients. The time complexity and memory cost at each iteration are at most O(d × Q), where d is the number of feature dimensions and Q is the number of tasks. The authors prove the convergence rate of their algorithm via theoretical analysis and then apply it to real-world data about student ratings of personal computers [1]. The experiments show that DA-MTFS achieves performance close to that of the corresponding batch-trained algorithms, but at a lower time and memory cost. For those interested in online learning algorithms, this paper is worth reading.

      Online Computing Reviews Service
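      To make the closed-form step the review describes concrete, a typical dual-averaging-style update with an ℓ2,1 penalty has the per-row form below; λ, γ, and the notation are illustrative assumptions, not necessarily the paper's exact formulation.

```latex
w^{j}_{t+1} \;=\; -\frac{\sqrt{t}}{\gamma}
\left[ 1 - \frac{\lambda}{\lVert \bar{g}^{j}_{t} \rVert_{2}} \right]_{+}
\bar{g}^{j}_{t}, \qquad j = 1, \ldots, d,
```

      Here \bar{g}^{j}_{t} is the j-th row of the averaged subgradient matrix and [x]_+ = max(x, 0): a row whose averaged subgradient has norm at most λ is zeroed for all Q tasks at once, and evaluating all d rows touches each entry of the d × Q matrix once, consistent with the O(d × Q) per-iteration cost noted above.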


      • Published in

        ACM Transactions on Knowledge Discovery from Data, Volume 7, Issue 2
        July 2013, 107 pages
        ISSN: 1556-4681
        EISSN: 1556-472X
        DOI: 10.1145/2499907

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 2 August 2013
        • Accepted: 1 November 2012
        • Revised: 1 July 2012
        • Received: 1 January 2011
        Published in TKDD Volume 7, Issue 2


        Qualifiers

        • research-article
        • Research
        • Refereed
