
Efficient online learning for multitask feature selection

Published: 02 August 2013

Abstract

Learning explanatory features across multiple related tasks, or MultiTask Feature Selection (MTFS), is an important problem in data mining, machine learning, and bioinformatics. Previous MTFS methods rely on batch-mode training, which makes them inefficient when data arrive sequentially or when the training set is too large to be loaded into memory at once. To tackle these problems, we propose a novel online learning framework for solving the MTFS problem. A main advantage of the online algorithm is its efficiency in both time complexity and memory cost. The weights of the MTFS models at each iteration can be updated by closed-form solutions based on the average of previous subgradients. This yields worst-case time complexity and memory cost per iteration both on the order of O(d × Q), where d is the number of feature dimensions and Q is the number of tasks. Moreover, we provide a theoretical analysis of the average regret of the online learning algorithms, which also guarantees their convergence rate. Finally, we conduct detailed experiments to show the characteristics and merits of the online learning algorithms in solving several MTFS problems.
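Since the closed-form update is only described verbally here, the following NumPy sketch illustrates, under assumptions, how an update based on the average of previous subgradients can be carried out with a row-wise (ℓ2,1-style) shrinkage. The squared loss, the function names, and the parameters lam and gamma are illustrative choices, not the paper's exact formulation.

```python
import numpy as np


def da_mtfs_step(G_bar, t, lam=0.1, gamma=1.0):
    """One closed-form weight update in the spirit of dual-averaging MTFS.

    Sketch only: rows of the d x Q weight matrix are shrunk jointly
    (an l2,1-style step) based on the running average of subgradients,
    so a feature is dropped for all Q tasks at once. `lam` (sparsity
    weight) and `gamma` (proximal scaling) are illustrative names.
    """
    row_norms = np.linalg.norm(G_bar, axis=1, keepdims=True)            # (d, 1)
    shrink = np.maximum(0.0, 1.0 - lam / np.maximum(row_norms, 1e-12))  # (d, 1)
    return -(np.sqrt(t) / gamma) * shrink * G_bar                       # (d, Q)


def run_online_mtfs(stream, d, Q, lam=0.1, gamma=1.0):
    """Drive the update over a stream of (task, x, y) triples with a squared loss."""
    W = np.zeros((d, Q))
    G_bar = np.zeros((d, Q))
    for t, (q, x, y) in enumerate(stream, start=1):
        G = np.zeros((d, Q))
        G[:, q] = (W[:, q] @ x - y) * x   # subgradient: only task q's column is nonzero
        G_bar += (G - G_bar) / t          # running average of previous subgradients
        W = da_mtfs_step(G_bar, t, lam, gamma)
    return W
```

In this sketch, each iteration only touches the d × Q averaged-subgradient matrix (a row-norm computation, an elementwise shrinkage, and a rescaling), which is where a per-iteration O(d × Q) time and memory bound comes from.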

References

  1. Aaker, D. A., Kumar, V., and Day, G. S. 2006. Marketing Research 9th Ed. Wiley.
  2. Ando, R. K. and Zhang, T. 2005. A framework for learning predictive structures from multiple tasks and unlabeled data. J. Mach. Learn. Res. 6, 1817--1853.
  3. Argyriou, A., Evgeniou, T., and Pontil, M. 2006. Multi-task feature learning. Adv. Neural Inf. Process. Syst. 19, 41--48.
  4. Argyriou, A., Evgeniou, T., and Pontil, M. 2008. Convex multi-task feature learning. Mach. Learn. 73, 3, 243--272.
  5. Bai, J., Zhou, K., Xue, G.-R., Zha, H., Sun, G., Tseng, B. L., Zheng, Z., and Chang, Y. 2009. Multi-task learning for learning to rank in web search. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM'09). 1549--1552.
  6. Bakker, B. and Heskes, T. 2003. Task clustering and gating for Bayesian multitask learning. J. Mach. Learn. Res. 4, 83--99.
  7. Balakrishnan, S. and Madigan, D. 2008. Algorithms for sparse linear classifiers in the massive data setting. J. Mach. Learn. Res. 9, 313--337.
  8. Ben-David, S. and Borbely, R. S. 2008. A notion of task relatedness yielding provable multiple-task learning guarantees. Mach. Learn. 73, 3, 273--287.
  9. Ben-David, S. and Schuller, R. 2003. Exploiting task relatedness for multiple task learning. In Proceedings of the 16th Annual Conference on Learning Theory (COLT'03). 567--580.
  10. Boyd, S. and Vandenberghe, L. 2004. Convex Optimization. Cambridge University Press.
  11. Caruana, R. 1997. Multitask learning. Mach. Learn. 28, 1, 41--75.
  12. Chen, J., Liu, J., and Ye, J. 2010. Learning incoherent sparse and low-rank patterns from multiple tasks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'10). 1179--1188.
  13. Chen, J., Liu, J., and Ye, J. 2012. Learning incoherent sparse and low-rank patterns from multiple tasks. ACM Trans. Knowl. Discov. Data 5, 4.
  14. Chen, J., Tang, L., Liu, J., and Ye, J. 2009a. A convex formulation for learning shared structures from multiple tasks. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML'09). 18.
  15. Chen, X., Pan, W., Kwok, J. T., and Carbonell, J. G. 2009b. Accelerated gradient method for multi-task sparse learning problem. In Proceedings of the 9th IEEE International Conference on Data Mining (ICDM'09). 746--751.
  16. Dekel, O., Long, P. M., and Singer, Y. 2006. Online multitask learning. In Proceedings of the 19th Annual Conference on Learning Theory (COLT'06). 453--467.
  17. Dhillon, P. S., Foster, D. P., and Ungar, L. H. 2011. Minimum description length penalization for group and multi-task sparse learning. J. Mach. Learn. Res. 12, 525--564.
  18. Dhillon, P. S., Tomasik, B., Foster, D. P., and Ungar, L. H. 2009. Multi-task feature selection using the multiple inclusion criterion (MIC). In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD'09). 276--289.
  19. Duchi, J. and Singer, Y. 2009. Efficient learning using forward-backward splitting. J. Mach. Learn. Res. 10, 2873--2898.
  20. Evgeniou, T., Micchelli, C. A., and Pontil, M. 2005. Learning multiple tasks with kernel methods. J. Mach. Learn. Res. 6, 615--637.
  21. Evgeniou, T. and Pontil, M. 2004. Regularized multi-task learning. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). 109--117.
  22. Friedman, J., Hastie, T., and Tibshirani, R. 2010. A note on the group lasso and a sparse group lasso. http://arxiv.org/pdf/1001.0736.pdf.
  23. Han, J. and Kamber, M. 2006. Data Mining: Concepts and Techniques 2nd Ed. Morgan Kaufmann, San Francisco.
  24. Han, Y., Wu, F., Jia, J., Zhuang, Y., and Yu, B. 2010. Multi-task sparse discriminant analysis (MtSDA) with overlapping categories. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI'10).
  25. Hazan, E., Agarwal, A., and Kale, S. 2007. Logarithmic regret algorithms for online convex optimization. Mach. Learn. 69, 2--3, 169--192.
  26. Hu, C., Kwok, J., and Pan, W. 2009. Accelerated gradient methods for stochastic optimization and online learning. Adv. Neural Inf. Process. Syst. 22, 781--789.
  27. Jebara, T. 2004. Multi-task feature and kernel selection for SVMs. In Proceedings of the 21st International Conference on Machine Learning (ICML'04).
  28. Jebara, T. 2011. Multitask sparsity via maximum entropy discrimination. J. Mach. Learn. Res. 12, 75--110.
  29. Langford, J., Li, L., and Zhang, T. 2009. Sparse online learning via truncated gradient. J. Mach. Learn. Res. 10, 777--801.
  30. Lenk, P. J., Desarbo, W. S., Green, P. E., and Young, M. R. 1996. Hierarchical Bayes conjoint analysis: Recovery of partworth heterogeneity from reduced experimental designs. Market. Sci. 15, 2, 173--191.
  31. Ling, G., Yang, H., King, I., and Lyu, M. R. 2012. Online learning for collaborative filtering. In Proceedings of the IEEE World Congress on Computational Intelligence (WCCI'12). 1--8.
  32. Liu, H., Palatucci, M., and Zhang, J. 2009a. Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML'09). 649--656.
  33. Liu, J., Chen, J., and Ye, J. 2009b. Large-scale sparse logistic regression. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09). 547--556.
  34. Liu, J., Ji, S., and Ye, J. 2009c. Multi-task feature learning via efficient ℓ2,1-norm minimization. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI'09).
  35. Liu, J., Ji, S., and Ye, J. 2009d. SLEP: Sparse learning with efficient projections. http://www.public.asu.edu/ye02/Software/SLEP.
  36. Nesterov, Y. 2009. Primal-dual subgradient methods for convex problems. Math. Program. 120, 1, 221--259.
  37. Obozinski, G., Taskar, B., and Jordan, M. I. 2009. Joint covariate selection and joint subspace selection for multiple classification problems. Statist. Comput. 20, 2, 231--252.
  38. Pong, T. K., Tseng, P., Ji, S., and Ye, J. 2010. Trace norm regularization: Reformulations, algorithms, and multi-task learning. SIAM J. Optim. 20, 6, 3465--3489.
  39. Quattoni, A., Carreras, X., Collins, M., and Darrell, T. 2009. An efficient projection for ℓ1,∞ regularization. In Proceedings of the 26th International Conference on Machine Learning (ICML'09). 857--864.
  40. Shalev-Shwartz, S. and Singer, Y. 2007. A primal-dual perspective of online learning algorithms. Mach. Learn. 69, 2--3, 115--142.
  41. Shalev-Shwartz, S., Singer, Y., and Srebro, N. 2007. Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the 24th International Conference on Machine Learning (ICML'07). 807--814.
  42. Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. B 58, 1, 267--288.
  43. Vapnik, V. 1999. The Nature of Statistical Learning Theory 2nd Ed. Springer, New York.
  44. Xiao, L. 2010. Dual averaging method for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11, 2543--2596.
  45. Xu, Z., Jin, R., Yang, H., King, I., and Lyu, M. R. 2010. Simple and efficient multiple kernel learning by group lasso. In Proceedings of the 27th International Conference on Machine Learning (ICML'10). 1175--1182.
  46. Yang, H., King, I., and Lyu, M. R. 2010a. Multi-task learning for one-class classification. In Proceedings of the International Joint Conference on Neural Networks (IJCNN'10). 1--8.
  47. Yang, H., King, I., and Lyu, M. R. 2010b. Online learning for multi-task feature selection. In Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM'10). 1693--1696.
  48. Yang, H., King, I., and Lyu, M. R. 2011a. Sparse Learning under Regularization Framework 1st Ed. Lambert Academic Publishing.
  49. Yang, H., Xu, Z., King, I., and Lyu, M. R. 2010c. Online learning for group lasso. In Proceedings of the 27th International Conference on Machine Learning (ICML'10). 1191--1198.
  50. Yang, H., Xu, Z., Ye, J., King, I., and Lyu, M. R. 2011b. Efficient sparse generalized multiple kernel learning. IEEE Trans. Neural Netw. 22, 3, 433--446.
  51. Zhang, Y. 2010. Multi-task active learning with output constraints. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI'10).
  52. Zhang, Y., Yeung, D.-Y., and Xu, Q. 2010. Probabilistic multi-task feature selection. Adv. Neural Inf. Process. Syst. 23, 2559--2567.
  53. Zhao, P., Hoi, S. C. H., and Jin, R. 2011a. Double updating online learning. J. Mach. Learn. Res. 12, 1587--1615.
  54. Zhao, P., Hoi, S. C. H., Jin, R., and Yang, T. 2011b. Online AUC maximization. In Proceedings of the 28th International Conference on Machine Learning (ICML'11). 233--240.
  55. Zhou, Y., Jin, R., and Hoi, S. C. 2010. Exclusive lasso for multi-task feature selection. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS'10).
  56. Zou, H. and Hastie, T. 2005. Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. B 67, 301--320.

      Reviews

      Anca Doloc-Mihu

      Algorithms for multitask feature selection (MTFS) aim to efficiently learn explanatory features from multiple related tasks simultaneously. These algorithms extract shared information across the tasks and determine the relative feature weights, which are systematically adjusted as more information comes in. While existing MTFS algorithms use batch-mode training (learning the weights all at once), this paper proposes a new online learning method, dual-averaging MTFS (DA-MTFS), that learns the weights sequentially. The paper asserts that this algorithm outperforms standard MTFS methods in both time complexity and memory cost. The DA-MTFS algorithm performs three steps at each iteration: compute the subgradient of the loss function with respect to the weights, update the average of the subgradients, and update the weights. The weights are updated sequentially via closed-form solutions based on the average of previous subgradients. The time complexity and memory cost at each iteration are at most O(d × Q), where d is the number of feature dimensions and Q is the number of tasks. The authors prove the convergence rate of their algorithm via theoretical analysis and then apply it to real-world data about student ratings of personal computers [1]. The experiments show that DA-MTFS achieves performance close to that of the corresponding batch-trained algorithms, but at a lower time and memory cost. For those interested in online learning algorithms, this paper is worth reading.

      Online Computing Reviews Service
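      To make the closed-form step the review describes concrete, a typical dual-averaging-style update with an ℓ2,1 penalty has the per-row form below; λ, γ, and the notation are illustrative assumptions, not necessarily the paper's exact formulation.

```latex
w^{j}_{t+1} \;=\; -\frac{\sqrt{t}}{\gamma}
\left[ 1 - \frac{\lambda}{\lVert \bar{g}^{j}_{t} \rVert_{2}} \right]_{+}
\bar{g}^{j}_{t}, \qquad j = 1, \ldots, d,
```

      Here \bar{g}^{j}_{t} is the j-th row of the averaged subgradient matrix and [x]_+ = max(x, 0): a row whose averaged subgradient has norm at most λ is zeroed for all Q tasks at once, and evaluating all d rows touches each entry of the d × Q matrix once, consistent with the O(d × Q) per-iteration cost noted above.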


      • Published in

        ACM Transactions on Knowledge Discovery from Data, Volume 7, Issue 2
        July 2013, 107 pages
        ISSN: 1556-4681
        EISSN: 1556-472X
        DOI: 10.1145/2499907

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 2 August 2013
        • Accepted: 1 November 2012
        • Revised: 1 July 2012
        • Received: 1 January 2011
        Published in TKDD Volume 7, Issue 2


        Qualifiers

        • research-article
        • Research
        • Refereed
