Abstract
Learning explanatory features across multiple related tasks, or Multi-Task Feature Selection (MTFS), is an important problem in data mining, machine learning, and bioinformatics applications. Previous MTFS methods solve this problem with batch-mode training, which makes them inefficient when data arrive sequentially or when the training set is too large to fit into memory at once. To tackle these problems, we propose a novel online learning framework for the MTFS problem. A key advantage of the online algorithm is its efficiency in both time complexity and memory cost: the weights of the MTFS models at each iteration can be updated in closed form from the average of the previous subgradients. This yields worst-case time complexity and memory cost per iteration that are both O(d × Q), where d is the number of feature dimensions and Q is the number of tasks. Moreover, we provide a theoretical analysis of the average regret of the online learning algorithms, which also guarantees their convergence rate. Finally, we conduct detailed experiments that demonstrate the characteristics and merits of the online learning algorithms on several MTFS problems.
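To make the closed-form update concrete, the sketch below implements one regularized dual-averaging step of the kind the abstract describes, assuming an ℓ1/ℓ2 (row-wise group) regularizer over the d × Q weight matrix and a strongly convex prox term scaled by γ/√t. The function name, the squared-loss driver, and the parameter values are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def rda_mtfs_update(g_bar, t, lam, gamma):
    """One dual-averaging step for multi-task feature selection (a sketch).

    g_bar : (d, Q) average of all subgradients seen so far
    t     : current iteration number (t >= 1)
    lam   : weight of the l1/l2 row-group regularizer
    gamma : scaling of the strongly convex prox term

    Returns the (d, Q) matrix minimizing
        <g_bar, W> + lam * sum_j ||W_j||_2 + (gamma / (2*sqrt(t))) * ||W||_F^2,
    whose row-wise closed form is a soft-thresholding of the row norms:
        W_j = -(sqrt(t)/gamma) * max(0, 1 - lam/||g_bar_j||_2) * g_bar_j.
    """
    row_norms = np.linalg.norm(g_bar, axis=1, keepdims=True)          # (d, 1)
    shrink = np.maximum(0.0, 1.0 - lam / np.maximum(row_norms, 1e-12))
    return -(np.sqrt(t) / gamma) * shrink * g_bar

# Hypothetical online loop: squared loss, one (x, y, task) example per step.
d, Q = 100, 5
W = np.zeros((d, Q))
g_sum = np.zeros((d, Q))
rng = np.random.default_rng(0)
for t in range(1, 1001):
    q = int(rng.integers(Q))                 # task index of this example
    x = rng.standard_normal(d)
    y = rng.standard_normal()
    g = np.zeros((d, Q))
    g[:, q] = (W[:, q] @ x - y) * x          # subgradient of the squared loss
    g_sum += g                               # running sum of subgradients
    W = rda_mtfs_update(g_sum / t, t, lam=0.1, gamma=1.0)
```

Each step touches only the d × Q running subgradient average and the d × Q weight matrix, which is where the stated O(d × Q) per-iteration time and memory bounds come from; rows whose average subgradient norm stays below λ are zeroed out jointly across all Q tasks, performing the feature selection.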