Research article · DOI: 10.1145/1835804.1835845 · KDD Conference Proceedings

Grafting-light: fast, incremental feature selection and structure learning of Markov random fields

Published: 25 July 2010

ABSTRACT

Feature selection is important for achieving good generalization in high-dimensional learning, and structure learning of Markov random fields (MRFs) can automatically discover the inherent structures underlying complex data. Both problems can be cast as solving an l1-norm regularized parameter estimation problem. The existing Grafting method avoids inference on dense graphs during structure learning by incrementally selecting new features. However, Grafting performs a greedy step that fully optimizes over the free parameters whenever new features are included. This greedy strategy is inefficient when parameter learning is itself non-trivial, as in MRFs, where parameter learning depends on an expensive subroutine to calculate gradients; the cost of computing these gradients is typically exponential in the size of the maximal cliques.
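For concreteness, the l1-norm regularized estimation problem referred to above can be written in the standard log-linear form (a sketch of the usual formulation; the paper's own notation may differ):

    \min_{\theta}\; -\sum_{n=1}^{N}\log p(\mathbf{x}_n;\theta) + \lambda\,\|\theta\|_1,
    \qquad
    p(\mathbf{x};\theta) = \frac{1}{Z(\theta)}\exp\!\Big(\sum_{c}\theta_c^{\top} f_c(\mathbf{x}_c)\Big),

    \nabla_{\theta_c}\big[-\log p(\mathbf{x}_n;\theta)\big]
      = \mathbb{E}_{p(\mathbf{x};\theta)}[f_c(\mathbf{x}_c)] - f_c(\mathbf{x}_{n,c}).

Evaluating the model expectation E_{p(x;θ)}[f_c] requires probabilistic inference in the MRF; this is the expensive gradient subroutine mentioned above, and its cost grows exponentially with the size of the largest clique.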

In this paper, we present a fast algorithm called Grafting-Light to solve the l1-norm regularized maximum likelihood estimation of MRFs for efficient feature selection and structure learning. Grafting-Light iteratively performs a single step of orthant-wise gradient descent over the free parameters and then selects new features. This lazy strategy is guaranteed to converge to the global optimum and can effectively select significant features. On both synthetic and real data sets, we show that Grafting-Light is much more efficient than Grafting for both feature selection and structure learning. Compared with the optimal batch method that directly optimizes over all features, Grafting-Light performs comparably for feature selection but is much more efficient and accurate for structure learning of MRFs.
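The difference between the greedy and lazy strategies can be sketched as follows (illustrative Python pseudocode based only on the description above, not the authors' exact algorithm; compute_gradient, select_features, orthant_wise_step, and converged are hypothetical helpers standing in for the paper's subroutines):

    # Schematic contrast of Grafting vs. Grafting-Light, following the abstract.
    # compute_gradient, select_features, orthant_wise_step, and converged are
    # hypothetical stand-ins for the expensive MRF subroutines.

    def grafting(data, lam):
        active, theta = set(), {}                       # selected features and their weights
        while True:
            grad = compute_gradient(theta, data)        # requires MRF inference
            new = select_features(grad, active, lam)    # features whose gradient violates the l1 condition
            if not new:
                return theta
            active |= new
            # Greedy step: fully re-optimize all free parameters before the
            # next round of feature selection (many gradient evaluations).
            while not converged(theta, data, lam):
                theta = orthant_wise_step(theta, compute_gradient(theta, data), lam)

    def grafting_light(data, lam):
        active, theta = set(), {}
        while True:
            grad = compute_gradient(theta, data)        # one inference call per iteration
            new = select_features(grad, active, lam)
            active |= new
            # Lazy step: a single orthant-wise gradient-descent update over the
            # free parameters, then return to feature selection.
            theta = orthant_wise_step(theta, grad, lam)
            if not new and converged(theta, data, lam):
                return theta

Both variants call the gradient subroutine once per outer iteration; the difference is that Grafting runs its inner loop to convergence after every feature addition, whereas Grafting-Light takes only one step, so far fewer inference calls are needed overall.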


Supplemental Material

kdd2010_zhu_glfi_01.mov (MOV, 118.2 MB)


Published in

KDD '10: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
July 2010, 1240 pages
ISBN: 9781450300551
DOI: 10.1145/1835804
Copyright © 2010 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
