Abstract
Predicting labels of structured data such as sequences or images is an important problem in statistical machine learning and data mining. The conditional random field (CRF) is perhaps the most successful approach to structured label prediction via conditional probabilistic modeling. Such models traditionally assume that each label is a random variable over a nominal category set (e.g., class categories) in which all categories are symmetric and unrelated to one another. In this paper we consider a different situation of ordinal-valued labels, where each label category bears a particular meaning of preference or order. This setup fits many interesting problems and datasets in which one wishes to predict labels representing certain degrees of intensity or relevance. We propose an intuitive and principled CRF-like model that can effectively handle ordinal-scale labels while accounting for the underlying correlation structure. Unlike standard log-linear CRFs, learning the proposed model incurs non-convex optimization; nevertheless, the model can be learned accurately using efficient gradient search. We demonstrate the improved prediction performance of the proposed model on several intriguing sequence and image label prediction tasks.
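As a concrete illustration of the ordinal-label setup described above, a cumulative-link ordinal model maps a real-valued score to category probabilities via ordered cut points, so that adjacent categories respect the ordering. The sketch below uses a logistic link; the names (`ordinal_probs`, `thresholds`) are assumptions for illustration, not the paper's exact node model.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def ordinal_probs(score, thresholds):
    """P(y = k) for ordinal categories k = 1..R under a cumulative-logit
    model: P(y <= k) = sigmoid(b_k - score), with ordered cut points
    b_1 < ... < b_{R-1} and implicit b_0 = -inf, b_R = +inf."""
    cdf = [0.0] + [sigmoid(b - score) for b in thresholds] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(thresholds) + 1)]

# Four ordinal categories induced by three cut points; probabilities sum to 1.
probs = ordinal_probs(score=0.3, thresholds=[-1.0, 0.0, 1.0])
```

A nominal (softmax) model would treat the four categories as exchangeable; here, shifting `score` moves probability mass monotonically along the ordered categories.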
Notes
This article extends our earlier conference paper (Kim and Pavlovic 2010). Whereas that work was limited to sequence data, focusing on facial emotion intensity prediction for video sequences, here we provide an extension to lattice-structured image data, along with a detailed exposition and additional evaluations.
This is mainly due to density integrability issues.
We use the notation \(\mathbf{x}\) interchangeably for both a structured observation \(\mathbf{x}=\{\mathbf{x}_r\}\) and a vector, which is clearly distinguished by context.
This can be seen as a general form of the popular one-vs-all or one-vs-one treatment for the multi-class problem.
For simplicity, we often drop the dependency on \({\varvec{\theta }}\) in notations.
A clique of a graph is a fully connected subset of nodes; here we consider maximal cliques.
The potential function is a product between the model parameters and the feature vectors, which expresses the goodness of the state/label configuration with respect to the current model. In particular, the node potentials measure this for individual sites (nodes) while the edge potentials aim to capture the relation between adjoining sites (e.g., smoothness in label variation).
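As a minimal sketch of how such potentials combine, the unnormalized log-score of a label configuration on a chain is the sum of inner-product node potentials and transition edge potentials. All names below (`config_score`, `w_node`, `w_edge`) are illustrative assumptions, not the paper's notation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def config_score(x, y, w_node, w_edge):
    """Unnormalized log-score of label configuration y for observations x
    on a chain-structured model.

    x: list of feature vectors; y: list of labels in {0, ..., K-1};
    w_node[k]: weight vector for label k; w_edge[j][k]: transition weight."""
    score = sum(dot(w_node[y[r]], x[r]) for r in range(len(x)))      # node potentials
    score += sum(w_edge[y[r]][y[r + 1]] for r in range(len(y) - 1))  # edge potentials
    return score
```

Exponentiating and normalizing such scores over all label configurations yields the usual log-linear CRF distribution.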
It is also possible to use \(\exp (\delta _k)\) in place of \(\delta _k^2\), which can be beneficial for avoiding additional modality to the objective function.
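To illustrate the two parameterizations mentioned above, consecutive cut points can be built from unconstrained parameters by adding either \(\delta_k^2\) or \(\exp(\delta_k)\) to the previous cut point. The function name below is a hypothetical illustration.

```python
import math

def build_thresholds(b1, deltas, use_exp=True):
    """Build ordered cut points b_1 <= b_2 <= ... from unconstrained deltas.

    exp(delta) keeps each increment strictly positive for every delta,
    whereas delta**2 vanishes at delta = 0, which can introduce extra
    stationary points (modes) into the learning objective."""
    b = [b1]
    for d in deltas:
        b.append(b[-1] + (math.exp(d) if use_exp else d * d))
    return b
```

With `use_exp=True` the cut points are strictly increasing for any real deltas, so the ordinal constraint never binds during gradient search.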
We also tested the static approach (Chu and Ghahramani 2005), the Gaussian process ordinal regressor (GPOR). However, its test performance on this dataset was far worse than that of the SVOR.
We performed the paired \(t\) test for CRF and CORF. The \(p\)-value was 0.0020 for both 0/1 loss and absolute loss.
Facial emotion intensity prediction is particularly important for better understanding of facial emotions. A typical problem is facial action unit (AU) analysis in computer vision and cognitive science, where one aims to identify which actions of individual muscles, or activations of groups of muscles, cause a specific facial emotion. Intensity labeling by human experts is accurate but very costly; hence, automatic emotion intensity prediction is highly desirable.
Due to our undirected graphical model, one needs to include edge potentials for both directions, i.e., for \(e=(r,s)\), one for \(r \rightarrow s\) and the other for \(s \rightarrow r\).
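A small sketch of this bookkeeping: for each undirected edge \(e=(r,s)\), the two directed terms are accumulated explicitly. The names below are illustrative, not the paper's notation.

```python
def undirected_edge_score(edges, y, w_edge):
    """Sum edge potentials over both directions of every undirected edge:
    for e = (r, s), add one term for r -> s and another for s -> r."""
    total = 0.0
    for r, s in edges:
        total += w_edge[y[r]][y[s]] + w_edge[y[s]][y[r]]
    return total
```

With a symmetric `w_edge` this simply doubles each pairwise term; with an asymmetric parameterization the two directions contribute distinct parameters.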
References
Buffoni D, Calauzenes C, Gallinari P, Usunier N (2011) Learning scoring functions with order-preserving losses and standardized supervision. In: Getoor L, Scheffer T (eds) Proceedings of the 28th international conference on machine learning (ICML-11), ICML ’11, ACM, New York, pp 825–832
Chu W, Ghahramani Z (2005) Gaussian processes for ordinal regression. J Mach Learn Res 6:1019–1041
Chu W, Keerthi SS (2005) New approaches to support vector ordinal regression. In: De Raedt L, Wrobel S (eds) Proceedings of the 22nd international machine learning conference, ACM Press, New York
Crammer K, Singer Y (2001) On the algorithmic implementation of multiclass kernel-based vector machines. J Mach Learn Res 2:265–292
Gunawardana A, Mahajan M, Acero A, Platt JC (2005) Hidden conditional random fields for phone classification. In: International conference on speech communication and technology, Lisbon, pp 1117–1120
He X, Zemel RS, Carreira-Perpiñán MÁ (2004) Multiscale conditional random fields for image labeling. In: IEEE conference on computer vision and pattern recognition, pp 695–702
Herbrich R, Graepel T, Obermayer K (2000) Large margin rank boundaries for ordinal regression. In: Smola AJ, Bartlett PL (eds) Advances in large margin classifiers. MIT Press, Cambridge
Hu Y, Li M, Yu N (2008) Multiple-instance ranking: learning to rank images for image retrieval. In: Computer vision and pattern recognition, Anchorage, USA
Ionescu C, Bo L, Sminchisescu C (2009) Structural SVM for visual localization and continuous state estimation. In: International conference on computer vision, pp 1157–1164
Jing Y, Baluja S (2008) Pagerank for product image search. In: Proceeding of the 17th international conference on World Wide Web, Beijing, China
Jordan MI (2004) Graphical models. Stat Sci 19:140–155
Kim M, Pavlovic V (2009) Discriminative learning for dynamic state prediction. IEEE Trans Pattern Anal Mach Intell 31(10):1847–1861
Kim M, Pavlovic V (2010) Structured output ordinal regression for dynamic facial emotion intensity prediction. In: Daniilidis K, Maragos P, Paragios N (eds) European conference on computer vision, Crete, Greece, pp 649–662
Kumar S, Hebert M (2006) Discriminative random fields. Int J Comput Vis 68:179–201
Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th international conference on machine learning, Morgan Kaufmann, Williamstown, pp 282–289
Lien J, Kanade T, Cohn J, Li C (2000) Detection, tracking, and classification of action units in facial expression. J Robot Auton Syst 31(3):131–146
Liu Y, Liu Y, Chan KCC (2011) Ordinal regression via manifold learning. In: Twenty-fifth AAAI conference on artificial intelligence, pp 398–403
Locarnini RA, Mishonov AV, Antonov JI, Boyer TP, Garcia HE (2006) World ocean atlas 2005. In: Levitus S (ed) NOAA Atlas NESDIS. US Government Printing Office, Washington, DC, pp 61–64
Mao Y, Lebanon G (2009) Generalized isotonic conditional random fields. Mach Learn 77(2–3):225–248
Pavlovic V, Rehg JM, Maccormick J (2000) Learning switching linear models of human motion. In: Advances in neural information processing systems, pp 981–987
Qin T, Liu TY, Zhang XD, Wang DS, Li H (2008) Global ranking using continuous conditional random fields. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems. Morgan Kaufmann, San Francisco
Shan C, Gong S, McOwan PW (2005) Conditional mutual information based boosting for facial expression recognition. In: British machine vision conference
Shashua A, Levin A (2003) Ranking with large margin principle: two approaches. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems
Tian TP, Li R, Sclaroff S (2005) Articulated pose estimation in a learned smooth space of feasible solutions. In: IEEE workshop in computer vision and pattern recognition
Tian Y (2004) Evaluation of face resolution for expression analysis. In: IEEE computer vision and pattern recognition workshop on face processing in video
Viola P, Jones M (2001) Robust real-time object detection. Int J Comput Vis 57(2):137–154
Vishwanathan S, Schraudolph N, Schmidt M, Murphy K (2006) Accelerated training of conditional random fields with stochastic meta-descent. In: Cohen W, Moore A (eds) Proceedings of the 23rd international machine learning conference, Omni Press, Edinburgh
Wang S, Quattoni A, Morency LP, Demirdjian D, Darrell T (2006) Hidden conditional random fields for gesture recognition. In: Computer vision and pattern recognition
Weiss Y (2001) Comparing the mean field method and belief propagation for approximate inference in MRFs. In: Saad D, Opper M (eds) Advanced mean field methods. MIT Press, Cambridge
Weston J, Wang C, Weiss R, Berenzweig A (2012) Latent collaborative retrieval. In: Langford J, Pineau J (eds) Proceedings of the 29th international conference on machine learning (ICML-12), Omnipress, ICML ’12, pp 9–16
Yang P, Liu Q, Metaxas DN (2009) Rankboost with l1 regularization for facial expression recognition and intensity estimation. In: International conference on computer vision, pp 1018–1025
Yedidia J, Freeman W, Weiss Y (2003) Understanding belief propagation and its generalizations. In: Exploring artificial intelligence in the new millennium, chap 8. Science and Technology Books, Cambridge, pp 239–269
Communicated by Johannes Fürnkranz.
Appendix
The gradients \(\frac{\partial z_k(r,c)}{\partial \mu _j}\), for \(k=0,1\) and \(j=1,\dots ,R-2\), in (16) are summarized as follows:
Kim, M. Conditional ordinal random fields for structured ordinal-valued label prediction. Data Min Knowl Disc 28, 378–401 (2014). https://doi.org/10.1007/s10618-013-0305-2