Conditional ordinal random fields for structured ordinal-valued label prediction

Abstract

Predicting the labels of structured data such as sequences or images is an important problem in statistical machine learning and data mining. The conditional random field (CRF) is perhaps one of the most successful approaches to structured label prediction via conditional probabilistic modeling. Such models traditionally assume that each label is a random variable over a nominal category set (e.g., class categories) in which all categories are symmetric and unrelated to one another. In this paper we consider the different situation of ordinal-valued labels, where each label category carries a particular meaning of preference or order. This setup fits many interesting problems and datasets in which one wants to predict labels that represent degrees of intensity or relevance. We propose an intuitive and principled CRF-like model that can effectively handle ordinal-scale labels within an underlying correlation structure. Unlike standard log-linear CRFs, learning the proposed model entails non-convex optimization; nevertheless, the model can be learned accurately using efficient gradient search. We demonstrate the improved prediction performance of the proposed model on several intriguing sequence and image label prediction tasks.

Notes

  1. This article extends our earlier conference paper (Kim and Pavlovic 2010). Whereas that work was limited to sequence data, focusing on facial emotion intensity prediction for video sequences, here we extend the model to lattice-structured image data and provide a more detailed exposition and additional evaluations.

  2. This is mainly due to density integrability issues.

  3. We use the notation \(\mathbf{x}\) interchangeably for a structured observation \(\mathbf{x}=\{\mathbf{x}_r\}\) and for a vector; the intended meaning is clear from context.

  4. This can be seen as a general form of the popular one-vs-all or one-vs-one treatment for the multi-class problem.

  5. For simplicity, we often drop the dependency on \({\varvec{\theta }}\) in notations.

  6. A clique of a graph is a set of fully connected nodes; a maximal clique is one that cannot be enlarged while remaining fully connected.

  7. The potential function is the inner product of the model parameters and the feature vector, and it expresses the goodness of a state/label configuration with respect to the current model. Node potentials measure this for individual sites (nodes), while edge potentials capture the relation between adjoining sites (e.g., smoothness in label variation); see the first sketch following these notes.

  8. It is also possible to use \(\exp (\delta _k)\) in place of \(\delta _k^2\), which can be beneficial for avoiding additional modes in the objective function; see the second sketch following these notes.

  9. We also tested a static approach, the Gaussian process ordinal regressor (GPOR) of Chu and Ghahramani (2005). However, its test performance on this dataset was far worse than that of the SVOR.

  10. We performed the paired \(t\) test for CRF versus CORF; the \(p\)-value was 0.0020 for both the 0/1 loss and the absolute loss. See the third sketch following these notes.

  11. Facial emotion intensity prediction is particularly important for better understanding of facial emotions. A typical problem is the facial action unit (AU) analysis in computer vision and cognitive science where one aims to identify/recognize which actions of individual muscles or activations of groups of muscles cause a specific facial emotion. The intensity labeling by human experts is accurate but very costly, and hence, automatic emotion intensity prediction is highly advantageous.

  12. Because our graphical model is undirected, one needs to include edge potentials for both directions: for an edge \(e=(r,s)\), one for \(r \rightarrow s\) and another for \(s \rightarrow r\).
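
The following minimal sketches illustrate Notes 7, 8, and 10 above. They are not the paper's code; all variable names, dimensions, and numeric values are illustrative assumptions.

For Note 7, a log-linear node/edge potential in the spirit of the text (the feature dimension d, label set size R, and parameter names are hypothetical):

```python
import numpy as np

# Hypothetical sizes: d-dimensional site features, R ordinal label categories.
d, R = 8, 5
rng = np.random.default_rng(0)
W_node = rng.normal(size=(R, d))  # node-potential parameters: one weight vector per label
W_edge = rng.normal(size=(R, R))  # edge-potential parameters: one weight per label pair

def node_potential(x_r, c):
    """Goodness of label c at site r: inner product of parameters and features."""
    return W_node[c] @ x_r

def edge_potential(c_r, c_s):
    """Compatibility of labels on adjoining sites, e.g., rewarding smooth label variation."""
    return W_edge[c_r, c_s]

# Per Note 12, an undirected edge e=(r,s) contributes potentials in both
# directions: edge_potential(y_r, y_s) and edge_potential(y_s, y_r).
```

For Note 8, the two ways of keeping consecutive cut-points ordered. The construction \(b_c = b_1 + \sum_{j=1}^{c-1} \delta_j^2\) is inferred from gradients (24)–(25) in the Appendix, so treat it as an assumption:

```python
import numpy as np

def cutpoints(b1, delta, squared=True):
    """Ordered cut-points b_1 < b_2 < ... from unconstrained increments delta.

    squared=True uses delta_j**2 increments; squared=False uses exp(delta_j),
    which Note 8 suggests can help avoid adding extra modes to the objective.
    """
    inc = delta ** 2 if squared else np.exp(delta)
    return np.concatenate(([b1], b1 + np.cumsum(inc)))

print(cutpoints(-1.0, np.array([0.5, 1.2, 0.3])))  # increasing sequence of cut-points
```

For Note 10, the paired \(t\) test can be reproduced with standard tooling; the loss arrays below are placeholders, not the paper's measurements:

```python
import numpy as np
from scipy import stats

loss_crf = np.array([0.31, 0.28, 0.35, 0.30, 0.33])    # hypothetical per-trial CRF losses
loss_corf = np.array([0.24, 0.22, 0.27, 0.25, 0.26])   # hypothetical per-trial CORF losses

t_stat, p_value = stats.ttest_rel(loss_crf, loss_corf)  # paired t test
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")           # small p => significant difference
```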

References

  • Buffoni D, Calauzenes C, Gallinari P, Usunier N (2011) Learning scoring functions with order-preserving losses and standardized supervision. In: Getoor L, Scheffer T (eds) Proceedings of the 28th international conference on machine learning (ICML-11), ICML ’11, ACM, New York, pp 825–832

  • Chu W, Ghahramani Z (2005) Gaussian processes for ordinal regression. J Mach Learn Res 6:1019–1041

  • Chu W, Keerthi SS (2005) New approaches to support vector ordinal regression. In: De Raedt L, Wrobel S (eds) Proceedings of the 22nd international machine learning conference, ACM Press, New York

  • Crammer K, Singer Y (2001) On the algorithmic implementation of multiclass kernel-based vector machines. J Mach Learn Res 2:265–292

  • Gunawardana A, Mahajan M, Acero A, Platt JC (2005) Hidden conditional random fields for phone classification. In: International conference on speech communication and technology, Lisbon, pp 1117–1120

  • He X, Zemel RS, Carreira-Perpiñán MÁ (2004) Multiscale conditional random fields for image labeling. In: IEEE conference on computer vision and pattern recognition, pp 695–702

  • Herbrich R, Graepel T, Obermayer K (2000) Large margin rank boundaries for ordinal regression. In: Smola AJ, Bartlett PL (eds) Advances in large margin classifiers. MIT Press, Cambridge

  • Hu Y, Li M, Yu N (2008) Multiple-instance ranking: learning to rank images for image retrieval. In: Computer vision and pattern recognition, Anchorage, USA

  • Ionescu C, Bo L, Sminchisescu C (2009) Structural SVM for visual localization and continuous state estimation. In: International conference on computer vision, pp 1157–1164

  • Jing Y, Baluja S (2008) Pagerank for product image search. In: Proceedings of the 17th international conference on World Wide Web, Beijing, China

  • Jordan MI (2004) Graphical models. Stat Sci 19:140–155

  • Kim M, Pavlovic V (2009) Discriminative learning for dynamic state prediction. IEEE Trans Pattern Anal Mach Intell 31(10):1847–1861

  • Kim M, Pavlovic V (2010) Structured output ordinal regression for dynamic facial emotion intensity prediction. In: Daniilidis K, Maragos P, Paragios N (eds) European conference on computer vision, Crete, Greece, pp 649–662

  • Kumar S, Hebert M (2006) Discriminative random fields. Int J Comput Vis 68:179–201

  • Lafferty J, McCallum A, Pereira FCN (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: International conference on machine learning, Morgan Kaufmann, Williamstown, pp 282–289

  • Lien J, Kanade T, Cohn J, Li C (2000) Detection, tracking, and classification of action units in facial expression. J Robot Auton Syst 31(3):131–146

  • Liu Y, Liu Y, Chan KCC (2011) Ordinal regression via manifold learning. In: Twenty-fifth AAAI conference on artificial intelligence, pp 398–403

  • Locarnini RA, Mishonov AV, Antonov JI, Boyer TP, Garcia HE (2006) World ocean atlas 2005. In: Levitus S (ed) NOAA Atlas NESDIS. US Government Printing Office, Washington, DC, pp 61–64

  • Mao Y, Lebanon G (2009) Generalized isotonic conditional random fields. Mach Learn 77(2–3):225–248

  • Pavlovic V, Rehg JM, Maccormick J (2000) Learning switching linear models of human motion. In: Advances in neural information processing systems, pp 981–987

  • Qin T, Liu TY, Zhang XD, Wang DS, Li H (2008) Global ranking using continuous conditional random fields. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems. Morgan Kaufmann, San Francisco

  • Shan C, Gong S, McOwan PW (2005) Conditional mutual information based boosting for facial expression recognition. In: British machine vision conference

  • Shashua A, Levin A (2003) Ranking with large margin principle: two approaches. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems

  • Tian TP, Li R, Sclaroff S (2005) Articulated pose estimation in a learned smooth space of feasible solutions. In: IEEE workshop in computer vision and pattern recognition

  • Tian Y (2004) Evaluation of face resolution for expression analysis. In: IEEE computer vision and pattern recognition workshop on face processing in video

  • Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154

  • Vishwanathan S, Schraudolph N, Schmidt M, Murphy K (2006) Accelerated training of conditional random fields with stochastic meta-descent. In: Cohen W, Moore A (eds) Proceedings of the 23rd international machine learning conference, Omni Press, Edinburgh

  • Wang S, Quattoni A, Morency LP, Demirdjian D, Darrell T (2006) Hidden conditional random fields for gesture recognition. In: Computer vision and pattern recognition

  • Weiss Y (2001) Comparing the mean field method and belief propagation for approximate inference in MRFs. In: Saad D, Opper M (eds) Advanced mean field methods. MIT Press, Cambridge

  • Weston J, Wang C, Weiss R, Berenzweig A (2012) Latent collaborative retrieval. In: Langford J, Pineau J (eds) Proceedings of the 29th international conference on machine learning (ICML-12), Omnipress, ICML ’12, pp 9–16

  • Yang P, Liu Q, Metaxas DN (2009) Rankboost with l1 regularization for facial expression recognition and intensity estimation. In: International conference on computer vision, pp 1018–1025

  • Yedidia J, Freeman W, Weiss Y (2003) Understanding belief propagation and its generalizations. In: Exploring artificial intelligence in the new millennium, chap 8. Morgan Kaufmann, San Francisco, pp 239–269

Author information

Corresponding author

Correspondence to Minyoung Kim.

Additional information

Communicated by Johannes Fürnkranz.

Appendix

The gradients of \(z_k(r,c)\) in (16) with respect to the model parameters \(\mathbf{a}\), \(\sigma_0\), \(b_1\), and \(\delta_j\), for \(k=0,1\) and \(j=1,\dots,R-2\), are summarized as follows:

$$\begin{aligned}
\frac{\partial z_k(r,c)}{\partial \mathbf{a}} &= -\frac{1}{\sigma_0^2}\,{\varvec{\phi }}(\mathbf{x}_r), &\quad (20)\\
\frac{\partial z_k(r,c)}{\partial \sigma_0} &= -\frac{2\big(b_{c-k} - \mathbf{a}^{\top}{\varvec{\phi }}(\mathbf{x}_r)\big)}{\sigma_0^3}, &\quad (21)\\
\frac{\partial z_0(r,c)}{\partial b_1} &= \begin{cases} 0 & \text{if } c=R \\ \frac{1}{\sigma_0^2} & \text{otherwise,} \end{cases} &\quad (22)\\
\frac{\partial z_1(r,c)}{\partial b_1} &= \begin{cases} 0 & \text{if } c=1 \\ \frac{1}{\sigma_0^2} & \text{otherwise,} \end{cases} &\quad (23)\\
\frac{\partial z_0(r,c)}{\partial \delta_j} &= \begin{cases} 0 & \text{if } c \in \{1,\dots,j\} \cup \{R\} \\ \frac{2\delta_j}{\sigma_0^2} & \text{otherwise,} \end{cases} &\quad (24)\\
\frac{\partial z_1(r,c)}{\partial \delta_j} &= \begin{cases} 0 & \text{if } c \in \{1,\dots,j+1\} \\ \frac{2\delta_j}{\sigma_0^2} & \text{otherwise.} \end{cases} &\quad (25)
\end{aligned}$$
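
A hedged NumPy transcription of (20)–(25) follows; this is a sketch, not the paper's code. It assumes \(z_k(r,c) = \big(b_{c-k} - \mathbf{a}^{\top}{\varvec{\phi }}(\mathbf{x}_r)\big)/\sigma_0^2\) with cut-points \(b_c = b_1 + \sum_{j=1}^{c-1}\delta_j^2\) and the boundary conventions \(b_0=-\infty\), \(b_R=+\infty\); these definitions are inferred from the gradients above, since (16) is not reproduced here.

```python
import numpy as np

def z_and_grads(phi, a, sigma0, b1, delta, c, k, R):
    """z_k(r,c) and its gradients per (20)-(25); c in {1,...,R}, k in {0,1}.

    phi, a : feature vector phi(x_r) and weights (NumPy arrays of equal length)
    delta  : increments delta_1, ..., delta_{R-2} (NumPy array)
    """
    idx = c - k                      # index of the cut-point b_{c-k} in use
    if idx == 0 or idx == R:         # b_0 = -inf, b_R = +inf: no parameter dependence
        z = -np.inf if idx == 0 else np.inf
        return z, np.zeros_like(a), 0.0, 0.0, np.zeros_like(delta)
    b = b1 + np.sum(delta[:idx - 1] ** 2)              # assumed cut-point construction
    z = (b - a @ phi) / sigma0 ** 2
    dz_da = -phi / sigma0 ** 2                         # (20)
    dz_dsigma0 = -2.0 * (b - a @ phi) / sigma0 ** 3    # (21)
    dz_db1 = 1.0 / sigma0 ** 2                         # (22), (23)
    dz_ddelta = np.zeros_like(delta)                   # (24), (25): zero unless j < c - k
    dz_ddelta[:idx - 1] = 2.0 * delta[:idx - 1] / sigma0 ** 2
    return z, dz_da, dz_dsigma0, dz_db1, dz_ddelta

# Example with hypothetical numbers: R=4 labels, 3-dim features, R-2=2 increments.
phi = np.array([0.2, -0.1, 0.5]); a = np.array([1.0, 0.3, -0.7])
print(z_and_grads(phi, a, sigma0=0.8, b1=-0.5, delta=np.array([0.4, 0.9]), c=2, k=0, R=4))
```

A finite-difference check on any single entry is a cheap way to validate such a transcription against (20)–(25).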

Cite this article

Kim, M. Conditional ordinal random fields for structured ordinal-valued label prediction. Data Min Knowl Disc 28, 378–401 (2014). https://doi.org/10.1007/s10618-013-0305-2
