Nonlinear distance function learning using neural network: an iterative framework

Abstract

In this paper, we extend several existing methods that apply distance function learning to regression problems. We show that these methods can be viewed as approximating a matrix of desired distances among all training samples. Based on this understanding, we propose an iterative framework in which outlier samples are corrected by their neighbors by asymptotically increasing the correlation coefficient between the desired distances and the distances between sample labels. Moreover, using this framework, we find that most existing methods iterate only once. As a further extension, we adopt a nonlinear distance function and approximate it with a neural network. For a fair comparison, we conduct experiments on age estimation from face images as a regression problem; the results are comparable to the state of the art.
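As a rough illustration of the loop just described, here is a minimal runnable Python sketch. It is not the paper's method: the neural-network fit and the neighbor-based outlier correction are replaced by a simple blending stand-in, and the names `pairwise`, `iterate_desired_distances`, and the `blend` parameter are assumptions for illustration only.

```python
import numpy as np

def pairwise(a):
    """Pairwise distances: absolute differences for 1-D labels,
    Euclidean norms for 2-D feature matrices."""
    if a.ndim == 1:
        return np.abs(a[:, None] - a[None, :])
    return np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)

def iterate_desired_distances(X, labels, n_iters=5, blend=0.5):
    dd = pairwise(X)        # initial desired distances: plain Euclidean
    ld = pairwise(labels)   # distances between sample labels
    ld = ld * dd.mean() / max(ld.mean(), 1e-12)  # match the means
    for _ in range(n_iters):
        # Stand-in correction step: pulling dd toward ld raises the
        # correlation between desired and label distances each pass,
        # mimicking the neighbor-based outlier correction.
        dd = (1.0 - blend) * dd + blend * ld
    return dd

X = np.random.rand(20, 5)    # toy features
y = np.random.rand(20)       # toy real-valued labels
dd = iterate_desired_distances(X, y)
print(np.corrcoef(dd.ravel(), pairwise(y).ravel())[0, 1])
```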

Notes

  1. Manifold learning assumes that data are homogeneously sampled [3]; in other words, the data lie on or close to a low-dimensional manifold embedded in the ambient space. For most applications, the data are generated by continuously varying a set of parameters.

  2. However, non-trivial extensions are possible; e.g., Taylor et al. [32] extended Neighborhood Component Analysis (NCA) to the regression setting.

  3. Strictly speaking, the distance functions proposed in [3, 17] cannot be referred to as metrics, because they do not satisfy the triangle inequality, one of the metric axioms. Instead, they should be called non-metric distances or semi-metrics, in keeping with most existing literature such as [31]. We discuss this further in Section 4.

  4. Other NN topologies of similar size lead only to slight performance differences. Investigating the optimal topology is a pure machine learning problem, which is beyond the scope of this work. Here we only present a good network configuration; its optimality is not guaranteed.

  5. In the context of NNs, this is referred to as the Mean Square Error (MSE).

  6. Also, the training labels are integers owing to how the dataset was collected, but intermediate label values and the final predicted labels are real numbers.

  7. Here, the dimensionality of a regressor refers to its complexity as measured by the Vapnik–Chervonenkis (VC) dimension; see [4] for details.

  8. \( \widehat{d}(i,j)={\left(\frac{\left|L(i,j)\right|}{C-\left|L(i,j)\right|}\right)}^{p}\times d(i,j) \), where L(i,j) is the label difference between the two samples, C is a constant greater than any label value in the training set, ensuring that the denominator is positive, p is set to 2 to make the data easier to discriminate, and d(i,j) is the Euclidean distance between the two samples \(X_i\) and \(X_j\) (transcribed in the code sketch after these notes).

  9. \( \widehat{d}(i,j)={\left(\frac{\left|L(i,j)\right|+\gamma}{C-\left|L(i,j)\right|}\right)}^{p}\times d(i,j) \), where L(i,j) is the absolute label difference between the two samples. γ accounts for the labeling noise; more specifically, a human face image labeled as 7 years actually ranges within 7–8 years, so the labeling noise is 1 year in this case. C = max L(i,j) + ε, with ε > 0 ensuring that the denominator is nonzero. p = 2 is selected to make the data easier to discriminate. The meaning of d(i,j) is the same as in Eq. (27) in [17] (see the previous footnote).

  10. \( \widehat{d}(i,j)=\begin{cases}\frac{\alpha\left(L(i,j)\right)}{C-L(i,j)}\times d(i,j), & L(i,j)\neq 0\\ 0, & L(i,j)=0\end{cases} \), where the function α(∙) is directly proportional to the label distance (in this case, the pose distance). The meanings of L(i,j), d(i,j), and C are the same as in Eq. (2) in [3] (see the previous footnotes).

  11. An obvious counterexample is to combine two three-point metrics, each with d(a,b) = 1, d(b,c) = 1, d(a,c) = 2 (checked numerically in the sketch after these notes).

  12. It is scaled so that the mean of the Euclidean distances equals the mean of \(id_{ij}\). Note that a distance is itself a first-order quantity, so we do not need to scale according to its variance.

  13. In particular, δ(dd, ad) = δ(dd, ad′) implies that the NN is not updated and ad = ad′. Denoting by dd* the desired distance in the next iteration, we then have dd* = dd, i.e., the iterative algorithm has already converged to ad′.
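The desired-distance formulas in notes 8–10, the counterexample in note 11, and the scaling in note 12 can be transcribed into a short runnable Python sketch. This is a hedged illustration, not the authors' code: the choice C = max(label) + 1 in `desired_distance`, the proportional α(L) = k·L in `desired_distance_pose`, and the pointwise-product reading of "combine" in note 11 are assumptions.

```python
import numpy as np

def euclidean(X):
    """Pairwise Euclidean distances d(i, j) between rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def desired_distance(X, labels, p=2):
    """Note 8: (|L| / (C - |L|))^p * d. C = max(label) + 1 is one
    admissible constant greater than every label value (assumes
    nonnegative labels, e.g. ages)."""
    L = np.abs(labels[:, None] - labels[None, :])
    C = labels.max() + 1.0
    return (L / (C - L)) ** p * euclidean(X)

def desired_distance_noisy(X, labels, gamma=1.0, p=2, eps=1e-6):
    """Note 9: ((|L| + gamma) / (C - |L|))^p * d, C = max L(i,j) + eps."""
    L = np.abs(labels[:, None] - labels[None, :])
    C = L.max() + eps
    return ((L + gamma) / (C - L)) ** p * euclidean(X)

def desired_distance_pose(X, labels, k=1.0, eps=1e-6):
    """Note 10: piecewise form; alpha(L) = k * L is an assumed choice,
    since [3] only requires alpha to be proportional to the pose
    distance."""
    L = np.abs(labels[:, None] - labels[None, :])
    C = L.max() + eps
    return np.where(L != 0, k * L / (C - L) * euclidean(X), 0.0)

def scale_to(euclid, initial_dd):
    """Note 12: scale Euclidean distances so their mean equals the mean
    of the initial desired distances id_ij (no variance scaling, since
    a distance is a first-order quantity)."""
    return euclid * initial_dd.mean() / euclid.mean()

# Note 11: combining (pointwise multiplying) two three-point metrics,
# each with d(a,b)=1, d(b,c)=1, d(a,c)=2, breaks the triangle
# inequality: the product has d(a,c) = 4 > d(a,b) + d(b,c) = 2.
d1 = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 2.0}
prod = {pair: v * v for pair, v in d1.items()}
assert prod[("a", "c")] > prod[("a", "b")] + prod[("b", "c")]
```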

References

  1. Balasubramanian VN, Ye J, Panchanathan S (2007) Biased manifold embedding: A framework for person-independent head pose estimation, Proc. CVPR, pp.1–7

  2. Bar-Hillel A, Weinshall D (2007) Learning distance function by coding similarity, Proc. ICML, pp.65–72

  3. Castillo E, Berdinas BG, Romero OF, Betanzos AA (2006) A very fast learning method for neural networks based on sensitivity analysis. J Mach Learn Res 7:1159–1182

  4. Cherkassky V, Shao X, Mulier FM, Vapnik VN (1999) Model complexity control for regression using VC generalization bounds. IEEE Trans Neural Netw 10(5):1075–1089

  5. Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification, Proc. CVPR, pp.539–546

  6. Cootes TF, Edwards GJ, Taylor CJ (2001) Active appearance models. IEEE Trans PAMI 23(6):681–685

  7. Davis JV, Kulis B, Jain P, Sra S, Dhillon IS (2007) Information-theoretic metric learning, Proc. ICML, pp.209–216

  8. Fan N (2011) Learning nonlinear distance functions using neural network for regression with application to robust human age estimation, Proc. ICCV, pp.249–254

  9. FG-NET Aging Database, http://www.fgnet.rsunit.com

  10. Geng X, Smith-Miles K, Zhou ZH (2008) Facial age estimation by nonlinear aging pattern subspace, Proc. ACM Multimedia, pp.721–724

  11. Geng X, Zhou ZH, Smith-Miles K (2007) Automatic age estimation based on facial aging patterns. IEEE Trans PAMI 29(12):2234–2240

  12. Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2005) Neighbourhood components analysis, Proc. NIPS, pp.513–520

  13. Guo GD, Mu G, Fu Y, Dyer C, Huang TS (2009) A study on automatic age estimation using a large database, Proc. ICCV, pp.1–8

  14. Guo GD, Mu G, Fu Y, Huang TS (2009) Human age estimation using bio-inspired features, Proc. CVPR, pp.1–8

  15. He X, Ma WY, Zhang HJ (2004) Learning an image manifold for retrieval, Proc. ACM Multimedia, pp.17–23

  16. Huang YZ, Long YJ (2008) Demosaicking recognition with applications in digital photo authentication based on a quadratic pixel correlation model, Proc. CVPR, pp.1–8

  17. Jin C, Long YJ (2010) On label information incorporated metric learning for regressions. Int J Comput Intell Appl 9(4):339–351

  18. Lanitis A, Draganova C, Christodoulou C (2004) Comparing different classifiers for automatic age estimation. IEEE Trans SMC-B 34(1):621–628

  19. Long YJ, Huang YZ (2006) Image based source camera identification using demosaicking, Proc. 8th IEEE Workshop on Multimedia Signal Processing, pp.419–424

  20. Macskassy SA, Hirsh H, Banerjee A, Dayanik AA (2003) Converting numerical classification into text classification. Artif Intell 143(1):51–77

  21. McCullagh P (1980) Regression models for ordinal data. J R Stat Soc Ser B 42(2):109–142

  22. Min R, van der Maaten LJP, Yuan Z, Bonner A, Zhang Z (2010) Deep supervised t-distributed embedding, Proc. ICML, pp.791–798

  23. Møller MF (1993) A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw 6(4):525–533

  24. Pan (2010) Human age estimation by metric learning for regression problems, Proc. EMMCVPR, pp.455–465

  25. Ramanathan N, Chellappa R, Biswas S (2009) Age progression in human faces: a survey. J Vis Lang Comput 20:131–144

  26. Salakhutdinov R, Hinton G (2007) Learning a nonlinear embedding by preserving class neighbourhood structure, Proc. AI and Statistics, pp. 412–419

  27. Shalev-Shwartz S, Singer Y, Ng AY (2004) Online and batch learning of pseudo-metrics, Proc. ICML, pp.743–750

  28. Shental N, Hertz T, Weinshall D, Pavel M (2002) Adjustment learning and relevant component analysis, Proc. ECCV, pp.776–792

  29. Smith L (2002) A tutorial on Principal Components Analysis http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

  30. Stanley KO (2007) Compositional pattern producing networks: a novel abstraction of development. Genet Program Evolvable Mach 8(2):131–162

  31. Tan X, Chen S, Li J, Zhou Z (2006) Learning non-metric partial similarity based on maximal margin criterion, Proc. CVPR, pp.138–145

  32. Taylor G, Fergus R, Williams G, Spiro I, Bregler C (2010) Pose-sensitive embedding by nonlinear NCA regression, Proc. NIPS

  33. Weinberger K, Blitzer J, Saul L (2006) Distance metric learning for large margin nearest neighbor classification, Proc. NIPS, pp.1475–1482

  34. Xing E, Ng A, Jordan MI, Russell S (2002) Distance metric learning with application to clustering with side-information, Proc. NIPS, pp.505–512

  35. Yan S, Wang H, Huang TS, Tang X (2007) Ranking with uncertain labels, Proc. ICME, pp.96–99

  36. Yan S, Wang H, Tang X, Huang T (2007) Learning auto-structured regressor from uncertain nonnegative labels. Proc. ICCV, pp.1–8

  37. Yan S, Zhou X, Liu M, Johnson MH, Huang T (2008) Regression from patch-kernel, Proc. CVPR, pp.1–8

  38. Yang L, Jin R (2006) Distance metric learning: a comprehensive survey, Technical report, Michigan State University. http://www.cs.cmu.edu/~liuy/frame_survey_v2.pdf

  39. Yeung DY, Chang H (2007) A kernel approach for semi-supervised metric learning. IEEE Trans Neural Netw 18(1):141–149

Author information

Correspondence to Junying Chen.

About this article

Cite this article

Chen, J., Zeng, H. & Fan, N. Nonlinear distance function learning using neural network: an iterative framework. Multimed Tools Appl 74, 671–688 (2015). https://doi.org/10.1007/s11042-014-1944-z
