
Learning weighted distance metric from group level information and its parallel implementation


Abstract

The performance of many machine learning algorithms depends heavily on the distance metric. Usually a distance metric is learned from a training set, while other valuable information, such as group structure, is not used. Samples within a short distance of one another form a group, which may contain several classes, and each sample may have partial memberships in multiple groups. The group structure exists in both the training and test sets. In addition, outliers have a negative effect on a learned distance metric, and increasing the number of noisy samples during the learning phase may amplify this effect. Weighting is one way to alleviate the problem: more similar samples are given larger weights. This paper introduces a technique for learning a weighted distance metric. The semi-supervised method learns label information from the training set and identifies groups among the samples of the test set to form a metric space. In the experiments, the nearest neighbors algorithm is used as the classifier, and the proposed weighted distance metric improves classification accuracy by more than 10%. Furthermore, parallel computing with optimized CPU and GPU code is developed to reduce the computing time; two parallel implementations, in Matlab and in CUDA, are compared. Parallel code that uses both the CPU and the GPU achieves a speedup of more than 3.7 over traditional CPU code in the experiments.
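As a rough illustration only, and not the authors' actual formulation, the following MATLAB sketch shows how a weighted distance can be combined with a nearest-neighbor classifier. The metric matrix M is a placeholder (the pseudo-inverse of the sample covariance), the per-sample weights w follow a toy rule that down-weights samples far from their class mean, and the function name weighted_knn and all variable names are assumptions made for this example.

% Minimal sketch, assuming numeric feature matrices and integer class labels.
% Xtr: n-by-p training samples, ytr: n-by-1 labels, Xte: m-by-p test samples.
function yhat = weighted_knn(Xtr, ytr, Xte, k)
    M = pinv(cov(Xtr));                 % placeholder metric; a learned matrix would be used instead
    w = ones(size(Xtr, 1), 1);          % toy per-sample weights
    for c = unique(ytr)'                % down-weight samples far from their class mean
        idx = (ytr == c);
        mu  = mean(Xtr(idx, :), 1);
        d   = sum(((Xtr(idx, :) - mu) * M) .* (Xtr(idx, :) - mu), 2);
        w(idx) = exp(-d / (2 * mean(d) + eps));
    end
    yhat = zeros(size(Xte, 1), 1);
    for i = 1:size(Xte, 1)
        D  = Xtr - Xte(i, :);            % differences to all training samples
        d2 = sum((D * M) .* D, 2) ./ w;  % weighted squared distances
        [~, nn] = mink(d2, k);           % k nearest neighbors under the weighted metric
        yhat(i) = mode(ytr(nn));         % majority vote
    end
end

The per-test-sample distance computations are independent, so in Matlab they can be offloaded to the GPU by wrapping Xtr, Xte, and M in gpuArray (Parallel Computing Toolbox); the paper compares such Matlab-based parallelism with a CUDA implementation.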





Acknowledgments

This study could not have been completed without the effort and cooperation of Professor Ming Ouyang of the Computer Science Department at the University of Massachusetts Boston. His comments greatly improved the manuscript.

Author information


Corresponding author

Correspondence to Hamidreza Mohebbi.


About this article


Cite this article

Mohebbi, H., Mu, Y. & Ding, W. Learning weighted distance metric from group level information and its parallel implementation. Appl Intell 46, 180–196 (2017). https://doi.org/10.1007/s10489-016-0826-7


