Abstract
The performance of many machine learning algorithms relies heavily on the underlying distance metric. Usually a distance metric is learned from the training set, while other valuable information, such as group structure, is not exploited. Samples within a short distance of one another form a group, which may contain several classes, and each sample may have partial membership in multiple groups. This group structure exists in both the training and test sets. In addition, outliers degrade a learned distance metric, and their negative effect grows as the number of noisy samples in the learning phase increases. Weighting is one way to alleviate this problem: more similar samples are given larger weights. This paper introduces a learning technique for a weighted distance metric. This semi-supervised method learns label information from the training set and identifies groups among the samples of the test set to form a metric space. In the experiments, the nearest neighbors algorithm is used as the classifier, and the proposed weighted distance metric improves classification accuracy by more than 10%. Furthermore, parallel computing with optimized CPU and GPU code is developed to reduce computing time; two parallel implementations, in Matlab and CUDA, are compared in this research. In the experiments, parallel code that uses both the CPU and the GPU achieves more than a 3.7-fold speedup over traditional CPU code.
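To make the weighting idea concrete, the sketch below shows one common way a sample-weighted distance can enter nearest-neighbor classification: distances to low-weight (outlier-like) training samples are inflated so those samples are less likely to be selected as neighbors. This is a minimal illustration of the general technique only, not the authors' formulation; the function name `weighted_knn_predict` and the inverse-weight scaling are assumptions made for this example.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, weights, k=3):
    """Classify x by k-nearest neighbors under a sample-weighted distance.

    Each training sample i carries a weight w_i in (0, 1]. Dividing the
    Euclidean distance by w_i inflates distances to down-weighted samples,
    so outlier-like samples are less likely to appear among the neighbors.
    (Illustrative scheme only; not the paper's exact metric.)
    """
    # Plain Euclidean distances from x to every training sample.
    d = np.linalg.norm(X_train - x, axis=1)
    # Inflate distances to samples with small weights.
    d_weighted = d / weights
    # Indices of the k smallest weighted distances.
    nn = np.argsort(d_weighted)[:k]
    # Majority vote among the neighbors' labels.
    labels, counts = np.unique(y_train[nn], return_counts=True)
    return labels[np.argmax(counts)]
```

With all weights equal the scheme reduces to ordinary k-nearest neighbors; shrinking the weight of a suspected outlier effectively pushes it away from every query point, which is the intuition behind weighting noisy samples during metric learning.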
Acknowledgments
This study could not have been completed without the effort and cooperation of Professor Ming Ouyang of the Computer Science Department at the University of Massachusetts Boston. His comments greatly improved the manuscript.
Mohebbi, H., Mu, Y. & Ding, W. Learning weighted distance metric from group level information and its parallel implementation. Appl Intell 46, 180–196 (2017). https://doi.org/10.1007/s10489-016-0826-7