Abstract
The purpose of this article is to develop a new metric learning algorithm for data that combine continuous and nominal features. We start with the Euclidean metric for the continuous part and the Hamming metric for the nominal part of the data. The impact of each feature is modeled by a corresponding weight in the metric definition. A new algorithm for automatic detection of the weights is proposed. The weighted metric is then used in the standard k-nearest-neighbors (kNN) classification algorithm. A series of numerical experiments shows that the algorithm can successfully classify raw, non-normalized data.
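As a rough illustration of the idea, the sketch below combines a weighted Euclidean term for continuous features with a weighted Hamming term for nominal features and plugs the result into a plain kNN majority vote. The abstract does not give the exact metric formula or the weight-detection algorithm, so the way the two parts are combined (square root of the summed weighted terms), the function names `mixed_distance` and `knn_predict`, the toy data, and the hand-picked weights are all assumptions made for this example only.

```python
import math
from collections import Counter


def mixed_distance(x_cont, x_nom, y_cont, y_nom, w_cont, w_nom):
    """Illustrative weighted distance for mixed data (assumed form, not the paper's)."""
    # Weighted squared Euclidean contribution of the continuous features.
    cont = sum(w * (a - b) ** 2 for w, a, b in zip(w_cont, x_cont, y_cont))
    # Weighted Hamming contribution: a mismatch on nominal feature j costs w_nom[j].
    nom = sum(w for w, a, b in zip(w_nom, x_nom, y_nom) if a != b)
    return math.sqrt(cont + nom)


def knn_predict(train, query, k, w_cont, w_nom):
    """Standard kNN majority vote using the weighted mixed distance."""
    q_cont, q_nom = query
    # Sort training samples by their distance to the query point.
    ranked = sorted(
        train,
        key=lambda s: mixed_distance(s[0], s[1], q_cont, q_nom, w_cont, w_nom),
    )
    # Count the labels of the k nearest samples and return the most frequent one.
    votes = Counter(label for _, _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: (continuous features, nominal features, class label).
# The second continuous feature has a much larger scale than the first.
train = [
    ((1.0, 200.0), ("red",), "a"),
    ((1.2, 210.0), ("red",), "a"),
    ((3.0, 190.0), ("blue",), "b"),
    ((3.1, 205.0), ("blue",), "b"),
]
query = ((2.9, 207.0), ("blue",))

# Hand-picked weights (hypothetical); the paper learns them automatically.
weighted = knn_predict(train, query, k=3, w_cont=(1.0, 0.0001), w_nom=(1.0,))
# All weights equal to one: the large-scale feature dominates the distance.
unweighted = knn_predict(train, query, k=3, w_cont=(1.0, 1.0), w_nom=(1.0,))
print(weighted, unweighted)
```

On this toy data the unit-weight metric is dominated by the large-scale second feature, while down-weighting that feature lets the first continuous feature and the nominal feature decide the vote. This is meant only to show why per-feature weights can make normalization unnecessary, not to reproduce the experiments.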
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Denisiuk, A. (2023). Weighted Hamming Metric and KNN Classification of Nominal-Continuous Data. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 14074. Springer, Cham. https://doi.org/10.1007/978-3-031-36021-3_31
DOI: https://doi.org/10.1007/978-3-031-36021-3_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36020-6
Online ISBN: 978-3-031-36021-3
eBook Packages: Computer Science (R0)