Calibrating Distance Metrics Under Uncertainty

Li, Wenye; Yu, Fangchen

doi:10.1007/978-3-031-26409-2_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13715)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Estimating distance metrics from data samples is essential to many machine learning algorithms and their applications. When observations are noisy or contain missing values, the metric can no longer be determined accurately. In this work, we propose an approach to calibrating distance metrics. Compared with standard practices, which rely primarily on data imputation, our proposal makes fewer assumptions about the data and provides a solid theoretical guarantee of improving the quality of the estimate. We develop a simple, efficient, and effective computing procedure that scales up to carry out the calibration. Results from a series of empirical evaluations support the benefits of the proposed approach and demonstrate its potential in practical applications.

Notes

1. We set \(\mu = \max \left\{ d^{0}_{ij} \right\}\) and \(\epsilon = 0.02\) in the study.
2. Implementation downloaded from http://optml.mit.edu/software.html.
3. Implementation downloaded from https://candes.su.domains/software/.

Acknowledgments

We thank the reviewers for the helpful comments. The work is supported by Guangdong Basic and Applied Basic Research Foundation (2021A1515011825), Shenzhen Science and Technology Program (CUHKSZWDZC0004), and Shenzhen Research Institute of Big Data.

Author information

Authors and Affiliations

The Chinese University of Hong Kong, Shenzhen, China
Wenye Li & Fangchen Yu
Shenzhen Research Institute of Big Data, Shenzhen, China
Wenye Li

Authors

Wenye Li
Fangchen Yu

Corresponding author

Correspondence to Wenye Li.

Editor information

Editors and Affiliations

Grenoble Alpes University, Saint Martin d'Hères, France
Massih-Reza Amini
INSA Rouen Normandy, Saint Etienne du Rouvray, France
Stéphane Canu
Ruhr-Universität Bochum, Bochum, Germany
Asja Fischer
KU Leuven, Leuven, Belgium
Tias Guns
Central European University, Vienna, Austria
Petra Kralj Novak
Aristotle University of Thessaloniki, Thessaloniki, Greece
Grigorios Tsoumakas

About this paper

Cite this paper

Li, W., Yu, F. (2023). Calibrating Distance Metrics Under Uncertainty. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13715. Springer, Cham. https://doi.org/10.1007/978-3-031-26409-2_14

DOI: https://doi.org/10.1007/978-3-031-26409-2_14
Published: 17 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26408-5
Online ISBN: 978-3-031-26409-2
eBook Packages: Computer Science, Computer Science (R0)
