Abstract
This paper introduces a new framework for classifying probability density functions. The proposed method belongs to the class of constrained Gaussian processes indexed by distribution functions. First, instead of classifying observations directly, we consider their isometric transformations, which enable us to satisfy both the positivity and unit-integral hard constraints. Second, we introduce the theoretical properties and give numerical details of how to decompose each transformed observation in an appropriate orthonormal basis. As a result, we show that the coefficients belong to the unit sphere when equipped with the standard Euclidean metric as a natural metric. Lastly, the proposed methods are illustrated and successfully evaluated in different configurations and with various datasets.
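To make the geometric claim concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the transformation or the basis, so the sketch assumes a square-root transform of a density on [0, 1] and a cosine basis; the function names (`trapezoid`, `cosine_basis`, `sqrt_transform_coefficients`, `reconstruct_pdf`) and the example mixture are hypothetical. Under these assumptions, the coefficient vector has approximately unit Euclidean norm, and squaring the reconstructed function gives back a nonnegative function with unit integral.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoid-rule integral of samples y over the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def cosine_basis(k, grid):
    """k-th element of the orthonormal cosine basis of L2([0, 1])."""
    if k == 0:
        return np.ones_like(grid)
    return np.sqrt(2.0) * np.cos(k * np.pi * grid)


def sqrt_transform_coefficients(pdf_values, grid, n_basis=40):
    """Coefficients of sqrt(pdf) in the (assumed) cosine basis."""
    # Normalise so the density integrates to one on the grid.
    pdf = pdf_values / trapezoid(pdf_values, grid)
    psi = np.sqrt(pdf)  # assumed square-root transform
    return np.array(
        [trapezoid(psi * cosine_basis(k, grid), grid) for k in range(n_basis)]
    )


def reconstruct_pdf(coeffs, grid):
    """Map coefficients back to a density by squaring the expansion."""
    psi = sum(c * cosine_basis(k, grid) for k, c in enumerate(coeffs))
    return psi ** 2  # nonnegative by construction


# Hypothetical example: an unnormalised two-component mixture on [0, 1].
grid = np.linspace(0.0, 1.0, 2001)
raw = (np.exp(-0.5 * ((grid - 0.3) / 0.10) ** 2)
       + 0.5 * np.exp(-0.5 * ((grid - 0.7) / 0.05) ** 2))

coeffs = sqrt_transform_coefficients(raw, grid)
density = reconstruct_pdf(coeffs, grid)

print("coefficient norm:", np.linalg.norm(coeffs))
print("integral of reconstruction:", trapezoid(density, grid))
```

Because the expansion is squared, positivity holds for any coefficient vector, and the unit-integral constraint reduces to the coefficients having unit norm. This is why a classifier can work on the sphere with the Euclidean metric rather than on the densities directly.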
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fradi, A., Samir, C. (2023). A New Framework for Classifying Probability Density Functions. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14169. Springer, Cham. https://doi.org/10.1007/978-3-031-43412-9_30
DOI: https://doi.org/10.1007/978-3-031-43412-9_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43411-2
Online ISBN: 978-3-031-43412-9
eBook Packages: Computer Science (R0)