
Levelwise Data Disambiguation by Cautious Superset Classification

  • Conference paper
  • First Online:

Scalable Uncertainty Management (SUM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13562)


Abstract

Drawing conclusions from set-valued data calls for a trade-off between caution and precision. In this paper, we propose a way to construct a hierarchical family of subsets within set-valued categorical observations. Each subset corresponds to a level of cautiousness, with the smallest one, a singleton, representing the most optimistic choice. To achieve this, we extend the framework of Optimistic Superset Learning (OSL), which disambiguates set-valued data by determining the singleton corresponding to the most predictive model. We utilize a variant of OSL for classification with 0/1 loss to find the instantiations whose empirical risks lie below context-dependent thresholds. Varying this threshold induces a hierarchy among those instantiations. To break ties among instantiations with the same classification error, we utilize a hyperparameter of Support Vector Machines (SVM) that controls the model’s complexity. We repurpose the tuning of this hyperparameter to find instantiations whose optimal separations have the greatest generality. Finally, we apply our method to the prototypical example of yet undecided political voters as set-valued observations. To this end, we use both simulated data and pre-election polls by Civey that include undecided voters for the 2021 German federal election.

We sincerely thank the polling institute Civey for providing the data as well as the anonymous reviewers for their valuable feedback and stimulating remarks. DK further thanks the LMU mentoring program for its support.
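To make the construction more tangible, the following is a minimal sketch (not the authors' implementation) of the levelwise idea described in the abstract: every precise instantiation of the set-valued labels is scored by the empirical 0/1 risk of a model fitted to it, and the instantiations whose risk stays below a level-specific threshold form that level's family; nested thresholds then yield the nested hierarchy. All names (`fit`, `instantiations_below`, `levelwise_families`) are our own illustrative choices, and the brute-force enumeration over all instantiations is only feasible for very small data sets.

```python
# Illustrative sketch only: levelwise disambiguation of set-valued labels
# via thresholds on the empirical 0/1 risk (cf. the OSL variant in the paper).
from itertools import product


def instantiations_below(X, Y_sets, fit, threshold):
    """Return all precise instantiations y = (y_1, ..., y_n), y_i in Y_i,
    whose fitted classifier attains an empirical 0/1 risk <= threshold.

    `fit` is any function training a classifier with a .predict method
    (e.g. an SVM); the enumeration over all instantiations is brute force."""
    admissible = []
    for y in product(*Y_sets):          # one candidate label per observation
        model = fit(X, list(y))
        errors = sum(p != t for p, t in zip(model.predict(X), y))
        if errors / len(y) <= threshold:
            admissible.append((y, errors / len(y)))
    return admissible


def levelwise_families(X, Y_sets, fit, thresholds):
    """One (nested) family of instantiations per cautiousness level: larger
    thresholds admit more instantiations, i.e. more caution, less precision."""
    return {t: instantiations_below(X, Y_sets, fit, t) for t in sorted(thresholds)}
```

With the smallest threshold set to the minimal attainable risk, the corresponding family contains the most optimistic instantiation(s), matching the singleton level described in the abstract.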


Notes

  1. Note that this formalization allows \(Y_i\) to also (partially) consist of singletons.

  2. Notably, \(q = |\mathcal {Y}| - k\), where k is the number of categories in \(\mathcal {Y}\) that are not present in the data.

  3. This subsetting of \(\textbf{Y}\) can be seen as a form of “data choice” similar to model choice.

  4. Criterion (1) aims at a unique minimum. In general, and in light of the next section, we understand \(\arg\min\) in a potentially set-valued manner, i.e., as the set of all elements at which the minimum is attained.

  5. The loss is called optimistic due to the minimum in (2): each prediction \(\hat{y}_i\) is assessed optimistically by assuming the most favorable ground-truth \(y \in Y_i\).

  6. Notably, some models can be more informative on certain aspects of the data generating process than others. For instance, naive Bayes classifiers model the joint distribution \(\mathbb {P}(x,y)\) as opposed to standard regression models that are typically concerned with the conditional distribution \(\mathbb {P}(y|x)\).

  7. Note that \(n \cdot \mathcal {R}_{emp}(\textbf{h}, \textbf{x}, \textbf{y}) \in \mathbb {N}\).

  8. However, in [9, Sect. 3.1] the class of models, and thus the model’s hyperparameters, is fixed.

  9. For multi-class classification (as in Sect. 5), hyperplanes from one-versus-all classifications are combined by a voting scheme and Platt scaling; for details, see [11, pages 8–9]. When tuning with regard to C, one common C-value is used for all one-versus-all classifications.

  10. For kernelized versions of SVMs this hyperplane is generally only linear in the transformed feature space. However, we can still think of C as a proxy for the generality of optimal separation in that transformed space.

  11. We use grid search to solve this minimization problem (see the sketch following these notes). When evaluations are rather expensive, Bayesian optimization, simulated annealing, or evolutionary algorithms might be preferred. For an overview of these heuristic optimizers and their limitations, see [23, chapter 10].

  12. Any clustering algorithm can be used. In our applications in Sect. 5, we opt for k-means clustering as proposed by [15].

  13. The covariates appear to have rather low predictive power overall: training and generalization errors are high, even when restricted exclusively to the decided voters.
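Footnotes 8–11 describe how ties between instantiations with identical classification error are broken via the SVM cost parameter C. The following is a speculative sketch of one way such a tie-breaking could be realized, using scikit-learn's one-vs-rest SVM in place of the kernlab setup used in the paper; in particular, the preference for the smallest C that still reaches the error threshold (reading a wide-margin, low-C separation as the most "general" one) is our assumption, not necessarily the authors' exact rule.

```python
# Speculative sketch: breaking ties between equally well-fitting instantiations
# by the SMALLEST SVM cost parameter C that still attains the error threshold.
# scikit-learn stands in for the kernlab-based setup of the paper; the
# direction of the preference (small C = "general" separation) is an assumption.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC


def smallest_admissible_C(X, y, threshold, C_grid=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Grid search over C (cf. footnote 11): return the smallest C for which a
    one-vs-rest SVM (cf. footnote 9) reaches a training 0/1 error <= threshold,
    or None if no value on the grid does."""
    for C in sorted(C_grid):                  # try the most "general" C first
        clf = OneVsRestClassifier(SVC(C=C, kernel="rbf"))
        clf.fit(X, y)
        if np.mean(clf.predict(X) != y) <= threshold:
            return C
    return None


def break_ties(X, instantiations, threshold):
    """Among instantiations tied at the same error level, prefer the one whose
    separation is already achievable with the smallest admissible C."""
    scored = [(smallest_admissible_C(X, np.asarray(y), threshold), y)
              for y in instantiations]
    scored = [(C, y) for C, y in scored if C is not None]
    return min(scored, key=lambda pair: pair[0])[1] if scored else None
```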

References

  1. Boser, B., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152 (1992)

  2. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

  3. Couso, I., Dubois, D.: Statistical reasoning with set-valued information: Ontic vs. epistemic views. Int. J. Approximate Reasoning 55, 1502–1518 (2014)

  4. Couso, I., Sánchez, L.: Machine learning models, epistemic set-valued data and generalized loss functions: an encompassing approach. Inf. Sci. 358, 129–150 (2016)

  5. Denœux, T.: Maximum likelihood estimation from uncertain data in the belief function framework. IEEE Trans. Knowl. Data Eng. 25(1), 119–130 (2011)

  6. Faas, T., Klingelhöfer, T.: The more things change, the more they stay the same? The German federal election of 2017 and its consequences. West Eur. Polit. 42(4), 914–926 (2019)

  7. Gentile, C., Warmuth, M.: Linear hinge loss and average margin. In: Advances in Neural Information Processing Systems, vol. 11 (1998)

  8. Hüllermeier, E.: Learning from imprecise and fuzzy observations: data disambiguation through generalized loss minimization. Int. J. Approximate Reasoning 55, 1519–1534 (2014)

  9. Hüllermeier, E., Cheng, W.: Superset learning based on generalized loss minimization. In: Appice, A., Rodrigues, P.P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9285, pp. 260–275. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23525-7_16

  10. Hüllermeier, E., Destercke, S., Couso, I.: Learning from imprecise data: adjustments of optimistic and pessimistic variants. In: Ben Amor, N., Quost, B., Theobald, M. (eds.) SUM 2019. LNCS (LNAI), vol. 11940, pp. 266–279. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35514-2_20

  11. Karatzoglou, A., Smola, A., Hornik, K., Zeileis, A.: kernlab – an S4 package for kernel methods in R. J. Stat. Softw. 11(9), 1–20 (2004)

  12. Kreiss, D., Augustin, T.: Undecided voters as set-valued information – towards forecasts under epistemic imprecision. In: Davis, J., Tabia, K. (eds.) SUM 2020. LNCS (LNAI), vol. 12322, pp. 242–250. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58449-8_18

  13. Kreiss, D., Augustin, T.: Towards a paradigmatic shift in pre-election polling adequately including still undecided voters – some ideas based on set-valued data for the 2021 German federal election. arXiv preprint arXiv:2109.12069 (2021)

  14. Kreiss, D., Nalenz, M., Augustin, T.: Undecided voters as set-valued information – machine learning approaches under complex uncertainty. In: ECML/PKDD 2020 Tutorial and Workshop on Uncertainty in Machine Learning (2020)

  15. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (1982)

  16. Manski, C.: Partial Identification of Probability Distributions. Springer, Cham (2003)

  17. Molchanov, I., Molinari, F.: Random Sets in Econometrics. Cambridge University Press, Cambridge (2018)

  18. Nguyen, N., Caruana, R.: Classification with partial labels. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 551–559 (2008)

  19. Oscarsson, H., Oskarson, M.: Sequential vote choice: applying a consideration set model of heterogeneous decision processes. Electoral Stud. 57, 275–283 (2019)

  20. Plass, J., Cattaneo, M., Augustin, T., Schollmeyer, G., Heumann, C.: Reliable inference in categorical regression analysis for non-randomly coarsened observations. Int. Stat. Rev. 87, 580–603 (2019)

  21. Plass, J., Fink, P., Schöning, N., Augustin, T.: Statistical modelling in surveys without neglecting the undecided. In: ISIPTA 15, pp. 257–266. SIPTA (2015)

  22. Ponomareva, M., Tamer, E.: Misspecification in moment inequality models: back to moment equalities? Econometrics J. 14, 186–203 (2011)

  23. Rodemann, J.: Robust generalizations of stochastic derivative-free optimization. Master's thesis, LMU Munich (2021)

  24. Schollmeyer, G., Augustin, T.: Statistical modeling under partial identification: distinguishing three types of identification regions in regression analysis with interval data. Int. J. Approximate Reasoning 56, 224–248 (2015)


Author information


Corresponding author

Correspondence to Julian Rodemann.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Rodemann, J., Kreiss, D., Hüllermeier, E., Augustin, T. (2022). Levelwise Data Disambiguation by Cautious Superset Classification. In: Dupin de Saint-Cyr, F., Öztürk-Escoffier, M., Potyka, N. (eds.) Scalable Uncertainty Management. SUM 2022. Lecture Notes in Computer Science (LNAI), vol. 13562. Springer, Cham. https://doi.org/10.1007/978-3-031-18843-5_18


  • DOI: https://doi.org/10.1007/978-3-031-18843-5_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18842-8

  • Online ISBN: 978-3-031-18843-5

  • eBook Packages: Computer Science, Computer Science (R0)
