Subconcept perturbation-based classifier for within-class multimodal data

Cavalcanti, George D. C.; Soares, Rodolfo J. O.; Araújo, Edson L.

doi:10.1007/s00521-023-09144-1

Original Article
Published: 21 November 2023

Volume 36, pages 2479–2491, (2024)

Neural Computing and Applications

George D. C. Cavalcanti¹ (ORCID: orcid.org/0000-0001-7714-2283),
Rodolfo J. O. Soares¹ &
Edson L. Araújo¹,²

Abstract

In classification, it is generally assumed that the data from one class form a single compact cluster. In many cases, however, a class consists of multiple subclusters, a situation known as within-class multimodality. In such a scenario, it may be difficult or even impossible for a single classifier to find a suitable model using limited data. Training models on smaller chunks of data is an alternative that avoids complex models and reduces the task's complexity. This paper proposes the subconcept Perturbation-based Classifier (sPerC), which finds the best clusters per class using cluster validation measures and trains one meta-classifier per subcluster. Each class is thus represented by a set of meta-classifiers instead of a single classifier. This design reduces the complexity of the task, and the divide-and-conquer strategy favors the precision of each meta-classifier. In a comprehensive set of experiments on 30 datasets, sPerC compared favorably to other classifiers in multi-class classification tasks, showing that creating specialized classifiers per class in different regions of the feature space can be advantageous.
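The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' sPerC implementation: plain k-means stands in for the clustering step, the silhouette index serves as one example of a cluster validation measure, and a nearest-centroid rule stands in for the per-subcluster meta-classifier. The function names, `k_max`, and the `min_score` threshold used to fall back to a single cluster are all illustrative assumptions.

```python
import math

def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster lost all of its members.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
    return centers, labels

def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

def fit_subconcepts(X, y, k_max=3, min_score=0.5):
    """Return (centroid, class) prototypes, one per subcluster of each class.

    Assumption of this sketch: a class is split only if some k in 2..k_max
    reaches a silhouette above min_score; otherwise it keeps one centroid.
    """
    prototypes = []
    for cls in sorted(set(y)):
        pts = [tuple(x) for x, lab in zip(X, y) if lab == cls]
        best_centers = [tuple(sum(c) / len(pts) for c in zip(*pts))]
        best_score = min_score
        for k in range(2, min(k_max, len(pts) - 1) + 1):
            centers, labels = kmeans(pts, k)
            score = silhouette(pts, labels)
            if score > best_score:
                best_centers, best_score = centers, score
        prototypes.extend((c, cls) for c in best_centers)
    return prototypes

def predict(prototypes, x):
    """Assign x the class of the nearest subcluster centroid."""
    return min(prototypes, key=lambda pc: math.dist(x, pc[0]))[1]

# Toy data: class "A" has two modes on either side of the single-mode class "B".
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11),
     (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["A"] * 8 + ["B"] * 4
prototypes = fit_subconcepts(X, y)
print(prototypes)
print(predict(prototypes, (10.2, 10.8)))
```

In this toy layout the overall mean of class "A" coincides with the mean of class "B", so a single prototype per class could not separate them, whereas one prototype per subcluster can.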

Availability of data and materials

All data supporting the findings of this study are available in the Knowledge Extraction based on Evolutionary Learning (KEEL) [39] and UCI Machine Learning (UCI) [40] repositories. Table 1 shows the datasets.

Code Availability

Source code and supplementary data can be found in the GitHub repository: https://github.com/rjos/perturbation-classifiers.

References

1. Silva ER, Cavalcanti GDC, Ren TI (2016) Class-wise feature extraction technique for multimodal data. Neurocomputing 214:1001–1010
2. Sugiyama M, Cohen WW, Moore AW (2006) Local fisher discriminant analysis for supervised dimensionality reduction. In: Cohen WW, Moore AW (eds) ICML, ACM international conference proceeding series, vol 148, pp 905–912. http://dblp.uni-trier.de/db/journals/ijon/ijon214.html
3. Sugiyama M (2007) Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis. J Mach Learn Res 8:1027–1061
4. Wang T, Tian S, Huang H, Deng D (2009) Learning by local kernel polarization. Neurocomputing 72:3077–3084
5. Sharma S, Somayaji A, Japkowicz N (2018) Learning over subconcepts: strategies for 1-class classification. Comput Intell 34:440–467
6. Taheri M, Moslehi Z, Mirzaei A, Safayani M (2019) A self-adaptive local metric learning method for classification. Pattern Recognit. https://doi.org/10.1016/j.patcog.2019.106994
7. Krawczyk B, Wozniak M, Cyganek B (2014) Clustering-based ensembles for one-class classification. Inf Sci 264:182–195
8. Guo H, Zhou J, Wu CA (2018) Imbalanced learning based on data-partition and smote. Information 9:238
9. Abdallah L, Badarna M, Khalifa W, Yousef M (2021) Multikoc: multi-one-class classifier based k-means clustering. Algorithms 14:134
10. Liu Y, Li Z, Xiong H, Gao X, Wu J (2010) Understanding of internal clustering validation measures. In: Society IC (eds) Proceedings of the 2010 IEEE international conference on data mining, vol 10. ICDM, Washington, DC, USA, pp 911–916
11. Fragoso RC, Cavalcanti GD, Pinheiro RH, Oliveira LS (2021) Dynamic selection and combination of one-class classifiers for multi-class classification. Knowl Based Syst 228:107290
12. Marcelino CG, Pedreira CE (2022) Feature space partition: a local-global approach for classification. Neural Comput Appl 34:21877–21890. https://doi.org/10.1007/s00521-022-07647-x
13. Ezugwu AE et al (2021) Automatic clustering algorithms: a systematic review and bibliometric analysis of relevant literature. Neural Comput Appl 33:6247–6306. https://doi.org/10.1007/s00521-020-05395-4
14. Hassan BA, Rashid TA (2021) A multidisciplinary ensemble algorithm for clustering heterogeneous datasets. Neural Comput Appl 33:10987–11010. https://doi.org/10.1007/s00521-020-05649-1
15. Zhang H, Li P, Meng F, Fan W, Xue Z (2023) Mapreduce-based distributed tensor clustering algorithm. Neural Comput Appl. https://doi.org/10.1007/s00521-023-08415-1
16. Mousavian Anaraki SA, Haeri A, Moslehi F (2022) Generating balanced and strong clusters based on balance-constrained clustering approach (strong balance-constrained clustering) for improving ensemble classifier performance. Neural Comput Appl 34:21139–21155. https://doi.org/10.1007/s00521-022-07595-6
17. Karna A, Gibert K (2022) Automatic identification of the number of clusters in hierarchical clustering. Neural Comput Appl 34:119–134. https://doi.org/10.1007/s00521-021-05873-3
18. Nidheesh N, Nazeer KAA, Ameer PM (2020) A hierarchical clustering algorithm based on silhouette index for cancer subtype discovery from genomic data. Neural Comput Appl 32:11459–11476. https://doi.org/10.1007/s00521-019-04636-5
19. Araújo EL, Cavalcanti GDC, Ren TI (2020) Perturbation-based classifier. Soft Comput 24:16565–16576
20. Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley, Hoboken
21. Fukunaga K (1972) Introduction to statistical pattern recognition. Academic Press, New York
22. Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22:4–37. https://doi.org/10.1109/34.824819
23. Ade MRR, Deshmukh PR (2013) Methods for incremental learning: a survey. Semantic Scholar, New York
24. Kivinen J, Smola AJ, Williamson RC (2004) Online learning with kernels. IEEE Trans Signal Process 52:2165–2176
25. Lutz A, Rodner E, Denzler J (2011) Efficient multi-class incremental learning using gaussian processes. In: Open German–Russian workshop on pattern recognition and image understanding, pp 182–185
26. Lütz A, Rodner E, Denzler J (2013) I want to know more–efficient multi-class incremental learning using gaussian processes. Pattern Recogn Image Anal 23:402–407. https://doi.org/10.1134/S1054661813030103
27. Hämäläinen J, Jauhiainen S, Kärkkäinen T (2017) Comparison of internal clustering validation indices for prototype-based clustering. Algorithms 10:105
28. Arbelaitz O, Gurrutxaga I, Muguerza J, Pèrez JM, Perona I (2013) An extensive comparative study of cluster validity indices. Pattern Recogn 46:243–256
29. Caliński T, Harabasz J (1974) A dendrite method for cluster analysis. Commun Stat Simul Comput 3:1–27
30. Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 1:224–227
31. Tibshirani R, Guenther W, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B 63(2):411–423
32. Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65
33. Ünlü R, Xanthopoulos P (2019) Estimating the number of clusters in a dataset via consensus clustering. Expert Syst Appl 125:33–39
34. Rendon E, Abundez I, Arizmendi A, Quiroz E (2011) Internal versus external cluster validation indexes. Int J Comput Commun 5(1):27–34
35. Weiss GM (2010) The impact of small disjuncts on classifier learning. In: Stahlbock R, Crone SF, Lessmann S (eds) Data mining annals of information systems, vol 8. Springer, Cham, pp 193–226
36. Weiss GM, Prieditis A (1995) Learning with rare cases and small disjuncts. In: Prieditis A, Russell SJ (eds) ICML. Morgan Kaufmann, Burlington, pp 558–565
37. He Z, Xu X (2003) Discovering cluster-based local outliers. Pattern Recogn Lett 24:1641–1650
38. Valentini G (2005) An experimental bias-variance analysis of svm ensembles based on resampling techniques. IEEE Trans Syst Man Cybern Part B 35:1252–1271
39. Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S (2011) Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Multiple Valued Log Soft Comput 17:255–287
40. Dua D, Graff C (2017) Uci machine learning repository. http://archive.ics.uci.edu/ml
41. Bradley AP (1997) The use of the area under the roc curve in the evaluation of machine learning algorithms. Pattern Recogn 30:1145–1159
42. López V, Fernández A, García S, Palade V, Herrera F (2013) An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf Sci 250:113–141
43. Cheng B, Titterington DM (1994) Neural networks: a review from a statistical perspective. Stat Sci 9:2–30
44. Breiman L (2001) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/a:1010933404324
45. Schölkopf B, Smola AJ (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. Adaptive computation and machine learning. MIT Press, Cambridge
46. Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15:3133–3181
47. Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32:675–701. https://doi.org/10.1080/01621459.1937.10503522
48. Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1:80–83
49. Holte RC, Acker L, Porter BW, Sridharan NS (1989) Concept learning and the problem of small disjuncts. In: Sridharan NS (ed) IJCAI. Morgan Kaufmann, Burlington, pp 813–818
50. Weiss GM, Hirsh H, Kautz HA, Porter BW (2000) A quantitative study of small disjuncts. In: Kautz HA, Porter BW (eds) AAAI/IAAI. AAAI Press / The MIT Press, New York, pp 665–670
51. Goder A, Filkov V (2008) Consensus clustering algorithms: comparison and refinement. In: Proceedings of the meeting on algorithm engineering & experiments, Society for Industrial and Applied Mathematics, USA, pp 109–117

Acknowledgements

This research has been partially supported by the following Brazilian agencies: CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and FACEPE (Fundação de Amparo à Ciência e Tecnologia de Pernambuco).

Author information

Authors and Affiliations

Centro de Informática (CIn), Universidade Federal de Pernambuco (UFPE), Av. Jornalista Anibal Fernandes s/n, Recife, Brazil
George D. C. Cavalcanti, Rodolfo J. O. Soares & Edson L. Araújo
Universidade Federal do Vale do São Francisco, Av. Antonio Carlos Magalhães, 510, Juazeiro, Brazil
Edson L. Araújo

Corresponding author

Correspondence to George D. C. Cavalcanti.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cavalcanti, G.D.C., Soares, R.J.O. & Araújo, E.L. Subconcept perturbation-based classifier for within-class multimodal data. Neural Comput & Applic 36, 2479–2491 (2024). https://doi.org/10.1007/s00521-023-09144-1

Received: 10 July 2023
Accepted: 16 October 2023
Published: 21 November 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s00521-023-09144-1
