
Electrostatic field framework for supervised and semi-supervised learning from incomplete data

Published in: Natural Computing

Abstract

In this paper a classification framework for incomplete data, based on an electrostatic field model, is proposed. An original approach to exploiting incomplete training data with missing features, making extensive use of an electrostatic charge analogy, has been used. The framework supports a hybrid supervised and unsupervised training scenario, enabling simultaneous learning from both labelled and unlabelled data using the same set of rules and adaptation mechanisms. Classification of incomplete patterns has been facilitated by introducing a local dimensionality reduction technique, which aims to exploit all available information by using the data ‘as is’, rather than trying to estimate the missing values. The performance of all proposed methods has been extensively tested in a wide range of missing-data scenarios, using a number of standard benchmark datasets, in order to make the results comparable with those available in the current and future literature. Several modifications to the original Electrostatic Field Classifier, aimed at improving speed and robustness in higher-dimensional spaces, have also been introduced and discussed.
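To make the general idea concrete, the following is a minimal sketch (not the authors' implementation; function and variable names are illustrative) of a charge-analogy classifier: each labelled training sample acts as a point charge exerting an inverse-square pull on a query, and an incomplete query is handled by computing distances only in the subspace of its observed features, i.e. using the data 'as is' instead of imputing the missing values.

```python
import numpy as np

def classify_incomplete(x, X_train, y_train, eps=1e-9):
    """Classify a possibly incomplete sample x (NaN marks a missing
    feature) by the total Coulomb-like attraction towards each class.

    Illustrative sketch only: distances are computed in the subspace
    of features observed in x (a simple local dimensionality
    reduction), not by estimating the missing values.
    """
    observed = ~np.isnan(x)            # mask of available features
    x_obs = x[observed]
    forces = {}
    for label in np.unique(y_train):
        pts = X_train[y_train == label][:, observed]
        d2 = np.sum((pts - x_obs) ** 2, axis=1)    # squared distances
        forces[label] = np.sum(1.0 / (d2 + eps))   # inverse-square pull
    return max(forces, key=forces.get)             # strongest attraction

# Toy usage: two well-separated classes; the query is missing feature 2.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])
print(classify_incomplete(np.array([0.05, np.nan]), X, y))  # → 0
```

The sketch deliberately omits the paper's adaptation mechanisms and the handling of unlabelled data; it only shows how the field analogy and the observed-feature subspace combine.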



Notes

  1. We use the term ‘sample’ to refer to a single object/instance, and not to the whole dataset (the common usage in the statistics literature).

  2. Deficiency level is the level of missingness of a dataset, with 0 for complete data and 1 for maximally incomplete data, taking into account the constraints given.
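Ignoring any dataset-specific constraints, the deficiency level defined above is simply the fraction of missing entries in the data matrix. A minimal sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def deficiency_level(X):
    """Fraction of missing (NaN) entries in a data matrix:
    0.0 for complete data, 1.0 when every value is missing."""
    X = np.asarray(X, dtype=float)
    return float(np.isnan(X).sum() / X.size)

# Toy usage: one missing value out of four entries.
X = np.array([[1.0, np.nan],
              [2.0, 3.0]])
print(deficiency_level(X))  # → 0.25
```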


Author information


Corresponding author

Correspondence to Marcin Budka.


About this article

Cite this article

Budka, M., Gabrys, B. Electrostatic field framework for supervised and semi-supervised learning from incomplete data. Nat Comput 10, 921–945 (2011). https://doi.org/10.1007/s11047-010-9182-4
