Unsupervised Joint Feature Discretization and Selection

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2011)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6669)

Included in the following conference series: IbPRIA: Iberian Conference on Pattern Recognition and Image Analysis

Abstract

In many applications, we deal with high-dimensional datasets containing different types of data. For instance, in text classification and information retrieval problems, we have large collections of documents. Each document is usually represented by a bag-of-words or similar representation with a large number of features (terms), many of which may be irrelevant (or even detrimental) to the learning task. This excessive number of features also makes it costly, in terms of memory, to represent and process these collections, which calls for adequate techniques for feature representation, reduction, and selection that improve both classification accuracy and memory usage. In this paper, we propose a combined unsupervised feature discretization and feature selection technique. Experimental results on standard datasets show the efficiency of the proposed technique, as well as improvements over previous similar techniques.
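
The abstract does not describe the method's internals. As a rough illustration of what an unsupervised "discretize then select" pipeline can look like, the Python sketch below quantizes each feature with a simple Lloyd-style scalar quantizer and then ranks the quantized features by their dispersion (variance), keeping the top ones. The quantizer, the dispersion-based relevance criterion, and all names are assumptions made for illustration, not the authors' actual algorithm.

# Illustrative sketch only -- NOT the paper's algorithm. Assumptions:
# per-feature scalar quantization in the spirit of Lloyd's algorithm,
# followed by ranking features by the variance of their quantized values.
import numpy as np

def lloyd_quantize(x, n_levels=4, n_iter=20):
    """Quantize one feature (1-D array) with a Lloyd-style scalar quantizer."""
    levels = np.linspace(x.min(), x.max(), n_levels)   # initial reproduction levels
    idx = np.zeros(x.shape[0], dtype=int)
    for _ in range(n_iter):
        # assign each value to its nearest reproduction level
        idx = np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)
        for k in range(n_levels):
            if np.any(idx == k):
                levels[k] = x[idx == k].mean()          # centroid update
    return idx

def discretize_and_select(X, n_levels=4, n_keep=100):
    """Quantize every column of X, then keep the n_keep most dispersed features."""
    Q = np.column_stack([lloyd_quantize(X[:, j], n_levels) for j in range(X.shape[1])])
    relevance = Q.var(axis=0)                           # unsupervised relevance proxy
    keep = np.argsort(relevance)[::-1][:n_keep]
    return Q[:, keep], keep

# Toy usage on synthetic term-count data.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(200, 500)).astype(float)
Q, kept = discretize_and_select(X, n_levels=4, n_keep=50)
print(Q.shape, kept[:10])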






Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ferreira, A., Figueiredo, M. (2011). Unsupervised Joint Feature Discretization and Selection. In: Vitrià, J., Sanches, J.M., Hernández, M. (eds) Pattern Recognition and Image Analysis. IbPRIA 2011. Lecture Notes in Computer Science, vol 6669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21257-4_25

  • DOI: https://doi.org/10.1007/978-3-642-21257-4_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21256-7

  • Online ISBN: 978-3-642-21257-4
