A Supervised Methodology to Measure the Variables Contribution to a Clustering

Ismaili, Oumaima Alaoui; Lemaire, Vincent; Cornuéjols, Antoine

doi:10.1007/978-3-319-12637-1_20

Oumaima Alaoui Ismaili (AgroParisTech; Orange Labs), Vincent Lemaire (Orange Labs) & Antoine Cornuéjols (AgroParisTech)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8834)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

This article proposes a supervised approach to evaluating the contribution of explanatory variables to a clustering. The main idea is to learn to predict the cluster membership of instances from each variable individually. The variables are then ranked by their predictive power, which is measured with two evaluation criteria: accuracy (ACC) and the Adjusted Rand Index (ARI). Once the relevant variables (those that contribute to discriminating the clusters) have been identified, the redundant ones are filtered out using a supervised method. The aim of this work is to help end users easily understand a clustering of high-dimensional data. Experimental results show that the proposed method is competitive with existing methods from the literature.
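
The ranking step described above can be sketched in a few lines. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The per-variable predictor `fit_binned_classifier` is a hypothetical equal-width binning classifier that predicts the majority cluster in each bin. The abstract does not specify which classifier is trained on each variable, so this one was chosen only to keep the example self-contained. The even/odd train/test split, the bin count, and the names `rank_variables` and `criterion` are likewise assumptions. ARI is computed with the Hubert and Arabie contingency-table formula. The redundancy-filtering step is not sketched, because the abstract does not describe it.

```python
from collections import Counter, defaultdict
from math import comb


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted cluster equals the actual cluster."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie ARI between two labelings (needs at least 2 instances)."""
    n = len(labels_a)
    pairs_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # only happens for identical trivial labelings
        return 1.0
    return (pairs_ij - expected) / (max_index - expected)


def fit_binned_classifier(values, labels, n_bins=5):
    """Hypothetical univariate classifier: majority cluster per equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for a constant variable

    def bin_of(v):
        return min(max(int((v - lo) / width), 0), n_bins - 1)  # clamp out-of-range values

    votes = defaultdict(Counter)
    for v, y in zip(values, labels):
        votes[bin_of(v)][y] += 1
    table = {b: c.most_common(1)[0][0] for b, c in votes.items()}
    default = Counter(labels).most_common(1)[0][0]  # fallback for bins unseen in training
    return lambda v: table.get(bin_of(v), default)


def rank_variables(X, clusters, criterion="ARI", n_bins=5):
    """Score each variable by how well it alone predicts the clusters, best first."""
    if criterion not in ("ARI", "ACC"):
        raise ValueError("criterion must be 'ARI' or 'ACC'")
    train = range(0, len(X), 2)  # assumed split: even rows train, odd rows test
    test = range(1, len(X), 2)
    scores = []
    for j in range(len(X[0])):
        clf = fit_binned_classifier([X[i][j] for i in train], [clusters[i] for i in train], n_bins)
        y_true = [clusters[i] for i in test]
        y_pred = [clf(X[i][j]) for i in test]
        metric = adjusted_rand_index if criterion == "ARI" else accuracy
        scores.append((j, metric(y_true, y_pred)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy data: variable 0 separates the two clusters, variable 1 is noise.
X = [[0.10, 0.5], [0.20, 0.9], [0.15, 0.1], [0.05, 0.7],
     [0.90, 0.3], [0.95, 0.8], [0.85, 0.2], [0.80, 0.6]]
clusters = [0, 0, 0, 0, 1, 1, 1, 1]  # labels from any clustering run
print(rank_variables(X, clusters, criterion="ARI"))  # [(0, 1.0), (1, 0.0)]
```

On this toy data, variable 0 alone recovers the clusters perfectly (ARI 1.0). Variable 1 yields constant predictions (ARI 0.0, accuracy 0.5). ARI is used alongside accuracy because it corrects for chance agreement and is invariant to how the cluster labels are numbered.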

Author information

Authors and Affiliations

AgroParisTech, 16, rue Claude Bernard, 75005, Paris, France
Oumaima Alaoui Ismaili & Antoine Cornuéjols
Orange Labs, 2 av. Pierre Marzin, 22300, Lannion, France
Oumaima Alaoui Ismaili & Vincent Lemaire

Editor information

Editors and Affiliations

Department of Artificial Intelligence, Faculty of Computer Science and Information Technology Building, University of Malaya, 50603, Kuala Lumpur, Malaysia
Chu Kiong Loo
Department of Electronics and Communication Engineering, College of Engineering, Jalan IKRAM-UNITEN, Universiti Tenaga Nasional, 43009, Kajang, Selangor, Malaysia
Keem Siah Yap
School of Engineering and Information Technology, Murdoch University, South St, 6150, Murdoch, Western Australia, Australia
Kok Wai Wong
Department of Electrical and Electronics Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, 120-749, Seoul, South Korea
Andrew Teoh
Department of Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University, Ren’ai Road 111, SIP 215123, Suzhou, Jiangsu Province, China
Kaizhu Huang

About this paper

Cite this paper

Ismaili, O.A., Lemaire, V., Cornuéjols, A. (2014). A Supervised Methodology to Measure the Variables Contribution to a Clustering. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8834. Springer, Cham. https://doi.org/10.1007/978-3-319-12637-1_20

DOI: https://doi.org/10.1007/978-3-319-12637-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12636-4
Online ISBN: 978-3-319-12637-1
eBook Packages: Computer Science, Computer Science (R0)