Feature selection based on community detection in feature correlation networks

Savić, Miloš; Kurbalija, Vladimir; Bosnić, Zoran; Ivanović, Mirjana

doi:10.1007/s00607-019-00705-8

Published: 21 January 2019

Computing, volume 101, pages 1513–1538 (2019)

Miloš Savić¹, Vladimir Kurbalija¹, Zoran Bosnić² & Mirjana Ivanović¹

Abstract

Feature selection is an important data preprocessing step in data mining and machine learning tasks, especially in the case of high-dimensional data. In this paper, we propose a novel feature selection method based on feature correlation networks, i.e., complex weighted networks describing the strongest correlations among features in a dataset. The method utilizes community detection techniques to identify cohesive groups of features in feature correlation networks. A subset of features exhibiting a strong association with the class variable is selected according to the identified community structure, taking into account the size of feature communities and the connections within them. The proposed method is experimentally evaluated on a high-dimensional dataset containing signaling protein features related to the diagnosis of Alzheimer’s disease. We compared the performance of seven commonly used classifiers that were trained without feature selection, after feature selection by four variants of our method determined by different community detection techniques, and after feature selection by four widely used state-of-the-art feature selection methods available in the WEKA machine learning library. The results of the experimental evaluation indicate that our method improves the classification accuracy of several classification models while greatly reducing the dimensionality of the dataset. Additionally, our method tends to outperform traditional feature selection methods provided by the WEKA library.
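
The general idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' FSFCN implementation: it assumes Pearson correlation as the association measure, a hypothetical threshold of 0.8 for keeping an edge, and it swaps in connected components as a crude stand-in for a proper community detection technique. It also ignores community size and intra-community connections, which the actual method takes into account. All function names and the toy data are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (vx * vy) ** 0.5


def correlation_network(features, threshold):
    """Weighted adjacency map keeping only feature pairs with |r| >= threshold."""
    adj = {name: {} for name in features}
    for a, b in combinations(features, 2):
        r = abs(pearson(features[a], features[b]))
        if r >= threshold:
            adj[a][b] = r
            adj[b][a] = r
    return adj


def components(adj):
    """Connected components (assumption: a stand-in for community detection)."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


def select_features(features, labels, threshold=0.8):
    """Pick, from each group, the feature most correlated with the class labels."""
    adj = correlation_network(features, threshold)
    relevance = {name: abs(pearson(col, labels)) for name, col in features.items()}
    return sorted(max(group, key=relevance.get) for group in components(adj))


# Hypothetical toy data: f1 and f2 are near-duplicates, f3 is unrelated to both.
toy = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f2": [1.1, 2.0, 2.9, 4.2, 5.1, 5.9],
    "f3": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
classes = [0, 0, 0, 1, 1, 1]
selected = select_features(toy, classes)
print(selected)
```

In this toy run the two strongly correlated features fall into one group and contribute a single representative, while the unrelated feature forms its own group, so three features are reduced to two. A real community detection algorithm would split large connected components further, which connected components alone cannot do.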

Notes

The source code of FSFCN can be downloaded from https://github.com/milsav/FSFCN.
https://github.com/Craigacp/JavaMI.

References

1. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008
2. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU (2006) Complex networks: structure and dynamics. Phys Rep 424(4–5):175–308. https://doi.org/10.1016/j.physrep.2005.10.009
3. Butterworth R, Piatetsky-Shapiro G, Simovici DA (2005) On feature selection through clustering. In: Proceedings of the fifth IEEE international conference on data mining, ICDM ’05. IEEE Computer Society, Washington, pp 581–584. https://doi.org/10.1109/ICDM.2005.106
4. Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70:066111. https://doi.org/10.1103/PhysRevE.70.066111
5. Csardi G, Nepusz T (2006) The igraph software package for complex network research. InterJ Complex Syst 1695(5):1–9
6. Duch W (2006) Filter methods. Springer, Berlin, pp 89–117. https://doi.org/10.1007/978-3-540-35488-8_4
7. Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174. https://doi.org/10.1016/j.physrep.2009.11.002
8. Frank E, Hall M, Holmes G, Kirkby R, Pfahringer B, Witten IH, Trigg L (2010) Weka–a machine learning workbench for data mining. Springer, Boston, pp 1269–1277
9. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
10. Hall MA (1998) Correlation-based feature subset selection for machine learning. Ph.D. thesis, University of Waikato, Hamilton, New Zealand
11. Horvath S (2011) Correlation and gene co-expression networks. Springer, New York, pp 91–121. https://doi.org/10.1007/978-1-4419-8819-5_5
12. Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. Springer, Berlin, pp 171–182. https://doi.org/10.1007/3-540-57868-4_57
13. Kononenko I, Šimec E, Robnik-Šikonja M (1997) Overcoming the myopia of inductive learning algorithms with RELIEFF. Appl Intell 7(1):39–55. https://doi.org/10.1023/A:1008280620621
14. Krier C, François D, Rossi F, Verleysen M (2007) Feature clustering and mutual information for the selection of variables in spectral data. In: Proceedings of European symposium on artificial neural networks advances in computational intelligence and learning, pp 157–162
15. Lal TN, Chapelle O, Weston J, Elisseeff A (2006) Embedded methods. Springer, Berlin, pp 137–165. https://doi.org/10.1007/978-3-540-35488-8_6
16. Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H (2016) Feature selection: a data perspective. arXiv preprint arXiv:1601.07996
17. Li Y, Liu W, Jia Y, Dong H (2017) A weighted mutual information biclustering algorithm for gene expression data. Comput Sci Inf Syst 14(3):643–660. https://doi.org/10.2298/CSIS170301021Y
18. Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18(1):50–60. https://doi.org/10.2307/2236101
19. Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256. https://doi.org/10.1137/S003614450342480
20. Newman MEJ (2004) Analysis of weighted networks. Phys Rev E 70:056131. https://doi.org/10.1103/PhysRevE.70.056131
21. Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113. https://doi.org/10.1103/PhysRevE.69.026113
22. Pons P, Latapy M (2006) Computing communities in large networks using random walks. J Graph Algorithms Appl 10(2):191–218. https://doi.org/10.1007/11569596_31
23. Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850. https://doi.org/10.2307/2284239
24. Ray S, Britschgi M, Herbert C, Takeda-Uchimura Y, Boxer A, Blennow K, Friedman L, Galasko D, Jutel M, Karydas A, Kaye J, Leszek J, Miller B, Minthon L, Quinn J, Rabinovici G, Robinson W, Sabbagh M, So Y, Sparks D, Tabaton M, Tinklenberg J, Yesavage J, Tibshirani R, Wyss-Coray T (2007) Classification and prediction of clinical Alzheimer’s diagnosis based on plasma signaling proteins. Nat Med 13(11):1359–1362. https://doi.org/10.1038/nm1653
25. Robnik-Šikonja M, Kononenko I (2003) Theoretical and empirical analysis of ReliefF and RReliefF. Mach Learn 53(1):23–69. https://doi.org/10.1023/A:1025667309714
26. Rosvall M, Bergstrom CT (2007) Maps of information flow reveal community structure in complex networks. Proc Natl Acad Sci USA 105(4):1118–1123. https://doi.org/10.1073/pnas.0706851105
27. Sánchez-Maroño N, Alonso-Betanzos A, Tombilla-Sanromán M (2007) Filter methods for feature selection – a comparative study. Springer, Berlin, pp 178–187. https://doi.org/10.1007/978-3-540-77226-2_19
28. Savić M, Ivanović M, Radovanović M, Ognjanović Z, Pejović A, Jakšić Krüger T (2015) Exploratory analysis of communities in co-authorship networks: a case study. In: Bogdanova AM, Gjorgjevikj D (eds) ICT innovations 2014. Springer, Cham, pp 55–64. https://doi.org/10.1007/978-3-319-09879-1_6
29. Savić M, Ivanović M, Surla BD (2016) A community detection technique for research collaboration networks based on frequent collaborators cores. In: Proceedings of the 31st annual ACM symposium on applied computing, SAC ’16. ACM, New York, pp 1090–1095. https://doi.org/10.1145/2851613.2851809
30. Savić M, Kurbalija V, Ivanović M, Bosnić Z (2017) A feature selection method based on feature correlation networks. In: Ouhammou Y, Ivanovic M, Abelló A, Bellatreche L (eds) Model and data engineering. Springer, Cham, pp 248–261. https://doi.org/10.1007/978-3-319-66854-3_19
31. Slavkov I, Karcheska J, Kocev D, Dzeroski S (2018) HMC-ReliefF: feature ranking for hierarchical multi-label classification. Comput Sci Inf Syst 15(1):187–209. https://doi.org/10.2298/CSIS170115043S
32. Song Q, Ni J, Wang G (2013) A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Trans Knowl Data Eng 25(1):1–14. https://doi.org/10.1109/TKDE.2011.181
33. Van Dijck G, Van Hulle MM (2006) Speeding up the wrapper feature subset selection in regression by mutual information relevance and redundancy analysis. Springer, Berlin, pp 31–40. https://doi.org/10.1007/11840817_4
34. Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83. https://doi.org/10.2307/3001968
35. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques (Morgan Kaufmann Series in Data Management Systems), 2nd edn. Morgan Kaufmann Publishers Inc., San Francisco
36. Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Fawcett T, Mishra N (eds) Proceedings of the 20th international conference on machine learning (ICML-03), pp 856–863
37. Zhang Z, Hancock ER (2011) A graph-based approach to feature selection. Springer, Berlin, pp 205–214. https://doi.org/10.1007/978-3-642-20844-7_21
38. Zhao Z, Liu H (2007) Searching for interacting features. In: Proceedings of the 20th international joint conference on artificial intelligence, IJCAI’07. Morgan Kaufmann Publishers Inc., San Francisco, pp 1156–1161

Acknowledgements

This work is supported by the bilateral project “Intelligent computer techniques for improving medical detection, analysis and explanation of human cognition and behavior disorders” between the Ministry of Education, Science and Technological Development of the Republic of Serbia and the Slovenian Research Agency. M. Savić, V. Kurbalija and M. Ivanović also thank the Ministry of Education, Science and Technological Development of the Republic of Serbia for additional support through Project No. OI174023, “Intelligent techniques and their integration into wide-spectrum decision support”.

Author information

Authors and Affiliations

Department of Mathematics and Informatics, Faculty of Sciences, University of Novi Sad, Trg Dositeja Obradovića 4, 21000, Novi Sad, Serbia
Miloš Savić, Vladimir Kurbalija & Mirjana Ivanović
Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000, Ljubljana, Slovenia
Zoran Bosnić

Corresponding author

Correspondence to Miloš Savić.

Additional information

This paper is an extended version of our paper [30] presented at the 7th International Conference on Model and Data Engineering (MEDI 2017).

About this article

Cite this article

Savić, M., Kurbalija, V., Bosnić, Z. et al. Feature selection based on community detection in feature correlation networks. Computing 101, 1513–1538 (2019). https://doi.org/10.1007/s00607-019-00705-8

Received: 15 March 2018
Accepted: 14 January 2019
Published: 21 January 2019
Issue Date: October 2019
DOI: https://doi.org/10.1007/s00607-019-00705-8
