DOI: 10.1145/3319619.3326836

A new evolutionary rough fuzzy integrated machine learning technique for microRNA selection using next-generation sequencing data of breast cancer

Published: 13 July 2019


MicroRNAs (miRNAs) play an important role in various biological processes by regulating gene expression, and their abnormal expression may lead to cancer. Analysing miRNA expression data may therefore yield biological insights useful for cancer diagnosis. Many feature selection methods have recently been developed to identify such miRNAs; each has its own merits and demerits, as the task is inherently challenging. In this article, we propose a novel wrapper-based feature selection technique that integrates rough and fuzzy sets, Random Forest, and Particle Swarm Optimization (PSO) to identify putative miRNAs that effectively solve the underlying biological problem, i.e., separating tumour and control samples. Rough and fuzzy sets help address the vagueness and overlapping characteristics of the dataset during clustering, while Random Forest performs classification on the clustering results to yield better solutions. The integrated clustering and classification tasks form the optimization problem solved by PSO, in which particles encode features, in this case miRNAs. The performance of the proposed wrapper-based method is demonstrated quantitatively and visually on next-generation sequencing data of breast cancer from The Cancer Genome Atlas (TCGA). Finally, the selected miRNAs are validated through biological significance tests. The code and dataset used in this paper are available online¹.
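The abstract describes the pipeline only at a high level. As a rough illustration of how such a wrapper can be wired together, the sketch below assumes a samples-by-miRNAs expression matrix `X` and 0/1 control/tumour labels `y`. It combines a simplified rough-fuzzy c-means (fuzzy memberships, with each sample placed in a lower approximation or a shared boundary region according to a membership-gap threshold) with a binary PSO whose particles are feature masks. Two deliberate departures from the paper: the Random Forest stage is replaced by a simple cluster-to-label agreement score, and every function name (`rough_fuzzy_cmeans`, `agreement`, `fitness`, `binary_pso`) and parameter value (`delta`, `w_low`, `alpha`, swarm settings) is hypothetical. It is a sketch under these assumptions, not the authors' code.

```python
import numpy as np


def rough_fuzzy_cmeans(X, c=2, m=2.0, delta=0.3, w_low=0.9, n_iter=50, seed=0):
    """Simplified rough-fuzzy c-means (hypothetical helper, c >= 2).

    Returns the hard cluster label (highest membership) of each row of X.
    delta (boundary threshold) and w_low (weight of the lower approximation)
    are illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Maximin initialisation: one random sample, then the farthest samples.
    V = [X[rng.integers(len(X))]]
    for _ in range(1, c):
        d = np.min([((X - v) ** 2).sum(axis=1) for v in V], axis=0)
        V.append(X[d.argmax()])
    V = np.array(V, dtype=float)
    rows = np.arange(len(X))
    for _ in range(n_iter):
        # Squared distances of every sample to every centroid, shape (n, c).
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # Fuzzy c-means memberships: u_ij is proportional to d_ij^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        order = np.argsort(U, axis=1)
        best, second = order[:, -1], order[:, -2]
        # Rough step: if the top two memberships are close, the sample lies in
        # the boundary (upper approximation) of both clusters; otherwise it
        # belongs to the lower approximation of its best cluster.
        boundary = (U[rows, best] - U[rows, second]) <= delta
        for i in range(c):
            lower = (best == i) & ~boundary
            upper = boundary & ((best == i) | (second == i))
            crisp = X[lower].mean(axis=0) if lower.any() else None
            fuzzy = None
            if upper.any():
                wts = U[upper, i] ** m
                fuzzy = (wts[:, None] * X[upper]).sum(axis=0) / wts.sum()
            # Centroid: weighted mix of crisp lower mean and fuzzy boundary mean.
            if crisp is not None and fuzzy is not None:
                V[i] = w_low * crisp + (1.0 - w_low) * fuzzy
            elif crisp is not None:
                V[i] = crisp
            elif fuzzy is not None:
                V[i] = fuzzy
    return U.argmax(axis=1)


def agreement(labels, y):
    """Fraction of samples whose 2-cluster label matches y, under the better
    of the two possible cluster-to-class mappings."""
    acc = float(np.mean(labels == y))
    return max(acc, 1.0 - acc)


def fitness(mask, X, y, alpha=0.01):
    """Hypothetical wrapper fitness: cluster agreement minus a size penalty.
    (The paper instead applies a Random Forest to the clustering results.)"""
    if not mask.any():
        return 0.0
    return agreement(rough_fuzzy_cmeans(X[:, mask]), y) - alpha * mask.mean()


def binary_pso(X, y, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over feature masks: each particle holds one bit per miRNA."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pos = rng.random((n_particles, n_feat)) < 0.5
    vel = rng.uniform(-1.0, 1.0, (n_particles, n_feat))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p, X, y) for p in pos])
    g = int(pbest_fit.argmax())
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, n_feat))
        x = pos.astype(float)
        vel = (w * vel + c1 * r1 * (pbest.astype(float) - x)
               + c2 * r2 * (gbest.astype(float) - x))
        vel = np.clip(vel, -4.0, 4.0)  # velocity clamp
        # Sigmoid transfer: each bit is set with probability 1 / (1 + e^-v).
        pos = rng.random((n_particles, n_feat)) < 1.0 / (1.0 + np.exp(-vel))
        fit = np.array([fitness(p, X, y) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        if pbest_fit.max() > gbest_fit:
            g = int(pbest_fit.argmax())
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, float(gbest_fit)


if __name__ == "__main__":
    # Synthetic demo: 40 samples x 10 "miRNAs"; only columns 0 and 1 differ
    # between control (0) and tumour (1) samples.
    y = np.repeat([0, 1], 20)
    X = np.random.default_rng(1).normal(size=(40, 10))
    X[y == 1, :2] += 5.0
    mask, score = binary_pso(X, y)
    print("selected columns:", np.flatnonzero(mask), "fitness:", round(score, 3))
```

The small `alpha` penalty in `fitness` is one assumed way to favour compact miRNA panels; the sigmoid transfer and velocity clamp follow the common binary PSO formulation rather than any setting reported for this method.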



Cited By

  • (2022) Breast cancer prediction from microRNA profiling using random subspace ensemble of LDA classifiers via Bayesian optimization. Multimedia Tools and Applications 81, 29 (2022), 41785-41805. DOI: 10.1007/s11042-021-11653-x. Online publication date: 12 July 2022.




          Information & Contributors


          Published In

          GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference Companion
          July 2019
          2161 pages
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.



          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 13 July 2019



          Author Tags

          1. breast cancer
          2. clustering
          3. feature selection
          4. fuzzy set
          5. particle swarm optimization
          6. random forest
          7. rough set


          • Research-article


          GECCO '19
          GECCO '19: Genetic and Evolutionary Computation Conference
          July 13-17, 2019
          Prague, Czech Republic

          Acceptance Rates

          Overall Acceptance Rate 1,669 of 4,410 submissions, 38%



          Bibliometrics & Citations


          Article Metrics

          • Downloads (last 12 months): 8
          • Downloads (last 6 weeks): 0
          Reflects downloads up to 05 Mar 2025
