
Technique of Gene Expression Profiles Selection Based on SOTA Clustering Algorithm Using Statistical Criteria and Shannon Entropy

  • Conference paper
Lecture Notes in Computational Intelligence and Decision Making (ISDMCI 2020)

Abstract

In this paper, we present the results of research on the selection of informative genes based on the combined use of statistical criteria and Shannon entropy. The main objective of the research is to develop a technique for selecting groups of gene expression profiles that allow the investigated samples to be divided correctly into previously known classes, using the results of either DNA microarray experiments or RNA sequencing. DNA microarray data from patients examined for lung cancer were used as the experimental data. At the first step, we selected the informative genes in terms of both the statistical criteria and Shannon entropy. The number of gene expression profiles was reduced at this step from 54675 to 21431. Then, we performed a step-by-step clustering process using the SOTA algorithm. The number of obtained clusters varied from 2 at the first clustering level to 512 at the ninth level. Finally, we calculated the internal clustering quality criterion for the investigated samples, using the genes allocated to each of the clusters. A smaller value of this criterion corresponds to a higher ability of the genes in that cluster to separate the samples. The proposed technique creates the conditions for developing both a diagnostic method and a forecasting technique based on gene regulatory networks, using the results of either DNA microarray experiments or RNA sequencing.
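The first step and the cluster counts described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the statistical criteria or the entropy estimator, so the use of variance, the histogram-based entropy, the quantile thresholds, the rule "keep high variance and low entropy", and all function names below are assumptions made for illustration only.

```python
import numpy as np


def shannon_entropy(profile, bins=10):
    """Histogram-based Shannon entropy (in bits) of one expression profile.

    Assumption: the estimator and the number of bins are illustrative choices.
    """
    counts, _ = np.histogram(profile, bins=bins)
    p = counts[counts > 0] / counts.sum()   # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())


def select_informative(X, var_quantile=0.25, ent_quantile=0.75, bins=10):
    """Return row indices of profiles kept by a variance and entropy filter.

    X has one row per gene expression profile and one column per sample.
    Assumption: a profile is kept when its variance is at or above the
    `var_quantile` quantile and its entropy is at or below the
    `ent_quantile` quantile; both thresholds are hypothetical.
    """
    variance = X.var(axis=1)
    entropy = np.array([shannon_entropy(row, bins) for row in X])
    keep = (variance >= np.quantile(variance, var_quantile)) & (
        entropy <= np.quantile(entropy, ent_quantile)
    )
    return np.flatnonzero(keep)


def clusters_at_level(level):
    """Cluster count when every cluster is split in two at each level."""
    return 2 ** level


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 40))          # synthetic data, not the lung cancer set
    kept = select_informative(X)
    print(f"kept {kept.size} of {X.shape[0]} profiles")
    print([clusters_at_level(k) for k in range(1, 10)])
```

The doubling in `clusters_at_level` matches the counts reported in the abstract: 2 clusters at the first level and 512 at the ninth.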




Corresponding author

Correspondence to Sergii Babichev.


Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Babichev, S., Khamula, O., Durnyak, B., Škvor, J. (2021). Technique of Gene Expression Profiles Selection Based on SOTA Clustering Algorithm Using Statistical Criteria and Shannon Entropy. In: Babichev, S., Lytvynenko, V., Wójcik, W., Vyshemyrskaya, S. (eds) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2020. Advances in Intelligent Systems and Computing, vol 1246. Springer, Cham. https://doi.org/10.1007/978-3-030-54215-3_2
