Abstract
This paper presents the results of research on the selection of informative genes based on the combined use of statistical criteria and Shannon entropy. The main objective of the research is to develop a technique for selecting groups of gene expression profiles that allow the investigated samples to be divided correctly into previously known classes, using the results of either DNA microarray experiments or RNA sequencing. DNA microarray data from patients examined for lung cancer were used as the experimental data. In the first step, we selected the informative genes in terms of both the statistical criteria and Shannon entropy. This step reduced the number of gene expression profiles from 54675 to 21431. We then performed a step-by-step clustering process using the SOTA algorithm. The number of clusters obtained varied from 2 at the first clustering level to 512 at the ninth level. Finally, we calculated the internal clustering quality criterion for the investigated samples within each of the clusters. A smaller value of this criterion corresponds to a higher separating ability of the genes in that cluster. The proposed technique creates the conditions for developing both a diagnostic method and a forecasting technique based on gene regulatory networks, using the results of either DNA microarray experiments or RNA sequencing.
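The first filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the statistical criteria, the entropy estimator, or the thresholds, so the use of variance, a histogram-based entropy estimate, the bin count, and the names `shannon_entropy`, `select_informative`, `min_variance`, and `max_entropy` are all assumptions made for illustration. The sketch also assumes that a profile is kept when its variance is high enough and its entropy is low enough.

```python
import math
from statistics import pvariance


def shannon_entropy(profile, bins=10):
    """Histogram-based Shannon entropy (in bits) of one expression profile.

    The equal-width binning is an assumption of this sketch.
    """
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return 0.0  # a constant profile falls into a single bin
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in profile:
        # Clamp the maximum value into the last bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    n = len(profile)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def select_informative(profiles, min_variance, max_entropy, bins=10):
    """Return names of profiles that pass both (hypothetical) thresholds."""
    selected = []
    for name, values in profiles.items():
        high_variance = pvariance(values) >= min_variance
        low_entropy = shannon_entropy(values, bins) <= max_entropy
        if high_variance and low_entropy:
            selected.append(name)
    return selected


# Toy data: three made-up profiles measured on four samples.
profiles = {
    "g1": [1.0, 1.0, 1.0, 1.0],    # constant: rejected by the variance test
    "g2": [0.0, 0.0, 10.0, 10.0],  # two clear levels: kept
    "g3": [0.0, 3.0, 6.0, 10.0],   # spread over all bins: rejected by entropy
}
kept = select_informative(profiles, min_variance=1.0, max_entropy=1.5, bins=4)
```

With real data the thresholds would have to be chosen from the distributions of the criteria across all profiles; the values above are arbitrary and serve only the toy example.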
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Babichev, S., Khamula, O., Durnyak, B., Škvor, J. (2021). Technique of Gene Expression Profiles Selection Based on SOTA Clustering Algorithm Using Statistical Criteria and Shannon Entropy. In: Babichev, S., Lytvynenko, V., Wójcik, W., Vyshemyrskaya, S. (eds) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2020. Advances in Intelligent Systems and Computing, vol 1246. Springer, Cham. https://doi.org/10.1007/978-3-030-54215-3_2
DOI: https://doi.org/10.1007/978-3-030-54215-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54214-6
Online ISBN: 978-3-030-54215-3
eBook Packages: Intelligent Technologies and Robotics (R0)