Abstract
This paper presents a technology for objective clustering of high-dimensional data features, based on methods of inductive modeling of complex systems. The architecture of the technology is developed as a block diagram showing the step-by-step implementation of the object clustering procedure. A method is proposed for criterion-based evaluation of clustering results on complex data using two equal-power data subsets, that is, subsets containing the same number of objects. The degree of clustering objectivity is evaluated through the combined use of internal and external criteria. Simulation results are presented for the proposed technology, based on the SOTA self-organizing clustering algorithm, on three kinds of data: gene expression data obtained by DNA microarray analysis of patients with lung cancer (GEOD-68571, ArrayExpress database); the “Compound” and “Aggregation” datasets of the School of Computing, University of Eastern Finland; and the “seeds” dataset.
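As a rough illustration of the evaluation idea described above, the following Python sketch clusters two equal-power subsets separately, computes an internal criterion on each, and compares the two values with an external criterion. Everything specific here is an assumption made for illustration and is not taken from the paper: plain k-means stands in for SOTA, the splitting rule (`split_equal_power`), the scatter-ratio internal criterion, and the normalized-difference external criterion are hypothetical choices, and the data are synthetic.

```python
import math
import random


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def split_equal_power(data):
    """Split data into two subsets of equal size.
    Assumed rule (not the paper's): sort by distance to the global
    centroid and assign objects alternately to A and B."""
    c = mean(data)
    ordered = sorted(data, key=lambda p: dist2(p, c))
    if len(ordered) % 2:  # drop one object so both subsets have equal size
        ordered = ordered[:-1]
    return ordered[0::2], ordered[1::2]


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means, used here only as a stand-in for SOTA."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[j].append(p)
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return [c for c in clusters if c]  # discard empty clusters


def internal_criterion(clusters):
    """Assumed internal criterion: mean within-cluster scatter divided by
    mean squared distance between cluster centers (lower is better)."""
    centers = [mean(c) for c in clusters]
    n = sum(len(c) for c in clusters)
    within = sum(dist2(p, m) for c, m in zip(clusters, centers) for p in c) / n
    pairs = [(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]]
    if not pairs:  # a single cluster has no between-center scatter
        return float("inf")
    return within / (sum(dist2(a, b) for a, b in pairs) / len(pairs))


def external_criterion(q_a, q_b):
    """Assumed external criterion: normalized difference of the internal
    criteria obtained on the two subsets (0 means identical values)."""
    if not (math.isfinite(q_a) and math.isfinite(q_b)):
        return 1.0  # assumption: treat a degenerate partition as inconsistent
    total = q_a + q_b
    return abs(q_a - q_b) / total if total else 0.0


# Synthetic data: three Gaussian blobs, 20 points each (illustrative only).
rng = random.Random(1)
blobs = [(0.0, 0.0), (6.0, 0.0), (3.0, 5.0)]
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in blobs for _ in range(20)]

subset_a, subset_b = split_equal_power(data)
results = {}
for k in range(2, 5):
    q_a = internal_criterion(kmeans(subset_a, k))
    q_b = internal_criterion(kmeans(subset_b, k))
    results[k] = (q_a, q_b, external_criterion(q_a, q_b))
    print(k, results[k])
```

Under these assumptions, a low internal criterion on both subsets together with an external criterion close to 0 would indicate that the two subsets yield similarly structured partitions, which is the sense in which the internal and external criteria are used jointly.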
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Babichev, S., Lytvynenko, V., Korobchynskyi, M., Taiff, M.A. (2017). Objective Clustering Inductive Technology of Gene Expression Sequences Features. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation. BDAS 2017. Communications in Computer and Information Science, vol 716. Springer, Cham. https://doi.org/10.1007/978-3-319-58274-0_29
DOI: https://doi.org/10.1007/978-3-319-58274-0_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58273-3
Online ISBN: 978-3-319-58274-0
eBook Packages: Computer Science (R0)