Abstract

This paper presents a technology for the objective clustering of high-dimensional data features, based on the methods of inductive modeling of complex systems. The architecture of this objective clustering inductive technology was developed as a block diagram that implements the object clustering procedure step by step. The authors propose a method for the criterion-based evaluation of complex data clustering results using two equal-power data subsets, that is, subsets containing the same number of objects. The degree of clustering objectivity is evaluated through the combined use of internal and external criteria. The paper reports simulation results for the proposed technology, based on the SOTA (Self-Organizing Tree Algorithm) clustering algorithm, on the following data: gene expression data obtained by DNA microarray analysis of patients with lung cancer (GEOD-68571, ArrayExpress database); the “Compound” and “Aggregation” datasets of the School of Computing, University of Eastern Finland; and the “seeds” dataset.
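The abstract describes the two-subset evaluation only at a high level. The following is a minimal Python sketch of the general idea, not the authors' implementation, under several assumptions. The equal-power split pairs each object with its nearest unpaired neighbour and sends one member of each pair to each subset. Plain k-means stands in for SOTA, which is not implemented here. The internal criterion is taken as mean within-cluster distance divided by mean distance between cluster centres. The external criterion is taken as the relative difference between the internal criteria of the two subsets. The combined score is their hypothetical sum. All function names (`split_equal_power`, `kmeans`, `internal_criterion`, `evaluate`) are illustrative.

```python
import math
import random

def split_equal_power(points):
    """Hypothetical equal-power split: pair each object with its nearest
    unpaired neighbour; one member of each pair goes to A, the other to B.
    If the number of objects is odd, the last one is dropped."""
    unpaired = list(range(len(points) - len(points) % 2))
    a, b = [], []
    while unpaired:
        i = unpaired.pop(0)
        j = min(unpaired, key=lambda m: math.dist(points[i], points[m]))
        unpaired.remove(j)
        a.append(points[i])
        b.append(points[j])
    return a, b

def mean_point(cluster):
    # Coordinate-wise mean of a non-empty list of points.
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means, used here only as a stand-in for SOTA."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centre.
            new_centers.append(mean_point(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def internal_criterion(points, labels, centers):
    """Hypothetical internal criterion (smaller is better, needs k >= 2):
    mean object-to-centre distance / mean pairwise centre distance."""
    within = sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels)) / len(points)
    pairs = [(i, j) for i in range(len(centers)) for j in range(i + 1, len(centers))]
    between = sum(math.dist(centers[i], centers[j]) for i, j in pairs) / len(pairs)
    return within / between

def evaluate(points, ks, seed=0):
    """Cluster both subsets for each k and score the agreement between them."""
    a, b = split_equal_power(points)
    scores = {}
    for k in ks:
        qa = internal_criterion(a, *kmeans(a, k, seed=seed))
        qb = internal_criterion(b, *kmeans(b, k, seed=seed))
        external = abs(qa - qb) / (qa + qb)  # 0 when both subsets agree exactly
        scores[k] = {"internal_a": qa, "internal_b": qb, "external": external,
                     "combined": (qa + qb) / 2 + external}  # hypothetical combination
    return scores

# Usage on synthetic data: three Gaussian blobs, 20 points each.
rng = random.Random(42)
centres = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)) for cx, cy in centres for _ in range(20)]
subset_a, subset_b = split_equal_power(data)
scores = evaluate(data, ks=range(2, 6))
best_k = min(scores, key=lambda k: scores[k]["combined"])
print(len(subset_a), len(subset_b), best_k)
```

Pairing nearest neighbours across the two subsets is meant to make A and B cover the same regions of the feature space. Under that assumption, a clustering that is stable across both subsets, with a small external criterion, is a reasonable proxy for an "objective" one.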

Author information

Corresponding author: Sergii Babichev.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Babichev, S., Lytvynenko, V., Korobchynskyi, M., Taiff, M.A. (2017). Objective Clustering Inductive Technology of Gene Expression Sequences Features. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation. BDAS 2017. Communications in Computer and Information Science, vol 716. Springer, Cham. https://doi.org/10.1007/978-3-319-58274-0_29

  • DOI: https://doi.org/10.1007/978-3-319-58274-0_29

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58273-3

  • Online ISBN: 978-3-319-58274-0

  • eBook Packages: Computer Science, Computer Science (R0)