Release from the Curse of High Dimensional Data Analysis

Shinmura, Shuichi

doi:10.1007/978-3-030-24405-7_12


Part of the book series: Studies in Computational Intelligence (SCI, volume 844)

Included in the following conference series:

3rd IEEE/ACIS International Conference on Big Data, Cloud Computing, and Data Science Engineering


Abstract

Golub et al. started their research to find oncogenes and new cancer subclasses from microarrays around 1970. They published their microarray data on the Internet, and five other medical projects also published their papers and released their microarrays. However, a Japanese cancer specialist told us that the NIH had judged this research useless after 2004, so we surmise that the medical groups abandoned it. We have been looking for the NIH report but have not yet found it. Meanwhile, many researchers in statistics, machine learning, and bioengineering have continued to study these microarrays as a new theme of high-dimensional data analysis. However, like the medical researchers, they could not succeed in cancer gene analysis (Problem5). We discriminated the six microarrays with Revised IP-OLDF (RIP) and solved Problem5 in 54 days, finishing on December 20, 2015. We obtained two surprising results. First, the minimum number of misclassifications (MNM) of each of the six microarrays is zero (Fact3); that is, each microarray is linearly separable data (LSD). Second, RIP can decompose each microarray into many linearly separable gene subspaces (SMs) and a noise subspace (Fact4). These two new facts indicate that we are free from the curse of high-dimensional microarray data and have completed the cancer gene analysis. Because all SMs are LSD and small samples, we expected to analyze them with statistical methods and obtain useful results. However, we were disappointed to find that statistical methods do not reveal the linear separability of the SMs and are useless for cancer gene diagnosis (Problem6). After trial and error, we made signal data from the RIP discriminant scores (RipDSs) of the SMs. Through this breakthrough, we found useful information by correlation analysis, cluster analysis, and PCA, in addition to RIP, Revised LP-OLDF, and the hard-margin SVM (H-SVM). We think that the discovery of these two new facts is the essence of Problem5. Moreover, we claim to have solved Problem6 and to obtain useful medical care information from the signal data for cancer gene diagnosis.
However, our claim needs validation by medical specialists. In this research, we explain why no researchers had succeeded in cancer gene diagnosis with microarrays since 1970.
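
The two quantities the abstract relies on can be made concrete with a small sketch. For labels y in {+1, -1} and a linear discriminant function f(x) = w·x + b, the number of misclassifications (NM) counts the cases with y·f(x) ≤ 0, and MNM is the minimum of NM over all (w, b); MNM = 0 therefore means the data are linearly separable. The Python below is a hypothetical illustration, not RIP: RIP is an integer-programming model that minimizes NM, whereas this sketch swaps in the classic perceptron, which is only guaranteed to reach NM = 0 when the data are linearly separable. It then stacks each subspace's discriminant scores into a cases-by-subspaces matrix (our reading of the "signal data" built from RipDSs) and correlates its columns. All function names and the toy data are assumptions.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def discriminant_scores(X: Matrix, w: Sequence[float], b: float) -> List[float]:
    """Score f(x) = w.x + b for every case (row) of X."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]


def count_misclassifications(X: Matrix, y: Sequence[int], w: Sequence[float], b: float) -> int:
    """NM: cases with y * f(x) <= 0; cases on the hyperplane count as errors here."""
    return sum(1 for s, yi in zip(discriminant_scores(X, w, b), y) if yi * s <= 0)


def perceptron(X: Matrix, y: Sequence[int], max_epochs: int = 1000) -> Tuple[List[float], float]:
    """Stand-in for RIP (hypothetical): reaches NM = 0 if X is linearly separable."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, yi in zip(X, y):
            if yi * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Misclassified case: move the hyperplane toward it.
                w = [wi + yi * xi for wi, xi in zip(w, x)]
                b += yi
                errors += 1
        if errors == 0:  # a full pass without mistakes means NM = 0
            break
    return w, b


def pearson(a: Sequence[float], c: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant columns."""
    n = len(a)
    ma, mc = sum(a) / n, sum(c) / n
    cov = sum((p - ma) * (q - mc) for p, q in zip(a, c))
    va = sum((p - ma) ** 2 for p in a)
    vc = sum((q - mc) ** 2 for q in c)
    return cov / (va * vc) ** 0.5


def signal_data(subspaces: List[Matrix], y: Sequence[int]) -> Matrix:
    """Rows = cases, columns = discriminant scores from each gene subspace."""
    columns = []
    for Xs in subspaces:
        w, b = perceptron(Xs, y)
        columns.append(discriminant_scores(Xs, w, b))
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    # Toy data (hypothetical): 4 cases, labels +1/-1, two gene subspaces.
    y = [1, 1, -1, -1]
    sm1 = [[2.0, 1.0], [1.5, 2.0], [-1.0, -0.5], [-2.0, -1.5]]
    sm2 = [[1.0], [3.0], [-2.0], [-1.0]]
    for sm in (sm1, sm2):
        w, b = perceptron(sm, y)
        print("NM =", count_misclassifications(sm, y, w, b))
    rows = signal_data([sm1, sm2], y)
    print(rows)
    print("r =", pearson([r[0] for r in rows], [r[1] for r in rows]))
```

Because the perceptron simply stops after `max_epochs` on non-separable input, a nonzero NM from this sketch does not prove MNM > 0; only an exact optimizer, such as the integer program behind RIP, can establish the true minimum.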

References

Alon, U., et al.: Broad patterns of gene expression revealed by clustering analysis of cancer and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 96, 6745–6750 (1999)
Aoshima, M., Yata, K.: Distance-based classifier by data transformation for high-dimension, strongly spiked eigenvalue models. Ann. Inst. Stat. Math. 71, 473–503 (2019)
Aoshima, M., Yata, K.: High-dimensional quadratic classifiers in non-sparse settings. Methodol. Comput. Appl. Probabil. (in press, 2019)
Brahim, A.B., Lima, M.: Hybrid instance based feature selection algorithms for cancer diagnosis. Pattern Recognition Letters, pp. 8 (2014)
Bühlmann, P., van de Geer, S.: Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Berlin (2011)
Charikar, M., Guruswami, V., Kumar, R., Rajagopalan, S., Sahai, A.: Combinatorial feature selection problems. IEEE Xplore, pp. 631–640 (2000)
Chiaretti, S., et al.: Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood 103(7), 2771–2778 (2004)
Cilia, N.D., De Stefano, C., Fontanella, F., Raimondo, S., Scotto di Freca, A.: An experimental comparison of feature-selection and classification methods for microarray datasets. Information 10(109), 1–13 (2019)
Cox, D.R.: The regression analysis of binary sequences (with discussion). J. R. Stat. Soc. B 20, 215–242 (1958)
Diao, G., Vidyashankar, A.N.: Assessing genome-wide statistical significance for large p small n problems. Genetics 194, 781–783 (2013)
Firth, D.: Bias reduction of maximum likelihood estimates. Biometrika 80, 27–39 (1993)
Fisher, R.A.: Statistical Methods and Scientific Inference. Hafner Publishing Co., New York (1956)
Flury, B., Riedwyl, H.: Multivariate Statistics: A Practical Approach. Cambridge University Press, New York (1988)
Friedman, J.H.: Regularized discriminant analysis. J. Am. Stat. Assoc. 84(405), 165–175 (1989)
Golub, T.R., et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439), 531–537 (1999)
Goodnight, J.H.: The Sweep Operator: Its Importance in Statistical Computing. SAS Technical Report R-100. SAS Institute Inc., USA (1978)
Jeffery, I.B., Higgins, D.G., Culhane, A.C.: Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics (2006)
Lachenbruch, P.A., Mickey, M.R.: Estimation of error rates in discriminant analysis. Technometrics 10(1), 1–11 (1968)
Miyake, A., Shinmura, S.: Error rate of linear discriminant function. In: de Dombal, F.T., Gremy, F. (eds.), pp. 435–445. North-Holland Publishing Company, The Netherlands (1976)
Miyake, A., Shinmura, S.: An algorithm for the optimal linear discriminant function and its application. Jpn Soc. Med. Electron Bio. Eng. 1815, 452–454 (1980)
Sall, J.P., Creighton, L., Lehman, A.: JMP Start Statistics, 3rd edn. SAS Institute Inc. (2004). Japanese version supervised by S. Shinmura
Schrage, L.: Optimization Modeling with LINGO. LINDO Systems Inc. (2006)
Shinmura, S.: Optimal Linear Discriminant Functions Using Mathematical Programming. Dissertation, Okayama University, Japan, pp. 1–101 (2000)
Shinmura, S.: A new algorithm of the linear discriminant function using integer programming. New Trends Probab. Stat. 5, 133–142 (2000)
Shinmura, S.: The Optimal Linear Discriminant Function. Union of Japanese Scientists and Engineers Publishing, Japan (2010). ISBN 978-4-8171-9364-3
Shinmura, S.: Problem of discriminant analysis by mark sense test data. Japanese Soc. Appl. Stat. 4012, 157–172 (2011)
Shinmura, S.: End of Discriminant Functions based on Variance-Covariance Matrices. ICORES, pp. 5–16 (2014)
Shinmura, S.: Four Serious Problems and New Facts of the Discriminant Analysis. In: Pinson, E., et al. (eds.) Operations Research and Enterprise Systems, pp. 15–30. Springer, Berlin (2015)
Shinmura, S.: New Theory of Discriminant Analysis After R. Fisher. Springer (2016)
Shinmura, S.: Cancer Gene Analysis to Cancer Gene Diagnosis, Amazon (2017)
Shinmura, S.: Cancer Gene Analysis by Singh et al. Microarray Data. ISI2017, pp. 1–6 (2017)
Shinmura, S.: Cancer Gene Analysis of Microarray Data. BCD18, pp. 1–6 (2018)
Shinmura, S.: First Success of Cancer Gene Analysis by Microarrays. Biocomp’18, pp. 1–7 (2018)
Shinmura, S.: High-Dimensional Microarray Data Analysis. Springer (2019)
Shinmura, S.: High-dimensional microarray data analysis: first success of cancer gene analysis and cancer gene diagnosis. ISI2019 (August), in press (2019)
Shipp, M.A., et al.: Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat. Med. 8, 68–74 (2002)
Singh, D., et al.: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1, 203–209 (2002)
Stam, A.: Non-traditional approaches to statistical classifications: some perspectives on Lp-norm methods. Ann. Oper. Res. 74, 1–36 (1997)
Tian, E., et al.: The role of the Wnt-signaling antagonist DKK1 in the development of osteolytic lesions in multiple myeloma. N. Engl. J. Med. 349(26), 2483–2494 (2003)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer (1999)

Author information

Authors and Affiliations

Emeritus Seikei University, 1-8-7-301 Sakasai Kashiwa City, Chiba, 277-0042, Japan
Shuichi Shinmura

Corresponding author

Correspondence to Shuichi Shinmura.

Editor information

Editors and Affiliations

Software Engineering and Information Technology Institute, Central Michigan University, Mount Pleasant, MI, USA
Roger Lee

About this chapter

Cite this chapter

Shinmura, S. (2020). Release from the Curse of High Dimensional Data Analysis. In: Lee, R. (eds) Big Data, Cloud Computing, and Data Science Engineering. BCD 2019. Studies in Computational Intelligence, vol 844. Springer, Cham. https://doi.org/10.1007/978-3-030-24405-7_12

DOI: https://doi.org/10.1007/978-3-030-24405-7_12
Published: 31 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24404-0
Online ISBN: 978-3-030-24405-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
