DOI: 10.1145/3611450.3611470
research-article

Screening rules and information criteria-based analysis of gene expression data

Published: 20 August 2023

Abstract

High-dimensional data are becoming increasingly common, and with the rapid development of technology the biomedical field is no exception. Many methods exist for analyzing high-dimensional gene expression data, but each has shortcomings. In this paper, we study the theory and application of feature screening for ultra-high-dimensional classification data. The aim is to reduce the number of features to a scale appropriate for the available sample size while retaining all important variables. To this end, we propose a variable screening method that combines sure independence screening (SIS) with the extended Bayesian information criterion (EBIC). The method reduces dimensionality, improves computational efficiency, and helps identify the variables most informative about the response. We first used randomly simulated data to select tuning parameters and screen variables. In these simulations, the correct selection rate and correct fit rate were higher than those of competing approaches, supporting the reliability of the method. Finally, we applied the method to four real gene expression datasets to further validate its effectiveness in selecting gene expression features.
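The two-stage idea in the abstract (screen first, then let an information criterion choose the final model) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and it rests on several assumptions. The marginal utility is the absolute Pearson correlation with a 0/1 class label, as in the original SIS of Fan and Lv (2008). The screening size is d = floor(n / log n). Candidate models are the nested prefixes of the SIS ranking. Each candidate is scored with the Gaussian-linear-model form of EBIC from Chen and Chen (2008), n log(RSS/n) + k log n + 2γ log C(p, k), where k is the model size and p the full number of features. The paper's actual utility, search path, likelihood, and γ may differ. All function names (`pearson_abs`, `sis`, `ols_rss`, `ebic`, `sis_ebic`) are hypothetical.

```python
import math
import random


def pearson_abs(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return abs(sxy) / math.sqrt(sxx * syy)


def sis(X, y, d):
    """Sure independence screening: indices of the d columns of X with the
    largest absolute marginal correlation with y, best first."""
    p = len(X[0])
    scores = [pearson_abs([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:d]


def ols_rss(X, y, cols):
    """RSS of an OLS fit of y on an intercept plus the columns in `cols`,
    solving the normal equations by Gauss-Jordan elimination."""
    rows = [[1.0] + [r[j] for j in cols] for r in X]
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda i: abs(A[i][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def ebic(rss, n, k, p, gamma):
    """Gaussian-linear-model EBIC of a k-feature model chosen from p features."""
    return (n * math.log(rss / n) + k * math.log(n)
            + 2 * gamma * math.log(math.comb(p, k)))


def sis_ebic(X, y, gamma=1.0):
    """Screen to d = floor(n / log n) features, then pick the EBIC-minimizing
    prefix of the SIS ranking. Returns the selected column indices, sorted."""
    n, p = len(X), len(X[0])
    d = int(n / math.log(n))
    ranked = sis(X, y, d)
    best_k, best_score = 0, math.inf
    for k in range(d + 1):
        score = ebic(ols_rss(X, y, ranked[:k]), n, k, p, gamma)
        if score < best_score:
            best_k, best_score = k, score
    return sorted(ranked[:best_k])


if __name__ == "__main__":
    # Hypothetical demo data: 200 samples, 1000 features, and a 0/1 class
    # label that depends only on features 0, 1 and 2.
    rng = random.Random(0)
    n, p = 200, 1000
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1.0 if r[0] + r[1] - r[2] + 0.5 * rng.gauss(0, 1) > 0 else 0.0
         for r in X]
    print(sis_ebic(X, y))
```

Running the script prints the selected feature indices. With a label this strongly driven by features 0, 1 and 2, those features should be among the selected ones, though the exact output depends on the random draw. Note that the EBIC penalty uses the full dimension p rather than the screened size d, so the extra 2γ log C(p, k) term guards against spurious features that survive screening. Setting `gamma=0` reduces the criterion to ordinary BIC.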



Published In

ACM Other conferences
AI2A '23: Proceedings of the 2023 3rd International Conference on Artificial Intelligence, Automation and Algorithms
July 2023
199 pages
ISBN: 9798400707605
DOI: 10.1145/3611450

Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

• Research-article
• Research
• Refereed limited

Conference

AI2A '23


Article Metrics

• 0 Total Citations
• 24 Total Downloads
• 6 Downloads (Last 12 months)
• 3 Downloads (Last 6 weeks)
Reflects downloads up to 05 Mar 2025
