An effective filtering gene selection method for microarray data via shuffling and statistical analysis

Authors:
Zejin Jason Ding, Georgia State University, Atlanta, GA
Yan-Qing Zhang, Georgia State University, Atlanta, GA

BCB '10: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology, August 2010, Pages 382–385, https://doi.org/10.1145/1854776.1854835

Published: 02 August 2010

ABSTRACT

Correlation-based filtering gene selection methods have been shown to be quite effective for microarray data analysis, and hundreds of methods have been proposed in the literature. In this paper, we extend the notion of correlation between genes and sample status in a broader way: the relation between a gene vector and the label vector is considered unique if it cannot be replicated by randomly shuffling the gene expression values or the sample status data. A two-layer statistical analysis is performed on the original microarrays and on label-shuffled data to identify the important gene markers. We design a simple metric, the difference in signal-to-noise between the positive and negative classes, that does not work well when used directly to select the top informative genes (as verified with a linear SVM classifier). However, after collecting and ranking the second-level significance values of every gene on the original and many shuffled microarray data sets, the top selected genes show much better classification performance. Results on several public microarray data sets show that genes selected by our method also lead to high leave-one-out prediction accuracy.
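The shuffling idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several choices here are assumptions made for illustration only: the per-class signal-to-noise is taken as mean divided by standard deviation, the second-level significance is taken as a z-score of the observed score against the gene's own label-shuffled scores, and the names `snr_difference` and `shuffle_rank` are hypothetical. The data are synthetic, with one informative gene planted at index 0.

```python
import numpy as np

def snr_difference(X, y, eps=1e-12):
    """Per-gene |mean/std (positive class) - mean/std (negative class)|.

    X: (n_samples, n_genes) expression matrix; y: labels in {0, 1}.
    This exact form of the metric is an assumption, not the paper's definition.
    """
    pos, neg = X[y == 1], X[y == 0]
    snr_pos = pos.mean(axis=0) / (pos.std(axis=0) + eps)
    snr_neg = neg.mean(axis=0) / (neg.std(axis=0) + eps)
    return np.abs(snr_pos - snr_neg)

def shuffle_rank(X, y, n_shuffles=200, seed=0):
    """Rank genes by how unusual their observed score is under label shuffling."""
    rng = np.random.default_rng(seed)
    observed = snr_difference(X, y)
    null = np.empty((n_shuffles, X.shape[1]))
    for b in range(n_shuffles):
        # Shuffling the labels breaks any real gene/label relation.
        null[b] = snr_difference(X, rng.permutation(y))
    # Second-level significance (assumed form): z-score of the observed
    # score against that gene's own shuffled-label distribution.
    z = (observed - null.mean(axis=0)) / (null.std(axis=0) + 1e-12)
    # Genes sorted from most to least significant, plus the raw z-scores.
    return np.argsort(-z), z

# Synthetic demo: 40 samples, 50 genes, one planted informative gene.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 50))
y = np.array([1] * 20 + [0] * 20)
X[y == 1, 0] += 3.0  # gene 0 is shifted upward in the positive class
order, z = shuffle_rank(X, y)
print(order[:5])
```

In an evaluation like the one the abstract describes, the top-ranked genes from `order` would then be passed to a linear SVM and scored by leave-one-out prediction accuracy; that step is omitted here to keep the sketch dependent on NumPy only.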

Index Terms

1. Computing methodologies
  1. Machine learning

Published in
BCB '10: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
August 2010
705 pages
ISBN: 9781450304382
DOI: 10.1145/1854776
General Chairs: Aidong Zhang (SUNY at Buffalo), Mark Borodovsky (Georgia Tech)
Program Chairs: Gultekin Ozsoyoglu (Case Western Reserve University), Armin Mikler (University of North Texas)
Copyright © 2010 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data shuffling
gene selection
microarray data
statistical analysis
support vector machine (SVM)
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 254 of 885 submissions, 29%