Abstract:
Accurately identifying functionally relevant variants against the ubiquitous background of genetic variation is a significant challenge for bioinformatics researchers, and the challenge is more severe for non-coding variants. This study presents a novel computational method for identifying candidate disease-associated non-coding single nucleotide polymorphisms (SNPs) in the human genome. To characterize SNPs, an extensive range of features is extracted, including sequence context, DNA structure, evolutionary conservation, and histone modification signals. A random forest is then used to build the classifier, together with an ensemble method to deal with unbalanced data. Under 10-fold cross-validation, the proposed method achieves an area under the ROC curve (AUC) of 0.74. All the original data and the MATLAB source code are available at https://sourceforge.net/projects/dissnp-predict/.
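The abstract does not say which ensemble scheme handles the class imbalance, nor how the AUC is computed. As a rough illustration only, the Python sketch below assumes one common approach: the majority class is split into chunks of about the minority-class size, each chunk is paired with all minority samples to train one base model, and the models' scores are averaged. The function names (`balanced_subsets`, `ensemble_score`, `auc`) are hypothetical and do not come from the authors' MATLAB code; any classifier, such as a random forest, could serve as the base model.

```python
import random


def balanced_subsets(minority, majority, seed=0):
    """Pair all minority samples with disjoint chunks of the majority class.

    Assumed scheme (not confirmed by the abstract): each returned pair is a
    roughly balanced training set for one base classifier.
    """
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    # Number of chunks so that each chunk is about as large as the minority class.
    n_chunks = max(1, len(shuffled) // max(1, len(minority)))
    chunks = [shuffled[i::n_chunks] for i in range(n_chunks)]
    return [(minority, chunk) for chunk in chunks]


def ensemble_score(models, x):
    """Average the scores of the base models (each a callable returning a float)."""
    return sum(model(x) for model in models) / len(models)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In a 10-fold cross-validation setting, `balanced_subsets` would be applied to the training portion of each fold only, and `auc` would be evaluated on the held-out scores.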
Published in: 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS)
Date of Conference: 24-26 May 2017
Date Added to IEEE Xplore: 29 June 2017