Abstract:
Human splicing branchpoints are functional elements of alternative splicing, and studying them can help clarify the mechanisms by which human pre-mRNA transcripts are spliced. There are a large number of human splicing branchpoints, but the wet-lab methods that identify them are labor-intensive and time-consuming. In this paper, we use machine learning techniques to build models for human branchpoint prediction. Since an intron may have multiple branchpoints, we formulate the problem as a multi-label learning task that predicts the branchpoint sites of an intron from its sequence characteristics. First, we extract a variety of intron sequence-derived features, including a sparse profile, a dinucleotide profile, a position weight matrix profile, a Markov motif profile, and a polypyrimidine tract profile. Then, balancing efficiency and effectiveness, we adopt three methods (partial least squares regression, canonical correlation analysis, and regularized canonical correlation analysis) to build multi-label prediction models from these features, each approaching the problem from a different angle. Finally, we integrate the individual models with an average-scoring ensemble strategy to obtain the ensemble model for branchpoint prediction. Computational experiments show that the proposed method produces satisfactory results on an experimentally verified dataset and outperforms other state-of-the-art methods. We also provide a user-friendly web server for human splicing branchpoint prediction, available at http://121.42.59.182:8080.
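The abstract names a sparse profile and a dinucleotide profile among the sequence-derived features. The paper's exact definitions are not given here. The sketch below assumes one common reading: the sparse profile is a one-hot encoding (4 bits per nucleotide, all zeros for ambiguous bases), and the dinucleotide profile is the relative frequency of the 16 dinucleotides. The function names are hypothetical.

```python
import numpy as np
from itertools import product

NUCLEOTIDES = "ACGT"
# Index of each of the 16 dinucleotides in the order AA, AC, ..., TT.
DINUCLEOTIDES = {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=2))}


def sparse_profile(seq: str) -> np.ndarray:
    """One-hot encode a sequence: 4 binary features per position (all zeros for N)."""
    seq = seq.upper()
    out = np.zeros((len(seq), 4), dtype=np.int8)
    for i, base in enumerate(seq):
        j = NUCLEOTIDES.find(base)
        if j >= 0:
            out[i, j] = 1
    return out.ravel()


def dinucleotide_profile(seq: str) -> np.ndarray:
    """Relative frequency of each of the 16 dinucleotides (pairs with N are skipped)."""
    seq = seq.upper()
    counts = np.zeros(16)
    for a, b in zip(seq, seq[1:]):
        k = DINUCLEOTIDES.get(a + b)
        if k is not None:
            counts[k] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def intron_features(seq: str) -> np.ndarray:
    """Concatenate both profiles into a single feature vector."""
    return np.concatenate([sparse_profile(seq), dinucleotide_profile(seq)])


if __name__ == "__main__":
    x = intron_features("TACTAAC")  # toy sequence
    print(x.shape)  # (44,): 7 positions * 4 bits + 16 dinucleotide frequencies
```

In practice, each intron (or a fixed-length window upstream of its 3' splice site, which is an assumption) would be encoded this way. The resulting vectors would then be stacked into the feature matrix used by the prediction models.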
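The position weight matrix (PWM) profile listed in the abstract can be illustrated with a standard log-odds PWM. The matrix is built from equal-length aligned branchpoint-region sequences and then slid along an intron to score every window. This is a minimal sketch under stated assumptions. The Laplace pseudocount, the uniform background of 0.25, and the helper names `build_pwm`, `pwm_score` and `pwm_profile` are illustrative choices, not the paper's settings.

```python
import math
from typing import Dict, List

NUCLEOTIDES = "ACGT"


def build_pwm(aligned: List[str], pseudocount: float = 1.0,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds PWM from equal-length aligned sequences (hypothetical defaults)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    pwm = []
    for i in range(length):
        column = [s[i].upper() for s in aligned]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in NUCLEOTIDES})
    return pwm


def pwm_score(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds; bases outside ACGT contribute 0."""
    return sum(col.get(b, 0.0) for col, b in zip(pwm, window.upper()))


def pwm_profile(pwm: List[Dict[str, float]], seq: str) -> List[float]:
    """Score every window of the PWM's width along the sequence."""
    w = len(pwm)
    return [pwm_score(pwm, seq[i:i + w]) for i in range(len(seq) - w + 1)]
```

Each window's score can serve as one feature per candidate position. Higher scores mean the window looks more like the aligned training motifs than like the uniform background.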
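The abstract casts branchpoint prediction as multi-label learning. In this framing, each intron has a binary label vector with one entry per candidate position, set to 1 where a branchpoint is verified. Partial least squares regression is one of the three models used to map features to that vector. The numpy-only PLS2 (NIPALS) sketch below assumes `X` is an (introns × features) matrix and `Y` an (introns × candidate positions) 0/1 matrix. The class name and defaults are hypothetical, and scikit-learn's `PLSRegression` would be the off-the-shelf alternative. Canonical correlation analysis and its regularized variant are not shown.

```python
import numpy as np


class NipalsPLS:
    """Minimal PLS2 regression fitted with NIPALS; Y is a 2-D multi-label matrix."""

    def __init__(self, n_components: int, max_iter: int = 500, tol: float = 1e-10):
        self.n_components = n_components
        self.max_iter = max_iter
        self.tol = tol

    def fit(self, X, Y):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        self.x_mean_ = X.mean(axis=0)
        self.y_mean_ = Y.mean(axis=0)
        Xa = X - self.x_mean_  # deflated as components are extracted
        Ya = Y - self.y_mean_
        W, P, Q = [], [], []
        for _ in range(self.n_components):
            # Initialise the Y score with the label column of largest residual variance.
            u = Ya[:, np.argmax(Ya.var(axis=0))]
            for _ in range(self.max_iter):
                w = Xa.T @ u
                w_norm = np.linalg.norm(w)
                if w_norm < 1e-12:  # no remaining X-Y covariance to model
                    break
                w = w / w_norm                  # X weights
                t = Xa @ w                      # X scores
                q = Ya.T @ t / (t @ t)          # Y loadings
                u_new = Ya @ q / (q @ q)        # updated Y scores
                done = np.linalg.norm(u_new - u) <= self.tol * np.linalg.norm(u_new)
                u = u_new
                if done:
                    break
            if w_norm < 1e-12:
                break
            t = Xa @ w
            p = Xa.T @ t / (t @ t)              # X loadings
            q = Ya.T @ t / (t @ t)
            Xa = Xa - np.outer(t, p)            # deflate X and Y by this component
            Ya = Ya - np.outer(t, q)
            W.append(w)
            P.append(p)
            Q.append(q)
        if W:
            Wm, Pm, Qm = np.column_stack(W), np.column_stack(P), np.column_stack(Q)
            # Regression coefficients B = W (P^T W)^{-1} Q^T.
            self.coef_ = Wm @ np.linalg.solve(Pm.T @ Wm, Qm.T)
        else:
            self.coef_ = np.zeros((X.shape[1], Y.shape[1]))
        return self

    def predict(self, X):
        """Real-valued score for every (intron, candidate position) pair."""
        return (np.asarray(X, dtype=float) - self.x_mean_) @ self.coef_ + self.y_mean_
```

Because all label columns share the same latent components, PLS2 can exploit correlations between neighbouring candidate positions. Training one independent regressor per position could not do that. The number of components would normally be tuned by cross-validation.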
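The average-scoring ensemble described in the abstract can be sketched as an element-wise mean of the per-position score matrices produced by the individual models. The abstract does not say how final branchpoint sites are chosen from the averaged scores. The top-k rule below is therefore a hypothetical decision rule, as are both function names.

```python
import numpy as np


def average_scores(score_matrices):
    """Average-scoring ensemble: element-wise mean of (introns x positions) score matrices."""
    stacked = np.stack([np.asarray(s, dtype=float) for s in score_matrices])
    return stacked.mean(axis=0)


def top_k_sites(scores, k=1):
    """Hypothetical decision rule: mark the k highest-scoring positions of each intron."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    labels = np.zeros(scores.shape, dtype=int)
    np.put_along_axis(labels, order, 1, axis=1)
    return labels


if __name__ == "__main__":
    pls_scores = np.array([[0.9, 0.1, 0.4]])  # toy scores from two models
    cca_scores = np.array([[0.5, 0.3, 0.8]])
    avg = average_scores([pls_scores, cca_scores])
    print(avg)                  # [[0.7 0.2 0.6]]
    print(top_k_sites(avg, 2))  # [[1 0 1]]
```

Plain averaging assumes the models' scores are on comparable scales. If they are not, rescaling each model's output first (for example, min-max per model) would be an additional assumption, not something stated in the abstract.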
Date of Conference: 15-18 December 2016
Date Added to IEEE Xplore: 19 January 2017