The authors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected VoR was published on August 30, 2022. For reference purposes the VoR may still be accessed via the Supplemental Material section on this page.

Abstract

Functional annotation of human genes is fundamentally important for understanding the molecular basis of various genetic diseases. A major challenge in determining the functions of human genes lies in the functional diversity of proteins: a single gene can perform different functions because it may give rise to multiple protein coding isoforms (PCIs). Differentiating the functions of PCIs can therefore significantly deepen our understanding of gene function. However, because isoform-level gold standards (ground-truth annotations) are lacking, many existing functional annotation approaches operate at the gene level. In this paper, we propose a novel approach to differentiate the functions of PCIs by integrating sparse simplex projection (a non-convex sparsity-inducing regularizer) with the framework of multi-instance learning (MIL). Specifically, we label the genes that are annotated to the function under consideration as positive bags and the genes without the function as negative bags. Then, by sparse projections onto the simplex, we learn a mapping that embeds the original bag space into a discriminative feature space. Our framework is flexible enough to incorporate various smooth and non-smooth loss functions, such as logistic loss and hinge loss. To solve the resulting highly nontrivial non-convex and non-smooth optimization problem, we further develop an efficient block coordinate descent algorithm. Extensive experiments on human genome data demonstrate that the proposed approaches significantly outperform the state-of-the-art methods in both functional annotation accuracy for human PCIs and efficiency.
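To make the sparse simplex projection mentioned in the abstract more concrete, the following is a minimal Python sketch of one common recipe: keep the k largest entries of a vector, then take the Euclidean projection of those entries onto the probability simplex. Whether the paper uses exactly this variant is an assumption; the function names `project_simplex` and `sparse_simplex_projection` and the parameter `k` are hypothetical and do not come from the paper.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a 1-D array onto the probability simplex."""
    u = np.sort(v)[::-1]                 # entries in descending order
    css = np.cumsum(u)                   # running sums of the sorted entries
    idx = np.arange(1, v.size + 1)
    # Largest position whose sorted entry stays positive after the shift.
    rho = idx[u - (css - 1.0) / idx > 0][-1]
    theta = (css[rho - 1] - 1.0) / rho   # common shift applied to all entries
    return np.maximum(v - theta, 0.0)


def sparse_simplex_projection(v, k):
    """Keep the k largest entries of v, project them onto the simplex,
    and set every other entry to zero (hypothetical helper, not the paper's code)."""
    v = np.asarray(v, dtype=float)
    support = np.argsort(v)[::-1][:k]    # indices of the k largest entries
    w = np.zeros_like(v)
    w[support] = project_simplex(v[support])
    return w


# Illustrative scores for four instances in one bag (made-up numbers).
scores = np.array([0.5, 0.1, 0.9, -0.2])
weights = sparse_simplex_projection(scores, k=2)
```

The result is non-negative, sums to one, and has at most k nonzero entries. One plausible reading is that such weights pick out a few instances (isoforms) within a bag (gene), but that interpretation is an assumption rather than something the abstract states.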

Supplementary Material

3097984-VoR (3097984-vor.pdf, 1.99 MB)

Version of Record for "Functional Annotation of Human Protein Coding Isoforms via Non-convex Multi-Instance Learning" by Luo et al., Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '17).

MP4 File (luo_human_protein_coding.mp4, 386.60 MB)

