ABSTRACT
In cancer genomics, next-generation sequencing data are usually used to detect somatic driver mutations and thereby identify the cause of tumorigenesis. However, non-driver somatic mutations, or passenger mutations, can also play an important role in cancer cell survival and treatment by generating aberrant short peptide sequences known as "neoantigens". These novel immunogenic peptides, which arise from tumor-specific mutations, could be important to the recent promising outcomes of cancer immunotherapy, including immune checkpoint blockade. Many studies have tried to identify sequence characteristics that predict immunogenicity; however, practical applications rely on a single predicted value (MHC-I binding affinity) with an arbitrary cut-off. Here, we developed Neopepsee, a method that applies machine learning to predict personal neoantigens from next-generation sequencing data. Neopepsee not only automates the entire computational procedure for immunogenicity prediction from raw data but also improves accuracy by harnessing 10 different features for classification, including conventional MHC-I and T-cell receptor binding affinity and amino acid characteristics (e.g., hydrophobicity, polarity, and charge). Additionally, we found that protein sequence similarity to known pathogenic epitopes is a novel and strong feature for classification. Tests with validated epitope datasets and independently proven neoantigens confirmed a marked improvement in accuracy. Application of Neopepsee to 224 public stomach adenocarcinoma datasets predicted neoantigens whose burden is strongly correlated with patient prognosis. By providing a convenient platform with better accuracy, Neopepsee should be useful in cancer immunotherapy research, for example in developing predictive biomarkers and in designing personalized cancer vaccines.
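To make the contrast between a single affinity cut-off and a multi-feature description concrete, the sketch below computes three simple amino acid characteristics for a peptide (mean hydrophobicity, net charge, polar fraction) next to a binary affinity filter. It is an illustration only, not Neopepsee's implementation: the abstract does not say which scales, thresholds, or classifier are used. The Kyte-Doolittle hydropathy scale, the 500 nM cut-off, the polar residue set, and all function names are assumptions chosen for the example.

```python
# Illustrative sketch only: NOT the Neopepsee implementation.
# Assumptions: Kyte-Doolittle hydropathy scale, a 500 nM IC50 cut-off,
# and a simplified polar/charged residue grouping.

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
POSITIVE = set("KR")          # histidine ignored for simplicity
NEGATIVE = set("DE")
POLAR = set("STNQYCDEKRH")    # simplified grouping (assumption)


def _validate(peptide: str) -> str:
    """Upper-case the peptide and reject empty or non-standard input."""
    peptide = peptide.upper()
    if not peptide or any(aa not in KD_HYDROPATHY for aa in peptide):
        raise ValueError(f"not a standard amino acid sequence: {peptide!r}")
    return peptide


def mean_hydrophobicity(peptide: str) -> float:
    """Average Kyte-Doolittle hydropathy over all residues."""
    peptide = _validate(peptide)
    return sum(KD_HYDROPATHY[aa] for aa in peptide) / len(peptide)


def net_charge(peptide: str) -> int:
    """Count of K/R residues minus count of D/E residues."""
    peptide = _validate(peptide)
    return sum(aa in POSITIVE for aa in peptide) - sum(aa in NEGATIVE for aa in peptide)


def polar_fraction(peptide: str) -> float:
    """Fraction of residues in the simplified polar set."""
    peptide = _validate(peptide)
    return sum(aa in POLAR for aa in peptide) / len(peptide)


def passes_affinity_cutoff(ic50_nm: float, cutoff_nm: float = 500.0) -> bool:
    """Single-value filter: keep peptides with predicted IC50 below the cut-off."""
    return ic50_nm < cutoff_nm


def feature_vector(peptide: str, ic50_nm: float) -> dict:
    """Collect several features that a classifier could use together."""
    return {
        "ic50_nm": ic50_nm,
        "hydrophobicity": mean_hydrophobicity(peptide),
        "net_charge": net_charge(peptide),
        "polar_fraction": polar_fraction(peptide),
    }
```

A peptide with a predicted IC50 just above the cut-off is discarded outright by `passes_affinity_cutoff`, whereas `feature_vector` keeps the affinity as one input among several, so a downstream classifier can weigh it against the other characteristics.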
Index Terms
- Neopepsee: Accurate Genome-level Prediction of Neoantigens by Harnessing Sequence and Amino Acid Immunogenicity Information