Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices

Abstract

De novo peptide sequencing is the key technology for finding novel peptides from mass spectra. The overall quality of sequencing results depends on the de novo peptide sequencing algorithm as well as the quality of the mass spectra. Over the past decade, the resolution and accuracy of mass spectrometers have improved by orders of magnitude, and higher-resolution mass spectra have been generated. How to take advantage of these high-resolution data effectively without substantially increasing the computational complexity remains a challenge for de novo peptide sequencing tools. Here we present PointNovo, a neural network-based de novo peptide sequencing model that can robustly handle mass spectrometry data of any resolution while keeping the computational complexity unchanged. Our extensive experimental results show that PointNovo outperforms existing de novo peptide sequencing tools by capitalizing on the ultra-high resolution of the latest mass spectrometers.

Data availability

The source data for all experiments reported by this paper are accessible through the following link: https://zenodo.org/record/3998873 (ref. 30).

Code availability

The source code of PointNovo is available in this GitHub repository: https://github.com/volpato30/PointNovo (ref. 31).

References

  1. Tran, N. H. et al. Complete de novo assembly of monoclonal antibody sequences. Sci. Rep. 6, 1–10 (2016).

  2. Faridi, P. et al. A subset of HLA-I peptides are not genomically templated: evidence for cis- and trans-spliced peptide ligands. Sci. Immunol. 3, eaar3947 (2018).

  3. Laumont, C. M. et al. Noncoding regions are the main source of targetable tumor-specific antigens. Sci. Transl. Med. 10, eaau5516 (2018).

  4. Ma, B. Novor: real-time peptide de novo sequencing software. J. Am. Soc. Mass. Spectrom. 26, 1885–1894 (2015).

  5. Tran, H., Zhang, X., Xin, L., Shan, B. & Li, M. De novo peptide sequencing by deep learning. Proc. Natl Acad. Sci. USA 114, 8247–8252 (2017).

  6. Dancík, V., Addona, T. A., Clauser, K. R., Vath, J. E. & Pevzner, P. A. De novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol. 6, 327–342 (1999).

  7. Chen, T., Kao, M. Y., Tepel, M., Rush, J. & Church, G. M. A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol. 8, 325–337 (2001).

  8. Frank, A. & Pevzner, P. PepNovo: de novo peptide sequencing via probabilistic network modeling. Anal. Chem. 77, 964–973 (2005).

  9. Ma, B. et al. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom. 17, 2337–2342 (2003).

  10. Ma, B., Zhang, K. & Liang, C. An effective algorithm for peptide de novo sequencing from MS/MS spectra. J. Comput. Syst. Sci. 70, 418–430 (2005).

  11. Karunratanakul, K., Tang, H., Speicher, D. W., Chuangsuwanich, E. & Sriswasdi, S. Uncovering thousands of new peptides with sequence-mask-search hybrid de novo peptide sequencing framework. Mol. Cell. Proteom. 18, 2478–2491 (2019).

  12. Qi, C. R., Su, H., Mo, K. & Guibas, L. J. PointNet: deep learning on point sets for 3D classification and segmentation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 652–660 (IEEE, 2016).

  13. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

  14. Tran, N. H. et al. Identifying neoantigens for cancer vaccines by personalized deep learning of individual immunopeptidomes. Nat. Mach. Intell. 2, 764–771 (2020).

  15. Yang, H., Chi, H., Zeng, W., Zhou, W. & He, S. pNovo3: precise de novo peptide sequencing using a learning-to-rank framework. Bioinformatics 35, i183–i190 (2019).

  16. Zhu, Y. et al. Spatially resolved proteome mapping of laser capture microdissected tissue with automated sample transfer to nanodroplets. Mol. Cell. Proteom. 17, 1843–1874 (2018).

  17. Shears, M. J. et al. Proteomic analysis of Plasmodium merosomes: the link between liver and blood stages in malaria. J. Proteome Res. 18, 3404–3418 (2019).

  18. Sobolesky, P. et al. Proteomic analysis of non-depleted serum proteins from bottlenose dolphins uncovers a high vanin-1 phenotype. Sci. Rep. 6, 33879 (2016).

  19. Benitez-Amaro, A. et al. Molecular basis for the protective effects of low-density lipoprotein receptor-related protein 1 (LRP1)-derived peptides against LDL aggregation. Biochim. Biophys. Acta Biomembr. 1861, 1302–1316 (2019).

  20. Sim, S. Y. et al. In-depth proteomic analysis of human bronchoalveolar lavage fluid toward the biomarker discovery for lung cancers. Proteom. Clin. Appl. 13, 1900028 (2019).

  21. Haythorne, E. et al. Diabetes causes marked inhibition of mitochondrial metabolism in pancreatic β-cells. Nat. Commun. 10, 2474 (2019).

  22. Tran, N. H. et al. Deep learning enables de novo peptide sequencing from data-independent-acquisition mass spectrometry. Nat. Methods 16, 63–66 (2019).

  23. Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In Proc. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16) 265–283 (USENIX, 2016).

  24. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 8026–8037 (NeurIPS, 2019).

  25. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 5998–6008 (NeurIPS, 2017).

  26. Lin, T., Goyal, P., Girshick, R., He, K. & Dollar, P. Focal loss for dense object detection. In Proc. IEEE International Conference on Computer Vision 2980–2988 (IEEE, 2017).

  27. Kingma, D. P. & Ba, L. J. Adam: a method for stochastic optimization. In Proc. 3rd International Conference on Learning Representations (ICLR, 2015).

  28. Bassani-Sternberg, M. et al. Direct identification of clinically relevant neoepitopes presented on native human melanoma tissue by mass spectrometry. Nat. Commun. 7, 13404 (2016).

  29. Bekker-Jensen, D. B. et al. An optimized shotgun strategy for the rapid generation of comprehensive human proteomes. Cell Syst. 4, 587–599 (2017).

  30. Qiao, R. Source Data for PointNovo (Zenodo, 2020); https://doi.org/10.5281/zenodo.3998873

  31. Ma, Z. Volpato30/PointNovo: First Release (Version v0.0.1) (Zenodo, 2020); https://doi.org/10.5281/zenodo.3960823

Acknowledgements

We thank Z. Ma for discussions on order-invariant networks. We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) (funding reference no. RGPIN-2019-04824), China’s National Key Research and Development Program under grant no. 2018YFB1003202 and NSERC grant no. OGP0046506. This work was performed while R.Q. was visiting Bioinformatics Solutions.

Author information

Contributions

R.Q. and A.G. conceived the research idea and the prototype of the model. R.Q. implemented the proposed algorithm and analysed the data. N.H.T., M.L., B.S., X.C. and L.X. contributed to model design and data analysis. N.H.T., M.L., A.G. and R.Q. wrote the manuscript. A.G. and M.L. supervised the research project.

Corresponding authors

Correspondence to Baozhen Shan or Ali Ghodsi.

Ethics declarations

Competing interests

The authors have filed a provisional patent application for the PointNovo model with the USPTO (US Provisional Patent Application no. 62/833,959) through Bioinformatics Solutions, Waterloo, Canada. The authors are named inventors in the patent application. L.X., X.C. and B.S. are employees of Bioinformatics Solutions.

Additional information

Peer review information Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Set of peptides predicted by PointNovo and DeepNovo, compared with the set of peptides identified by PEAKS DB.

Both DeepNovo and PointNovo are trained without the LSTM modules. A peptide score cutoff is applied to the results given by PointNovo and DeepNovo. We select the cutoff scores so that the amino acid accuracy of the remaining predicted peptides is 90%. Here, the overlap between two sets represents the peptides that are exactly the same (that is, the same amino acid residue sequence). Thus, the peptide recall is different from the number reported in Fig. 1, where a predicted amino acid residue is considered to be correct if the mass difference with the ground truth is smaller than 0.1 Da.
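One way such a score cutoff could be chosen is sketched below in Python. This is an illustrative assumption, not the authors' procedure: the function name `select_score_cutoff`, the tuple layout `(score, n_correct, n_predicted)` and the toy numbers are all hypothetical. Peptides are ranked by score, and the lowest score at which the cumulative amino acid accuracy of the retained peptides still reaches the target is returned.

```python
def select_score_cutoff(predictions, target_accuracy=0.9):
    """Pick the lowest score cutoff that keeps amino acid accuracy >= target.

    predictions: list of (score, n_correct, n_predicted) tuples, one per
    predicted peptide (hypothetical layout; higher score = more confident).
    Returns None when no cutoff reaches the target accuracy.
    Assumes scores are distinct, so ties are not handled specially.
    """
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    correct = 0
    total = 0
    cutoff = None
    for score, n_correct, n_predicted in ranked:
        correct += n_correct
        total += n_predicted
        # Cumulative accuracy need not be monotonic, so keep scanning and
        # remember the last (lowest) score that still meets the target.
        if total > 0 and correct / total >= target_accuracy:
            cutoff = score
    return cutoff


# Toy data: four predicted peptides of ten residues each.
predictions = [(-0.1, 10, 10), (-0.5, 9, 10), (-1.0, 9, 10), (-2.0, 3, 10)]
cutoff = select_score_cutoff(predictions)
kept = [p for p in predictions if p[0] >= cutoff]
print(cutoff, len(kept))
```

In this toy example, keeping the fourth peptide would pull the cumulative accuracy below 90%, so the cutoff stops at the third peptide's score.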

Extended Data Fig. 2 Performance of PointNovo on jittered spectra.

To jitter the spectra, we add uniformly distributed random ppm errors to the m/z value of every peak in the original datasets. These jittered spectra can be regarded as lower-resolution spectra.
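The jittering step can be illustrated with a minimal Python sketch. The caption does not give the range of the uniform distribution, so the symmetric interval [-max_ppm, +max_ppm], the 20 ppm value and the function name `jitter_mz` are assumptions made here for illustration only.

```python
import random


def jitter_mz(mz_values, max_ppm, rng=None):
    """Return a copy of mz_values with a uniform random ppm error per peak.

    The error for each peak is drawn from [-max_ppm, +max_ppm]
    (an assumed range; the source does not specify it).
    """
    rng = rng or random.Random()
    jittered = []
    for mz in mz_values:
        ppm_error = rng.uniform(-max_ppm, max_ppm)
        # A ppm error scales with the m/z value itself.
        jittered.append(mz * (1.0 + ppm_error * 1e-6))
    return jittered


# Hypothetical peak list (m/z values only); seeded for reproducibility.
peaks = [175.119, 500.25, 1200.6]
noisy = jitter_mz(peaks, max_ppm=20.0, rng=random.Random(0))
for original, shifted in zip(peaks, noisy):
    print(original, shifted)
```

Because the error is relative, the same ppm bound produces a larger absolute shift for peaks at higher m/z.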

Extended Data Fig. 3 Structure of T Net.

The output shape of each layer is annotated below each block. Here N denotes the number of data points; v and k are defined in the feature extraction section of the online Methods. H_i represents the number of hidden neurons in each hidden layer; these are hyperparameters that can be tuned by the user.

Extended Data Fig. 4 Comparison of using absolute m/z diff and ppm m/z diff.

Here the PointNovo models are trained on the combination of four datasets: PXD008808, PXD011246, PXD012645 and PXD012979.
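For readers unfamiliar with the two quantities being compared, the following Python sketch shows the standard definitions of an absolute m/z difference (in Th) and a relative difference in ppm. How PointNovo turns these differences into model features is not described in this caption; the function names and example values are hypothetical.

```python
def absolute_diff(observed_mz, theoretical_mz):
    """Absolute m/z difference between an observed and a theoretical peak."""
    return observed_mz - theoretical_mz


def ppm_diff(observed_mz, theoretical_mz):
    """Relative m/z difference in parts per million of the theoretical value."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# The same 0.01 absolute offset corresponds to very different ppm errors
# at low and high m/z (roughly 50 ppm at 200 and 5 ppm at 2000).
for theoretical in (200.0, 2000.0):
    observed = theoretical + 0.01
    print(theoretical,
          round(absolute_diff(observed, theoretical), 4),
          round(ppm_diff(observed, theoretical), 2))
```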

Extended Data Fig. 5 Structure of PointNovo.

(a) PointNovo without LSTM. (b) PointNovo with LSTM.

Extended Data Fig. 6 Comparison with PEAKS de novo on patient Mel 16 data.

The PointNovo model here is trained on Mel 15 data, which has a different peptide sequence pattern from the Mel 16 data.

Extended Data Fig. 7 Cross-enzyme performance of PointNovo without LSTM model on PXD004452 data.

The PXD004452 dataset contains HeLa samples digested with different enzymes. For each enzyme, we first ran a database search. The PSMs identified at 1% FDR are then split into training, validation and test sets at a ratio of 8:1:1. A separate PointNovo model without LSTM is trained for each enzyme, and the cross-enzyme performance on the test sets is reported here.
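An 8:1:1 split of this kind can be sketched in Python as follows. The caption does not say whether the split is random or grouped by peptide, so the random PSM-level shuffle, the fixed seed and the function name `split_psms` are assumptions for illustration.

```python
import random


def split_psms(psms, ratios=(8, 1, 1), seed=0):
    """Shuffle PSMs and split them into training, validation and test sets.

    ratios gives the relative sizes (train, validation, test); any
    remainder from integer division goes to the test set.
    """
    shuffled = list(psms)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_valid = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Hypothetical PSM identifiers standing in for real search results.
psm_ids = list(range(100))
train, valid, test = split_psms(psm_ids)
print(len(train), len(valid), len(test))
```

A split grouped by peptide sequence, so that the same peptide never appears in both training and test sets, would be a stricter alternative; the caption does not indicate which was used.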

About this article

Cite this article

Qiao, R., Tran, N.H., Xin, L. et al. Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices. Nat Mach Intell 3, 420–425 (2021). https://doi.org/10.1038/s42256-021-00304-3

  • DOI: https://doi.org/10.1038/s42256-021-00304-3
