Abstract:
Tandem mass spectrometry (MS/MS) is widely used in protein identification, the analysis of post-translational modifications, immunotherapy, and other applications. As the volume of MS/MS spectral data grows, new computational methods are needed to search these databases efficiently. This study introduces MS2VEC, a novel fingerprint embedding model designed to facilitate large-scale retrieval of peptide mass spectra. MS2VEC captures the relationships between distant peaks and incorporates position-aware fingerprint features from all peaks. To do this, it uses dilated convolutions to capture long-range relationships and a novel position-aware multi-head attention pooling mechanism to abstract fingerprint features. The results demonstrate that MS2VEC achieves a top-1 retrieval accuracy of 0.810, outperforming existing methods by 5.1%. Interestingly, the precursor charge is not essential for the retrieval task, as the spectrum itself contains enough information to accurately predict the charge. Additionally, the results suggest that weight-balanced fragment ions and water losses are important contributors to fingerprint features.
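To illustrate why dilated convolutions help relate distant peaks, the following is a minimal pure-Python sketch of a 1D dilated convolution over a binned spectrum. It is not the MS2VEC implementation: the function names, the toy spectrum, the kernel, and the dilation rates are all assumptions chosen for illustration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D dilated convolution (cross-correlation, as in deep learning)."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolution layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Toy binned spectrum with two peaks 8 bins apart (hypothetical intensities).
spectrum = [0.0] * 16
spectrum[2] = 1.0
spectrum[10] = 0.5

# A 3-tap kernel with dilation 4 spans 9 bins, so a single output
# position combines both peaks even though they are far apart.
response = dilated_conv1d(spectrum, [1.0, 0.0, 1.0], dilation=4)
```

With these assumed settings, stacking layers with dilations 1, 2, 4, and 8 and a kernel size of 3 gives a receptive field of 31 bins, which is how a small number of layers can cover widely separated peaks.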
Date of Conference: 05-08 December 2023
Date Added to IEEE Xplore: 18 January 2024