Abstract
Feature detection and peak detection are among the first steps of mass spectrometry data processing. The data come in large volumes, so the processing must be optimized to avoid overload. State-of-the-art clustering algorithms cannot perform feature detection directly for two reasons: the first is the volume of the data, and the second is the disparity between the sampling frequencies on the m/z and RT axes. Here we present a data transformation that makes it possible to use clustering algorithms without redefining their kernels. The data are first pre-clustered to obtain regions that can be processed independently. We then transform the data so that the numerical difference between consecutive points is the same on both axes. We applied a set of clustering algorithms to each region to find the features and compared the results with the GridMass peak detector. These findings may facilitate better use of 2D clustering methods as feature detectors for mass spectra.
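The axis-equalizing idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`axis_step`, `rescale`, `cluster`), the choice of the median gap as the per-axis step, the `eps` value, and the synthetic points are all assumptions made for illustration. The clustering step is a plain distance-threshold connected-components search standing in for the algorithms the chapter evaluates, and the pre-clustering into regions is omitted.

```python
from statistics import median


def axis_step(values):
    """Median gap between consecutive distinct values on one axis."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return 1.0  # degenerate axis: leave it unscaled
    return median(b - a for a, b in zip(distinct, distinct[1:]))


def rescale(points):
    """Divide each axis by its typical step so one step is ~1 unit on both."""
    mz_step = axis_step([mz for mz, _ in points])
    rt_step = axis_step([rt for _, rt in points])
    return [(mz / mz_step, rt / rt_step) for mz, rt in points]


def cluster(points, eps=1.5):
    """Stand-in clustering: connected components of points closer than eps."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            xi, yi = points[i]
            for j, (xj, yj) in enumerate(points):
                close = (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2
                if labels[j] == -1 and close:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Synthetic (m/z, RT) points: two features 0.5 m/z apart, each sampled
# at two m/z values (step 0.01) and two scans (step 2.0 s).
raw = [
    (100.00, 10.0), (100.01, 10.0), (100.00, 12.0), (100.01, 12.0),
    (100.50, 10.0), (100.51, 10.0), (100.50, 12.0), (100.51, 12.0),
]
labels_raw = cluster(raw)               # distances dominated by the RT axis
labels_scaled = cluster(rescale(raw))   # both axes measured in sampling steps
```

On the raw coordinates the same Euclidean threshold groups points by scan and merges the two features, because one RT step is numerically far larger than one m/z step. After rescaling, the unchanged distance function separates the two features, which is the effect the transformation aims for.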
Acknowledgment
The test dataset and the proof-of-concept implementation are available at https://doi.org/10.5281/zenodo.6337968.
The authors thank the Research Infrastructure RECETOX RI (No LM2018121), financed by the Ministry of Education, Youth and Sports, and the Operational Programme Research, Development and Innovation - project CETOCOEN EXCELLENCE (No CZ.02.1.01/0.0/0.0/17_043/0009632) for supportive background.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Barton, V., Skutkova, H. (2022). Data Transformation for Clustering Utilization for Feature Detection in Mass Spectrometry. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2022. Lecture Notes in Computer Science, vol 13347. Springer, Cham. https://doi.org/10.1007/978-3-031-07802-6_24
DOI: https://doi.org/10.1007/978-3-031-07802-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07801-9
Online ISBN: 978-3-031-07802-6
eBook Packages: Computer Science (R0)