Mining Mass Spectrometry Database Search Results—A Rough Set Approach

  • Conference paper
Rough Sets and Intelligent Systems Paradigms (RSEISP 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4585)

Abstract

This paper reports the results of experiments on the mass spectrometry database search results produced by Keller et al.; the data set describes human proteins. Data mining was conducted using the LERS system. First, the data set was discretized by a cluster analysis algorithm based on an agglomerative approach. Then the basic rule set was induced by the LEM2 algorithm. Finally, the rule set was refined by changing rule strengths and truncating the rule set. Our results reach the level of sensitivity and specificity of competing methods. However, our results are explainable because they take the form of rules, and we can additionally interpret the role of important features.
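
The refinement step can be illustrated with a minimal sketch. It is not the LERS implementation. It assumes that a rule's vote is its strength (the number of training cases it classifies correctly) times its number of conditions, that "changing rule strength" multiplies the votes of rules for one concept by a constant, and that "truncation" drops rules whose strength falls below a cutoff. The attribute names (xcorr, delta_cn), their discretized values, the rule strengths, the multiplier, and the choice of "correct" as the boosted concept are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict  # attribute name -> required discretized value
    concept: str      # decision value the rule predicts
    strength: int     # training cases the rule classifies correctly


def matches(rule, case):
    """A rule matches when every one of its conditions holds for the case."""
    return all(case.get(attr) == value for attr, value in rule.conditions.items())


def classify(rules, case, boosted="correct", multiplier=1.0,
             min_strength=0, default="incorrect"):
    """Vote among matching rules (assumed scheme, see the text above).

    multiplier   -- factor applied to rules of the boosted concept
    min_strength -- truncation cutoff: weaker rules are ignored
    """
    support = {}
    for rule in rules:
        if rule.strength < min_strength:
            continue  # truncated: rule is too weak to keep
        if not matches(rule, case):
            continue
        factor = multiplier if rule.concept == boosted else 1.0
        vote = rule.strength * len(rule.conditions) * factor
        support[rule.concept] = support.get(rule.concept, 0.0) + vote
    if not support:
        return default  # no rule fired
    return max(support, key=support.get)


# Hypothetical rule set over two discretized search scores.
rules = [
    Rule({"xcorr": "high", "delta_cn": "high"}, "correct", 4),
    Rule({"xcorr": "low"}, "incorrect", 30),
    Rule({"delta_cn": "high"}, "correct", 3),
]
case = {"xcorr": "low", "delta_cn": "high"}

baseline = classify(rules, case)                                # votes: 30 vs 3
boosted = classify(rules, case, multiplier=12)                  # votes: 30 vs 36
truncated = classify(rules, case, multiplier=12, min_strength=4)
print(baseline, boosted, truncated)
```

In this toy case the multiplier flips the decision toward the boosted concept, and truncation flips it back by discarding the weak rule that cast the boosted vote. The sketch shows only the mechanics; in practice the multiplier and cutoff would be tuned against sensitivity and specificity on held-out data.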

References

  • Anderson, D.C., Li, W., Payan, D.G., Noble, W.S.: A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra and SEQUEST scores. J. Proteome Res. 2, 137–146 (2003)

  • Booker, L.B., Goldberg, D.E., Holland, J.H.: Classifier systems and genetic algorithms. In: Carbonell, J.G. (ed.) Machine Learning. Paradigms and Methods, pp. 235–282. MIT Press, Menlo Park, CA (1990)

  • Clauser, K.R., Baker, P.R., Burlingame, A.L.: Role of accurate mass measurement (+/– 10 ppm) in protein identification strategies employing MS or MS/MS and database searching. Anal. Chem. 71, 2871–2882 (1999)

  • Craig, R., Beavis, R.C.: TANDEM: matching proteins with mass spectra. Bioinformatics 20, 1466–1467 (2004)

  • Fang, J.W., Dong, Y.H., Williams, T.D., Lushington, G.H.: Classification of MS/MS Peptide Identifications and Its Application in Data Validation (Submitted)

  • Grzymala-Busse, J.W.: A new version of the rule induction system LERS. Fundamenta Informaticae 31, 27–39 (1997)

  • Grzymala-Busse, J.W.: MLEM2: A new algorithm for rule induction from imperfect data. In: IPMU 2002. Proceedings of the 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Annecy, France, July 1–5, 2002, pp. 243–250 (2002)

  • Grzymala-Busse, J.W., Hippe, Z.S.: Postprocessing of rule sets induced from a melanoma data set. In: Proceedings of the COMPSAC 2002. 26th Annual International Conference on Computer Software and Applications, Oxford, England, August 26–29, 2002, pp. 1146–1151 (2002)

  • Grzymala-Busse, J.W., Goodwin, L.K., Grzymala-Busse, W.J., Zheng, X.: An approach to imbalanced data sets based on changing rule strength. In: Learning from Imbalanced Data Sets, AAAI Workshop at the 17th Conference on AI, AAAI-2000, Austin, TX, July 30–31, 2000, pp. 69–74 (2000)

  • Holland, J.H., Holyoak, K.J., Nisbett, R.E.: Induction. Processes of Inference, Learning, and Discovery. The MIT Press, Menlo Park, CA (1986)

  • Keller, A., Nesvizhskii, A.I., Kolker, E., Aebersold, R.: Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392 (2002)

  • Pawlak, Z.: Rough Sets. International Journal of Computer and Information Sciences 11, 341–356 (1982)

  • Pawlak, Z.: Rough Sets. Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht (1991)

  • Perkins, D.N., Pappin, D.J., Creasy, D.M., Cottrell, J.S.: Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567 (1999)

  • Ulintz, P.J., Zhu, J., Qin, Z.S., Andrews, P.C.: Improved classification of mass spectrometry database search results using newer machine learning approaches. Mol. Cell. Proteomics 5, 497–509 (2006)

  • Yates III, J.R., Eng, J.K., McCormack, A.L.: An approach to correlate tandem mass spectral data of peptides with amino acid sequences in the protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994)

Editor information

Marzena Kryszkiewicz, James F. Peters, Henryk Rybinski, Andrzej Skowron

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fang, J., Grzymala-Busse, J.W. (2007). Mining Mass Spectrometry Database Search Results—A Rough Set Approach. In: Kryszkiewicz, M., Peters, J.F., Rybinski, H., Skowron, A. (eds) Rough Sets and Intelligent Systems Paradigms. RSEISP 2007. Lecture Notes in Computer Science, vol 4585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73451-2_36

  • DOI: https://doi.org/10.1007/978-3-540-73451-2_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73450-5

  • Online ISBN: 978-3-540-73451-2

  • eBook Packages: Computer Science, Computer Science (R0)
