DOI: 10.1145/1862344.1862359
research-article

An inverted index for mass spectra similarity query and comparison with a metric-space method: case study

Published: 18 September 2010

ABSTRACT

Query performance is a determining factor in the adoption of an indexing method for similarity queries. Metric-space indexing methods are valued for their general applicability. However, it is usually hard for a general method to perform well in every domain. It is therefore of interest to compare the performance of metric-space methods with that of domain-specific methods on a particular domain. This paper describes such an investigation for proteomic mass spectra. An inverted index method is proposed that exploits the sparsity of mass spectra in binary format and acts as a coarse filter before fine ranking; it is empirically compared with an existing metric-space indexing method. Results show that the inverted index method yields greater search efficiency and outperforms the metric-space method in query speed and index size.
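The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it names: an inverted index over sparse binary spectra used as a coarse filter, followed by fine ranking of the survivors. Everything specific here is an assumption made for illustration and is not taken from the paper: each spectrum is represented as a set of occupied m/z bin numbers, the coarse filter keeps spectra sharing at least `min_shared` bins with the query, the fine ranking uses cosine similarity of the binary vectors, and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_inverted_index(spectra):
    """Map each occupied bin to the ids of the spectra that contain it."""
    index = defaultdict(list)
    for spec_id, bins in spectra.items():
        for b in bins:
            index[b].append(spec_id)
    return index


def coarse_filter(index, query_bins, min_shared):
    """Count shared bins per spectrum by walking only the query's posting lists.

    Spectra sharing no bin with the query are never touched, which is where
    the sparsity of the binary representation pays off.
    """
    shared = defaultdict(int)
    for b in query_bins:
        for spec_id in index.get(b, ()):
            shared[spec_id] += 1
    return {sid: n for sid, n in shared.items() if n >= min_shared}


def fine_rank(spectra, query_bins, candidates):
    """Rank the surviving candidates by cosine similarity of binary vectors.

    Cosine similarity is an assumed scoring function for this sketch.
    """
    scored = []
    for sid, n_shared in candidates.items():
        score = n_shared / sqrt(len(query_bins) * len(spectra[sid]))
        scored.append((score, sid))
    # Highest score first; ties broken by id for a deterministic order.
    return sorted(scored, key=lambda t: (-t[0], t[1]))


# Toy data: spectrum id -> set of occupied bins (made-up values).
spectra = {
    "s1": {100, 150, 200, 250},
    "s2": {100, 150, 300},
    "s3": {400, 450, 500},
}
index = build_inverted_index(spectra)
query = {100, 150, 200}
candidates = coarse_filter(index, query, min_shared=2)
ranking = fine_rank(spectra, query, candidates)
```

In this toy example the spectrum `s3` shares no bin with the query, so it never appears in any posting list that is visited and is excluded without a single distance computation. Only `s1` and `s2` reach the ranking step.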

Published in

SISAP '10: Proceedings of the Third International Conference on SImilarity Search and APplications
September 2010
130 pages
ISBN: 9781450304207
DOI: 10.1145/1862344

Copyright © 2010 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
