
Using Term Lists and Inverted Files to Improve Search Speed for Metabolic Pathway Databases

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4075)

Abstract

This paper describes a technique for efficiently searching a pathway database for metabolic pathways similar to a given query pathway. Metabolic pathways can be converted into labeled directed graphs in which the nodes represent chemical compounds. The similarity between two graphs can be computed using a metric based on the Maximal Common Subgraph (MCS). By maintaining an inverted file that indexes all pathways in a database on their edges, our algorithm finds and ranks all pathways similar to the user's query pathway in time linear in the total number of occurrences, across the entire database, of the edges shared with the query.
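The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: every compound carries a unique label, so an edge is identified by its (source, target) label pair and the edges two pathways share stand in for their common subgraph; graph size is counted in edges; and the score is the fraction of shared edges over the larger of the two edge counts, in the spirit of an MCS-based metric. The names `build_inverted_file` and `rank_similar`, and the sample pathways, are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(pathways):
    """Index pathways on their edges.

    pathways: dict mapping a pathway id to an iterable of (source, target)
    compound-label pairs. Returns (postings, sizes), where postings maps each
    edge to the ids of the pathways containing it and sizes maps each pathway
    id to its number of distinct edges.
    """
    postings = defaultdict(list)
    sizes = {}
    for pid, edges in pathways.items():
        unique_edges = set(edges)
        sizes[pid] = len(unique_edges)
        for edge in unique_edges:
            postings[edge].append(pid)
    return postings, sizes


def rank_similar(query_edges, postings, sizes):
    """Rank pathways sharing at least one edge with the query.

    Only the posting lists of the query's edges are scanned, so the counting
    loop does work proportional to the number of occurrences of those edges
    in the database. (The final sort adds a cost in the number of matches.)
    """
    query = set(query_edges)
    shared = defaultdict(int)
    for edge in query:
        for pid in postings.get(edge, ()):
            shared[pid] += 1
    # Assumed score: shared edges divided by the larger edge count.
    scored = [(count / max(len(query), sizes[pid]), pid)
              for pid, count in shared.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(pid, score) for score, pid in scored]


# Hypothetical toy database; letters stand in for compound labels.
pathways = {
    "p1": [("A", "B"), ("B", "C")],
    "p2": [("A", "B"), ("B", "D"), ("D", "E")],
    "p3": [("X", "Y")],
}
postings, sizes = build_inverted_file(pathways)
ranking = rank_similar([("A", "B"), ("B", "C")], postings, sizes)
```

In this toy run `p3` never appears in the ranking because none of its edges occur in the query, so its posting list is never visited. That is the source of the speed-up over comparing the query against every pathway in turn.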



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Neglur, G., Grossman, R.L., Maltsev, N., Yu, C. (2006). Using Term Lists and Inverted Files to Improve Search Speed for Metabolic Pathway Databases. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_15


  • DOI: https://doi.org/10.1007/11799511_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36593-8

  • Online ISBN: 978-3-540-36595-2

  • eBook Packages: Computer Science, Computer Science (R0)
