Using Term Lists and Inverted Files to Improve Search Speed for Metabolic Pathway Databases

Neglur, Greeshma; Grossman, Robert L.; Maltsev, Natalia; Yu, Clement

doi:10.1007/11799511_15

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4075)

Abstract

This paper describes a technique for efficiently searching a pathway database for metabolic pathways similar to a given query pathway. Metabolic pathways can be converted into labeled directed graphs in which the nodes represent chemical compounds. The similarity between two graphs can be computed with a metric based on the Maximal Common Subgraph (MCS). By maintaining an inverted file that indexes every pathway in the database by its edges, our algorithm finds and ranks all pathways similar to the user's query pathway in time linear in the total number of times the query's edges occur across the entire database.
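The edge-indexed search described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each pathway is assumed to be a set of directed (substrate, product) edges labeled by compound identifiers, with each compound appearing at most once per pathway, so the number of shared edges can stand in for the MCS size. The names `build_inverted_file` and `rank_similar` are hypothetical. The score is modeled on the Bunke and Shearer MCS distance 1 − |mcs| / max(|G1|, |G2|), with graph size counted in edges; that choice is also an assumption.

```python
from collections import defaultdict

# Hypothetical representation (an assumption, not the paper's data model):
# a pathway is a set of directed edges (substrate, product), each labeled by
# compound identifiers, with every compound appearing at most once per pathway.
Edge = tuple[str, str]


def build_inverted_file(pathways: dict[str, set[Edge]]) -> dict[Edge, list[str]]:
    """Map each edge to the posting list of IDs of pathways containing it."""
    index: dict[Edge, list[str]] = defaultdict(list)
    for pid, edges in pathways.items():
        for edge in edges:
            index[edge].append(pid)
    return dict(index)


def rank_similar(
    query: set[Edge],
    index: dict[Edge, list[str]],
    sizes: dict[str, int],
) -> list[tuple[str, float]]:
    """Rank pathways that share at least one edge with the query.

    Only the posting lists of the query's edges are scanned, so counting is
    linear in the number of database occurrences of those edges; sorting the
    candidates adds a k log k term for k candidate pathways.
    """
    shared: dict[str, int] = defaultdict(int)
    for edge in query:
        for pid in index.get(edge, ()):
            shared[pid] += 1  # one more edge in common with the query

    # Distance modeled on Bunke and Shearer, using the shared-edge count as
    # the MCS size and edge counts as graph sizes (an assumption).
    distances = {
        pid: 1.0 - n / max(len(query), sizes[pid])
        for pid, n in shared.items()
    }
    # Smaller distance means more similar; break ties by pathway ID.
    return sorted(distances.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    db = {
        "P1": {("A", "B"), ("B", "C"), ("C", "D")},
        "P2": {("A", "B"), ("B", "E")},
        "P3": {("X", "Y")},
    }
    index = build_inverted_file(db)
    sizes = {pid: len(edges) for pid, edges in db.items()}
    for pid, dist in rank_similar({("A", "B"), ("B", "C")}, index, sizes):
        print(pid, round(dist, 3))
```

Running the script prints `P1 0.333` and then `P2 0.5`. P1 shares both query edges and has three edges; P2 shares one edge and has two. P3 is never examined, because none of its edges appear in the posting lists of the query's edges. This is what keeps the search cost proportional to edge occurrences rather than to database size.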

Author information

Authors and Affiliations

Laboratory for Advanced Computing, University of Illinois at Chicago, Chicago, IL, 60607, USA
Greeshma Neglur & Robert L. Grossman
Argonne National Laboratory, Math and Computer Science Division, Argonne, IL, 60439, USA
Natalia Maltsev
Department of Computer Science, University of Illinois at Chicago, Chicago, IL, 60607, USA
Clement Yu

Editor information

Editors and Affiliations

Humboldt-Universität zu Berlin
Ulf Leser
Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
Felix Naumann
IBM Application and Integration Middleware, 1475 Phoenixville Pike, 19380, West Chester, PA, USA
Barbara Eckman

About this paper

Cite this paper

Neglur, G., Grossman, R.L., Maltsev, N., Yu, C. (2006). Using Term Lists and Inverted Files to Improve Search Speed for Metabolic Pathway Databases. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_15

DOI: https://doi.org/10.1007/11799511_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36593-8
Online ISBN: 978-3-540-36595-2
eBook Packages: Computer Science, Computer Science (R0)
