An Extensible Light-Weight XML-Based Monitoring System for Sequence Databases

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4075)

Abstract

Life science researchers want biological information of interest to become available to them as soon as possible. A monitoring system relieves biologists from periodically exploring databases: it allows them to express their interest in certain data by means of queries/constraints, and notifies them when new data arrives that satisfies these queries/constraints. We describe XSeqM, a sequence monitoring system in which users can combine metadata queries on sequence records with constraints on an alignment against a given source sequence. The system is XML-based: constraints are specified through search fields in a user-friendly web interface and are then translated to corresponding XPath expressions. The system is easily extensible, as adding a new database only amounts to specifying new mappings from search fields to XPath expressions. To protect private source sequences obtained in labs, it is imperative that researchers do not have to upload their sequences to a general untrusted system, but can instead run XSeqM locally. To keep the system light-weight, we therefore introduce an optimization technique based on query containment to reduce the number of XPath evaluations, which constitute the bottleneck of the system. We experimentally validate this technique and show that it can drastically improve the running time.
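
As a rough illustration of the two ideas in the abstract (search fields mapped to XPath expressions, and query containment used to avoid XPath evaluations), the Python sketch below may help. It is not XSeqM's implementation: the record layout, the `FIELD_TO_XPATH` mapping, and the function names are assumptions made for this example, and containment is reduced to a simple subset test on conjunctions of constraints.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-database mapping from search fields to XPath templates.
# Supporting another database would mean supplying another mapping like this.
FIELD_TO_XPATH = {
    "organism": "./source[organism='{value}']",
    "keyword": "./keywords[keyword='{value}']",
}


def to_constraints(fields):
    """Translate {search field: value} into a set of XPath expressions.

    Assumes values contain no single quotes (no escaping is done here).
    """
    return frozenset(FIELD_TO_XPATH[f].format(value=v) for f, v in fields.items())


def matching_queries(record, queries):
    """Return (names of matching queries, number of XPath evaluations).

    Each query is a conjunction of XPath constraints. If query A's constraints
    are a superset of query B's, then A is contained in B: every record that
    matches A also matches B. So once B fails on a record, A can be skipped.
    """
    evaluations = 0
    failed = []   # constraint sets of queries already known to fail
    matched = []
    # Evaluate less restrictive queries first so their failures prune others.
    for name, constraints in sorted(queries.items(), key=lambda kv: len(kv[1])):
        if any(f <= constraints for f in failed):
            continue  # contained in a failed query: no XPath evaluation needed
        ok = True
        for xpath in sorted(constraints):  # sorted for a deterministic order
            evaluations += 1
            if record.find(xpath) is None:
                ok = False
                break
        if ok:
            matched.append(name)
        else:
            failed.append(constraints)
    return matched, evaluations


# A made-up sequence record; real formats such as BSML are richer.
record = ET.fromstring(
    "<record>"
    "<source><organism>Homo sapiens</organism></source>"
    "<keywords><keyword>kinase</keyword><keyword>signalling</keyword></keywords>"
    "</record>"
)

queries = {
    "q_mouse": to_constraints({"organism": "Mus musculus"}),
    "q_mouse_kinase": to_constraints({"organism": "Mus musculus", "keyword": "kinase"}),
    "q_human": to_constraints({"organism": "Homo sapiens"}),
    "q_human_kinase": to_constraints({"organism": "Homo sapiens", "keyword": "kinase"}),
}

matched, evaluations = matching_queries(record, queries)
print(matched, evaluations)
```

In this toy run, `q_mouse_kinase` is never evaluated because the less restrictive `q_mouse` has already failed on the record. Containment between general XPath expressions is considerably harder to decide than this subset test suggests.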

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Van de Craen, D., Neven, F., Koch, K. (2006). An Extensible Light-Weight XML-Based Monitoring System for Sequence Databases. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_25

  • DOI: https://doi.org/10.1007/11799511_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36593-8

  • Online ISBN: 978-3-540-36595-2

  • eBook Packages: Computer Science, Computer Science (R0)
