Managing Data in High Throughput Laboratories: An Experience Report from Proteomics

  • Conference paper
Conceptual Modeling - ER 2006 (ER 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4215)

Abstract

Scientific laboratories are rich in data management challenges. This paper describes an end-to-end information management infrastructure for a high throughput proteomics industrial laboratory. A unique feature of the platform is a data and applications integration framework that is employed for the integration of heterogeneous data, applications and processes across the entire laboratory production workflow. We also define a reference architecture for implementing similar solutions organized according to the laboratory data lifecycle phases. Each phase is modeled by a set of workflows integrating programs and databases in sequences of steps and associated communication and data transfers. We discuss the issues associated with each phase, and describe how these issues were approached in the proteomics implementation.

The proteomics experience section of this paper draws from the following manuscript: An End-to-End Bioinformatics Platform for High Throughput Proteomics. T. Topaloglou, M. Dharsee, M. Li, R.M. Ewing, Y.V. Bukhman, P. Chu, P. Economopoulos, S. Huynh, D. Lee, A. Pasculescu, A.-M. Salter, H. Wang.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Topaloglou, T. (2006). Managing Data in High Throughput Laboratories: An Experience Report from Proteomics. In: Embley, D.W., Olivé, A., Ram, S. (eds) Conceptual Modeling - ER 2006. ER 2006. Lecture Notes in Computer Science, vol 4215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11901181_46

  • DOI: https://doi.org/10.1007/11901181_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-47224-7

  • Online ISBN: 978-3-540-47227-8

  • eBook Packages: Computer Science, Computer Science (R0)
