Abstract
Scientific laboratories are rich in data management challenges. This paper describes an end-to-end information management infrastructure for an industrial high-throughput proteomics laboratory. A distinctive feature of the platform is a data and applications integration framework, which integrates heterogeneous data, applications and processes across the entire laboratory production workflow. We also define a reference architecture for implementing similar solutions, organized according to the phases of the laboratory data lifecycle. Each phase is modeled as a set of workflows that integrate programs and databases in sequences of steps, together with the associated communication and data transfers. We discuss the issues associated with each phase and describe how they were addressed in the proteomics implementation.
The proteomics experience section of this paper draws on the following manuscript: An End-to-End Bioinformatics Platform for High Throughput Proteomics. T. Topaloglou, M. Dharsee, M. Li, R.M. Ewing, Y.V. Bukhman, P. Chu, P. Economopoulos, S. Huynh, D. Lee, A. Pasculescu, A.-M. Salter, H. Wang.
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Topaloglou, T. (2006). Managing Data in High Throughput Laboratories: An Experience Report from Proteomics. In: Embley, D.W., Olivé, A., Ram, S. (eds) Conceptual Modeling - ER 2006. ER 2006. Lecture Notes in Computer Science, vol 4215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11901181_46
DOI: https://doi.org/10.1007/11901181_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47224-7
Online ISBN: 978-3-540-47227-8
eBook Packages: Computer Science, Computer Science (R0)