Scientific Data Management in a Grid Environment

Journal of Grid Computing

Abstract

Managing scientific data is by no means a trivial task, even in a single-site environment with a small number of researchers involved. We discuss some issues involved in posing well-specified experiments in terms of parameters or instrument settings, and the metadata framework that arises from doing so. We are particularly interested in parallel computer simulation experiments run in a multi-site Grid environment, where very large quantities of warehouse-able data are involved. We consider SQL databases and other framework technologies for manipulating experimental data. Our framework manages the outputs from parallel runs that arise from large cross-products of parameter combinations. Considerable useful experiment planning and analysis can be done with the sparse metadata, without fully expanding the parameter cross-products. Extra value can be obtained from simulation output that can subsequently be data-mined. We have particular interests in running large-scale Monte Carlo physics model simulations. Finding ourselves overwhelmed by the problems of managing data and compute resources, we have built a prototype tool using Java and MySQL that addresses these issues. We use this example to discuss type-space management and other fundamental ideas for implementing a laboratory information management system.
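
The idea of planning with sparse metadata, rather than with the fully expanded parameter cross-product, can be illustrated with a minimal sketch. It is not the authors' Java and MySQL tool; the parameter names, the values, and the helper functions `run_count`, `nth_run` and `all_runs` are all hypothetical and chosen only for illustration. The sketch stores one list of values per parameter, counts the implied runs without enumerating them, and decodes a single run number into its parameter combination on demand.

```python
from itertools import product
from math import prod

# Hypothetical sparse experiment description: each parameter maps to the
# values it should sweep. Names and values are illustrative assumptions.
experiment = {
    "lattice_size": [32, 64, 128],
    "temperature": [2.0, 2.2, 2.4, 2.6],
    "seed": list(range(10)),
}


def run_count(spec):
    """Number of runs implied by the spec, computed without expanding it."""
    return prod(len(values) for values in spec.values())


def nth_run(spec, index):
    """Decode run number `index` into one parameter combination.

    Uses mixed-radix decoding, so only one combination is ever built.
    """
    if not 0 <= index < run_count(spec):
        raise IndexError(index)
    combo = {}
    # Walk parameters from last to first so the last one varies fastest,
    # matching the order produced by itertools.product.
    for name in reversed(list(spec)):
        values = spec[name]
        index, offset = divmod(index, len(values))
        combo[name] = values[offset]
    # Restore the original parameter order for readability.
    return dict(reversed(list(combo.items())))


def all_runs(spec):
    """Lazily yield every combination; nothing is materialised up front."""
    names = list(spec)
    for values in product(*(spec[name] for name in names)):
        yield dict(zip(names, values))
```

Under these assumptions the stored metadata grows with the sum of the value-list lengths (17 values here), while the number of runs it describes grows with their product (120 here), which is why counting, scheduling and lookup can work on the sparse description alone.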

Author information

Corresponding author

Correspondence to H. A. James.

About this article

Cite this article

James, H.A., Hawick, K.A. Scientific Data Management in a Grid Environment. J Grid Computing 3, 39–51 (2005). https://doi.org/10.1007/s10723-005-5464-y

  • DOI: https://doi.org/10.1007/s10723-005-5464-y