
EventDB: A Large-Scale Semi-structured Scientific Data Management System

  • Conference paper
Big Scientific Data Management (BigSDM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11473)


Abstract

During scientific research, the volume of data collected from experimental devices has reached hundreds of petabytes (PB) per year. How to use these data efficiently to produce scientific findings is therefore a pressing problem, and it raises many challenges, including the storage, processing and sharing of the data. In this paper, we propose EventDB, a data management system for massive semi-structured scientific data. Within EventDB, we propose IndexDB for faster data retrieval, cross-domain access for better data sharing, and operator libraries for higher-performance data analysis. Our preliminary experiments show that EventDB improves data retrieval performance by more than a factor of 6.
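The abstract does not say how IndexDB is built. As a hedged illustration of the general idea behind event-level indexing (a query on event attributes returns the locations of matching events without scanning the raw data files), the Python sketch below keeps one sorted column per attribute and answers range queries with binary search. The class `EventIndex`, its methods, the event dictionary layout, and the file names are hypothetical assumptions made for illustration. They are not EventDB's IndexDB design or API.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical event location: (data file name, entry number within that file).
Location = Tuple[str, int]


@dataclass
class EventIndex:
    """Toy attribute index (illustrative only): one sorted column per attribute."""
    _values: Dict[str, List[float]] = field(default_factory=dict)
    _locations: Dict[str, List[Location]] = field(default_factory=dict)

    def build(self, events: List[dict]) -> None:
        # Collect (value, location) pairs per attribute name.
        staging: Dict[str, List[Tuple[float, Location]]] = {}
        for ev in events:
            loc = (ev["file"], ev["entry"])
            for name, value in ev["attrs"].items():
                staging.setdefault(name, []).append((value, loc))
        # Sort each column by value (stable sort) so queries can use binary search.
        for name, rows in staging.items():
            rows.sort(key=lambda row: row[0])
            self._values[name] = [value for value, _ in rows]
            self._locations[name] = [loc for _, loc in rows]

    def range_query(self, name: str, low: float, high: float) -> List[Location]:
        """Locations of events with low <= attrs[name] <= high, in value order."""
        values = self._values.get(name, [])
        start = bisect_left(values, low)
        stop = bisect_right(values, high)
        return self._locations.get(name, [])[start:stop]


if __name__ == "__main__":
    # Hypothetical events; a real system would read these attributes from data files.
    events = [
        {"file": "run001.dat", "entry": 0, "attrs": {"energy": 3.1, "ntracks": 2}},
        {"file": "run001.dat", "entry": 1, "attrs": {"energy": 1.2, "ntracks": 4}},
        {"file": "run002.dat", "entry": 0, "attrs": {"energy": 2.5, "ntracks": 2}},
    ]
    index = EventIndex()
    index.build(events)
    print(index.range_query("energy", 2.0, 3.5))
```

Each query costs two binary searches plus the size of the result, rather than a pass over every event. A production index would also need persistence and distribution across nodes, which this sketch leaves out.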

This research is supported by the National Key R&D Program of China under grant No. 2016YFB1000604.



Author information

Correspondence to Yong Qi.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhao, W. et al. (2019). EventDB: A Large-Scale Semi-structured Scientific Data Management System. In: Li, J., Meng, X., Zhang, Y., Cui, W., Du, Z. (eds) Big Scientific Data Management. BigSDM 2018. Lecture Notes in Computer Science, vol 11473. Springer, Cham. https://doi.org/10.1007/978-3-030-28061-1_12


  • DOI: https://doi.org/10.1007/978-3-030-28061-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28060-4

  • Online ISBN: 978-3-030-28061-1

  • eBook Packages: Computer Science, Computer Science (R0)
