Abstract
Scientific experimental devices now collect hundreds of petabytes of data per year. Using these data efficiently to produce scientific findings is a pressing problem, and it poses challenges in storing, processing, and sharing the data. In this paper, we propose EventDB, a data management system for scientific big data. EventDB provides data management functions for massive semi-structured scientific data. Within EventDB, we propose IndexDB for faster data retrieval, cross-domain access for better data sharing, and operator libraries for higher-performance data analysis. Our preliminary experiments show that EventDB improves data retrieval performance by more than 6 times.
This research is supported by the National Key R&D Program of China under grant No. 2016YFB1000604.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, W. et al. (2019). EventDB: A Large-Scale Semi-structured Scientific Data Management System. In: Li, J., Meng, X., Zhang, Y., Cui, W., Du, Z. (eds) Big Scientific Data Management. BigSDM 2018. Lecture Notes in Computer Science, vol 11473. Springer, Cham. https://doi.org/10.1007/978-3-030-28061-1_12
DOI: https://doi.org/10.1007/978-3-030-28061-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28060-4
Online ISBN: 978-3-030-28061-1
eBook Packages: Computer Science, Computer Science (R0)