HDF5-Based I/O Optimization for Extragalactic HI Data Pipeline of FAST

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11945)

Abstract

The Five-hundred-meter Aperture Spherical Radio Telescope (FAST), the largest single-dish radio telescope in the world, produces a very large volume of data at high speed. It therefore requires a high-performance data pipeline to convert the huge volume of raw observed data into science data products. However, the pipeline solutions widely used in radio data processing cannot handle this situation efficiently. This paper proposes a pipeline architecture for FAST based on the HDF5 format and several I/O optimization strategies. First, we design a workflow engine that drives the various tasks in the pipeline efficiently; second, we design a common radio data storage specification on top of the HDF5 format and develop a fast converter to map the original FITS format to the new HDF5 format; third, we apply several concrete strategies to optimize the I/O operations, including chunked storage, parallel reading/writing, on-demand dump, and stream processing. In an experiment processing 700 GB of FAST data, the HDF5-based data structure without other optimizations was 1.7 times faster than the original FITS format. With chunked storage and parallel I/O optimization applied, the overall performance reaches 4.5 times that of the original. Moreover, owing to its good extensibility and flexibility, our FAST pipeline solution can be adapted to other radio telescopes.
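
The abstract names chunked storage and parallel reading as two of the I/O strategies. The following is a minimal sketch of that general idea using only the Python standard library; it is not the authors' HDF5-based implementation. The chunk height, channel count, file layout, and all function names (`write_chunked`, `read_chunk_sum`, `parallel_chunk_sums`, `demo`) are assumptions made for illustration. In HDF5 itself, chunking is configured when a dataset is created rather than managed by hand as here.

```python
import os
import tempfile
from array import array
from concurrent.futures import ThreadPoolExecutor

CHUNK_ROWS = 4   # hypothetical: number of spectra stored per chunk
N_CHANNELS = 8   # hypothetical: frequency channels per spectrum
ITEM_SIZE = array("d").itemsize  # bytes per float64 sample


def write_chunked(path, spectra):
    """Write spectra row by row; every CHUNK_ROWS rows form one fixed-size chunk."""
    with open(path, "wb") as f:
        for row in spectra:
            if len(row) != N_CHANNELS:
                raise ValueError("each spectrum must have N_CHANNELS samples")
            array("d", row).tofile(f)


def read_chunk_sum(path, chunk_index):
    """Read one chunk on its own by seeking to its byte offset, then reduce it."""
    chunk_bytes = CHUNK_ROWS * N_CHANNELS * ITEM_SIZE
    with open(path, "rb") as f:
        f.seek(chunk_index * chunk_bytes)
        buf = array("d")
        buf.frombytes(f.read(chunk_bytes))
    # The sum stands in for whatever per-chunk processing a pipeline task does.
    return sum(buf)


def parallel_chunk_sums(path, n_chunks, workers=4):
    """Read and reduce all chunks concurrently; results keep chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: read_chunk_sum(path, i), range(n_chunks)))


def demo():
    # 16 synthetic spectra whose samples are simply 0.0, 1.0, 2.0, ...
    spectra = [[float(r * N_CHANNELS + c) for c in range(N_CHANNELS)]
               for r in range(16)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "spectra.bin")
        write_chunked(path, spectra)
        n_chunks = len(spectra) // CHUNK_ROWS
        return parallel_chunk_sums(path, n_chunks)


if __name__ == "__main__":
    print(demo())
```

Because every chunk has a fixed size and a computable offset, each worker opens its own file handle and reads only the bytes it needs, so no reader waits on another. Threads are used here only to keep the sketch short; a real pipeline would more likely rely on the parallel I/O facilities of its storage library.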

Acknowledgement

This work is supported by the Joint Research Fund in Astronomy (U1731125, U1731243, 11903056) under a cooperative agreement between the National Natural Science Foundation of China (NSFC) and the Chinese Academy of Sciences (CAS), and by the National Natural Science Foundation of China (11573019). BZ is supported by the Open Project Program of the Key Laboratory of FAST, NAOC.

Author information

Corresponding author

Correspondence to Jian Xiao.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Ji, Y., Yu, C., Xiao, J., Tang, S., Wang, H., Zhang, B. (2020). HDF5-Based I/O Optimization for Extragalactic HI Data Pipeline of FAST. In: Wen, S., Zomaya, A., Yang, L.T. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2019. Lecture Notes in Computer Science, vol 11945. Springer, Cham. https://doi.org/10.1007/978-3-030-38961-1_55

  • DOI: https://doi.org/10.1007/978-3-030-38961-1_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-38960-4

  • Online ISBN: 978-3-030-38961-1

  • eBook Packages: Computer Science, Computer Science (R0)
