neCODEC: nearline data compression for scientific applications

Cluster Computing

Abstract

Advances in multicore technologies are leading to processors with tens, and soon hundreds, of cores in a single socket, resulting in an ever-growing gap between computing power and the memory and I/O bandwidth available for data handling. It would be beneficial if some of this computing power could be converted into gains in I/O efficiency, thereby narrowing the speed disparity between computation and I/O. In this paper, we design and implement a NEarline data COmpression and DECompression (neCODEC) scheme for data-intensive parallel applications. neCODEC introduces several salient techniques, including asynchronous compression threads, elastic file representation, distributed metadata handling, and balanced subfile distribution. Our performance evaluation indicates that neCODEC improves the performance of a variety of data-intensive microbenchmarks and scientific applications. In particular, neCODEC increases the effective bandwidth of S3D, a combustion simulation code, by more than 5 times.
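
To make the asynchronous compression threads concrete, the following is a minimal sketch of the general idea, not the authors' implementation: compute code hands raw output buffers to a background thread, which compresses each chunk with zlib and appends it to a per-process subfile, so compression and file I/O overlap with computation. The file name subfile.bin, the chunk size, and the length-prefix framing are illustrative assumptions.

    # Minimal sketch of nearline (asynchronous) compression, assuming zlib
    # compression and one background writer thread per process. Illustrative only.
    import queue
    import threading
    import zlib

    work_queue = queue.Queue(maxsize=8)   # bounded, so producers cannot outrun I/O
    SENTINEL = None

    def compressor_writer(path):
        """Drain the queue, compress each chunk, and append it to the subfile."""
        with open(path, "ab") as subfile:
            while True:
                chunk = work_queue.get()
                if chunk is SENTINEL:
                    break
                compressed = zlib.compress(chunk, level=1)  # fast level suits nearline use
                # Length-prefix each chunk so a reader can locate and decompress it later.
                subfile.write(len(compressed).to_bytes(8, "little"))
                subfile.write(compressed)

    writer = threading.Thread(target=compressor_writer, args=("subfile.bin",))
    writer.start()

    # The compute side keeps producing data while compression and file I/O
    # proceed in the background thread.
    for step in range(16):                  # 16 output steps, illustrative
        raw = bytes(4 * 1024 * 1024)        # stand-in for one simulation output chunk
        work_queue.put(raw)

    work_queue.put(SENTINEL)
    writer.join()

The length-prefix framing here stands in for the per-chunk metadata that a real scheme must track (for example, in a distributed metadata structure) so that compressed subfiles remain independently decompressible.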

References

  1. NetCDF-4. http://www.unidata.ucar.edu/software/netcdf

  2. The parallel virtual file system, version 2. http://www.pvfs.org/pvfs2

  3. Abbasi, H., Eisenhauer, G., Wolf, M., Schwan, K.: DataStager: scalable data staging services for petascale applications. In: HPDC ’09, New York, NY, USA (2009)

  4. Adiga, N., Almasi, G., Almasi, G., et al.: An overview of the BlueGene/L supercomputer. In: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing (Supercomputing ’02), Los Alamitos, CA, USA, pp. 1–22 (2002)

  5. Chen, J.H., et al.: Terascale direct numerical simulations of turbulent combustion using S3D. Comput. Sci. Discov. 2(1), 015001 (2009). http://stacks.iop.org/1749-4699/2/015001

  6. Cluster File Systems, Inc.: Lustre: a scalable, high performance file system. http://www.lustre.org/docs.html

  7. Gong, Z., Lakshminarasimhan, S., Jenkins, J., Kolla, H., Ethier, S., Chen, J., Ross, R., Klasky, S., Samatova, N.: Multi-level layout optimization for efficient spatio-temporal queries on ISABELA-compressed data. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), pp. 873–884. IEEE Press, New York (2012)

  8. Gropp, W., Lusk, E., Doss, N., Skjellum, A.: A high-performance, portable implementation of the MPI message passing interface standard. Parallel Comput. 22(6), 789–828 (1996)

  9. Jenter, H.L., Signell, R.P.: NetCDF: a public-domain-software solution to data-access problems for numerical modelers (1992)

  10. Klasky, S., Ethier, S., Lin, Z., Martins, K., McCune, D., Samtaney, R.: Grid-based parallel data streaming implemented for the gyrokinetic toroidal code. In: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing (SC’03), p. 24, Washington, DC, USA (2003). http://portal.acm.org/citation.cfm?id=1048935.1050175

  11. Lakshminarasimhan, S., Shah, N., Ethier, S., Klasky, S., Latham, R., Ross, R., Samatova, N.: Compressing the incompressible with ISABELA: in-situ reduction of spatio-temporal data. In: Euro-Par 2011 Parallel Processing, pp. 366–379 (2011)

  12. Lakshminarasimhan, S., Shah, N., Ethier, S., Ku, S., Chang, C., Klasky, S., Latham, R., Ross, R., Samatova, N.: ISABELA for effective in situ compression of scientific data. Concurr. Comput. 25, 524–540 (2013)

  13. Li, J., Liao, W., Choudhary, A., Ross, R., Thakur, R., Gropp, W., Latham, R.: Parallel netCDF: a high performance scientific I/O interface. In: Proceedings of the Supercomputing ’03 (2003)

  14. Liao, W.-K., Choudhary, A.: Dynamically adapting file domain partitioning methods for collective I/O based on underlying parallel file system locking protocols. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (SC’08), Piscataway, NJ, USA, pp. 1–12 (2008)

  15. Lofstead, J., Klasky, S., Schwan, K., Podhorszki, N., Jin, C.: Flexible I/O and integration for scientific codes through the adaptable I/O system (ADIOS). In: 6th International Workshop on Challenges of Large Applications in Distributed Environments, Boston, MA (2008)

  16. Lofstead, J., Zheng, F., Klasky, S., Schwan, K.: Adaptable, metadata rich IO methods for portable high performance IO. In: International Symposium on Parallel and Distributed Processing (IPDPS), pp. 1–10 (2009)

  17. Ma, X., Winslett, M., Lee, J., Yu, S.: Improving MPI-IO output performance with active buffering plus threads. In: Proceedings of the International Parallel and Distributed Processing Symposium, p. 10 (2003). doi:10.1109/IPDPS.2003.1213165

  18. Park, K., Ihm, S., Bowman, M., Pai, V.S.: Supporting practical content-addressable caching with czip compression. In: 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference (ATC’07), Berkeley, CA, USA, pp. 1–14 (2007)

  19. Prost, J.P., Treumann, R., Hedges, R., Jia, B., Koniges, A.: MPI-IO/GPFS, an optimized implementation of MPI-IO on top of GPFS. In: Proceedings of Supercomputing’01 (2001)

  20. Thakur, R., Ross, R., Latham, R., Lusk, R., Gropp, B.: ROMIO: a high-performance, portable MPI-IO implementation (2012). http://www.mcs.anl.gov/research/projects/romio/

  21. Ross, R.: Parallel I/O benchmarking consortium. http://www-unix.mcs.anl.gov/rross/pio-benchmark/html/

  22. Schmuck, F., Haskin, R.: GPFS: a shared-disk file system for large computing clusters. In: FAST’02, pp. 231–244. USENIX, Berkeley (2002)

  23. Tatebe, O., Morita, Y., Matsuoka, S., Soda, N., Sekiguchi, S.: Grid datafarm architecture for petascale data intensive computing. In: Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID’02), Washington, DC, USA, p. 102 (2002)

  24. Thakur, R., Choudhary, A.: An extended two-phase method for accessing sections of out-of-core arrays. Sci. Program. 5(4), 301–317 (1996)

  25. Thakur, R., Gropp, W., Lusk, E.: An abstract-device interface for implementing portable parallel-I/O interfaces. In: Proceedings of the Sixth Symposium on the Frontiers of Massively Parallel Computation (Frontiers ’96) (1996). http://www.mcs.anl.gov/home/thakur/adio.ps

  26. Thakur, R., Gropp, W., Lusk, E.: Data sieving and collective I/O in ROMIO. In: Proceedings of the Seventh Symposium on the Frontiers of Massively Parallel Computation, pp. 182–189 (1999)

  27. Thakur, R., Gropp, W., Lusk, E.: On implementing MPI-IO portably and with high performance. In: Proceedings of the 6th Workshop on I/O in Parallel and Distributed Systems, pp. 23–32. ACM Press, New York (1999)

  28. The National Center for Supercomputing Applications: HDF5 home page. http://hdf.ncsa.uiuc.edu/HDF5/

  29. Vilayannur, M., Nath, P., Sivasubramaniam, A.: Providing tunable consistency for a parallel file store. In: Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST’05), Berkeley, CA, USA, p. 2 (2005)

  30. Wong, P., Van der Wijngaart, R.F.: NAS parallel benchmarks I/O, version 2.4. Tech. rep. NAS-03-002, Computer Sciences Corporation, NASA Advanced Supercomputing (NAS) Division

  31. Yu, W., Vetter, J.: ParColl: partitioned collective I/O on the Cray XT. In: International Conference on Parallel Processing (ICPP’08), Portland, OR (2008)

  32. Yu, W., Vetter, J., Canon, R., Jiang, S.: Exploiting Lustre file joining for effective collective I/O. In: 7th International Conference on Cluster Computing and Grid (CCGrid’07), Rio de Janeiro, Brazil (2007)

  33. Yu, W., Vetter, J., Oral, H.: Performance characterization and optimization of parallel I/O on the Cray XT. In: 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS’08), Miami, FL (2008)

  34. Zheng, F., et al.: PreDatA: preparatory data analytics on peta-scale machines. In: IPDPS, Atlanta, GA (2010)


Acknowledgements

This work is funded in part by National Science Foundation awards CNS-0917137 and CNS-1059376. This research is sponsored in part by the Office of Advanced Scientific Computing Research, U.S. Department of Energy. It is conducted with high-performance computational resources provided by the Louisiana Optical Network Initiative (http://www.loni.org). We are very grateful for the technical support from the LONI team.

Author information

Corresponding author

Correspondence to Weikuan Yu.

About this article

Cite this article

Tian, Y., Xu, C., Yu, W. et al. neCODEC: nearline data compression for scientific applications. Cluster Comput 17, 475–486 (2014). https://doi.org/10.1007/s10586-013-0265-8
