Abstract
Active Storage offers an opportunity to reduce the bandwidth required between the storage and compute elements of current supercomputing systems, and to leverage the processing power of the storage nodes used by some modern file systems. To achieve both objectives, Active Storage allows certain processing tasks to be performed directly on the storage nodes, near the data they manage. However, Active Storage must also support key requirements of scientific applications; in particular, it must be able to handle striped files and files with complex formats (e.g., netCDF). In this paper, we describe how these requirements can be addressed. Experimental results on a Lustre file system show not only that our proposal can reduce network traffic to near zero and scale performance with the number of storage nodes, but also that it handles striped files efficiently and can manage files with complex data structures.
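To illustrate why striped files complicate processing on the storage nodes, here is a minimal Python sketch. It assumes a plain round-robin (RAID-0 style) layout of the kind Lustre uses for striped files; the function names `locate` and `local_extents` and the parameters `stripe_size` and `stripe_count` are illustrative assumptions, not the authors' implementation. It shows how a logical byte offset maps to a storage object, and which logical byte ranges a single storage node holds and could therefore process locally.

```python
from typing import List, Tuple


def locate(offset: int, stripe_size: int, stripe_count: int) -> Tuple[int, int]:
    """Map a logical file offset to (object index, offset inside that object).

    Hypothetical helper. Assumes plain round-robin striping: stripe k of the
    file is stored on object k % stripe_count.
    """
    stripe = offset // stripe_size      # which stripe holds this byte
    obj = stripe % stripe_count         # round-robin object (storage node)
    row = stripe // stripe_count        # full rounds of stripes before it
    return obj, row * stripe_size + offset % stripe_size


def local_extents(obj: int, file_size: int, stripe_size: int,
                  stripe_count: int) -> List[Tuple[int, int]]:
    """Logical [start, end) byte ranges of the file stored on object `obj`.

    Hypothetical helper. A storage node can process exactly these ranges
    locally; a record crossing an extent boundary needs bytes held elsewhere.
    """
    extents = []
    start = obj * stripe_size
    while start < file_size:
        extents.append((start, min(start + stripe_size, file_size)))
        start += stripe_size * stripe_count  # skip the other objects' stripes
    return extents


if __name__ == "__main__":
    MiB = 1 << 20
    # A 10 MiB file with 1 MiB stripes spread over 4 storage objects.
    print(locate(5 * MiB + 10, MiB, 4))        # (1, 1048586)
    print(local_extents(1, 10 * MiB, MiB, 4))  # [(1048576, 2097152), (5242880, 6291456), (9437184, 10485760)]
```

In this sketch, each storage node sees the file only as a set of non-contiguous extents, so a data element of a complex format such as netCDF may be split across two nodes. How the paper deals with such boundary cases is not reproduced here.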
This work was supported by the DoE, Office of Advanced Scientific Computing Research, at the Pacific Northwest National Laboratory (a multiprogram national laboratory operated by Battelle for the U.S. DoE under Contract DE-AC06-76RL01830), and by the Spanish MEC and European Commission FEDER funds under grants “Consolider Ingenio–2010 CSD2006–00046” and “TIN2006–15516–C04–03”.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Piernas, J., Nieplocha, J. (2008). Efficient Management of Complex Striped Files in Active Storage. In: Luque, E., Margalef, T., Benítez, D. (eds) Euro-Par 2008 – Parallel Processing. Euro-Par 2008. Lecture Notes in Computer Science, vol 5168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85451-7_72
DOI: https://doi.org/10.1007/978-3-540-85451-7_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85450-0
Online ISBN: 978-3-540-85451-7
eBook Packages: Computer Science, Computer Science (R0)