Abstract
Scientific collaborations such as ATLAS at the high-energy particle accelerator at CERN use a computing grid to run expensive computational tasks on massive, distributed data sets.
Handling big data on a grid requires workload management and data allocation to maintain a continuous workflow. Data allocation in a computing grid, in turn, requires a data placement policy that depends on the resources of the system and on how the data is used.
Automatic and manual data policies both aim, in part, at a short time-to-result, and there are ongoing efforts to improve them. Data placement is vital for coping with the increasing amount of data processed in different data centers. A data placement policy decides at which locations subsets of the data are placed.
This paper presents a novel approach to the bottleneck caused by wide-area file transfers between data centers and by large, distributed data sets with high dimensionality. The model first uses a neural network to estimate which data are similar from sparse and uncertain observations, and then proceeds with the allocation process. The allocation process uses evolutionary data allocation to find near-optimal solutions and improves network transfers by more than 5% for the given data centers.
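The second stage, evolutionary allocation driven by similarity estimates, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the capacity constraint, the cost function (summed similarity of data set pairs placed at different sites, used here as a stand-in for wide-area transfers), and the toy similarity values are all assumptions made for illustration. The similarity matrix is assumed to be the output of some estimator, such as the neural network mentioned above.

```python
import random


def transfer_cost(assignment, similarity):
    """Assumed proxy cost: total similarity of pairs split across sites."""
    n = len(assignment)
    return sum(
        similarity[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if assignment[i] != assignment[j]
    )


def is_feasible(assignment, n_sites, capacity):
    """Check that no site holds more than `capacity` data sets."""
    counts = [0] * n_sites
    for site in assignment:
        counts[site] += 1
    return all(c <= capacity for c in counts)


def mutate(assignment, n_sites, rng):
    """Either swap the sites of two data sets or move one data set."""
    child = list(assignment)
    if rng.random() < 0.5:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    else:
        child[rng.randrange(len(child))] = rng.randrange(n_sites)
    return child


def evolve(similarity, n_sites, capacity, generations=200, pop_size=20, seed=0):
    """Simple (mu + lambda) evolutionary search over placements."""
    rng = random.Random(seed)
    n = len(similarity)
    start = [i % n_sites for i in range(n)]  # round-robin initial placement
    if not is_feasible(start, n_sites, capacity):
        raise ValueError("capacity too small for a round-robin start")
    population = [list(start) for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(p, n_sites, rng) for p in population]
        children = [c for c in children if is_feasible(c, n_sites, capacity)]
        # Keep the best placements; parents survive, so cost never worsens.
        population = sorted(
            population + children,
            key=lambda a: transfer_cost(a, similarity),
        )[:pop_size]
    return population[0]


# Toy, made-up similarity matrix: data sets 0-2 and 3-5 form two groups.
sim = [
    [0, 9, 8, 1, 0, 1],
    [9, 0, 9, 0, 1, 0],
    [8, 9, 0, 1, 0, 1],
    [1, 0, 1, 0, 9, 8],
    [0, 1, 0, 9, 0, 9],
    [1, 0, 1, 8, 9, 0],
]
best = evolve(sim, n_sites=2, capacity=3)
print(best, transfer_cost(best, sim))
```

Because parents compete with their offspring for survival, the best placement found is never worse than the round-robin starting point. A realistic setting would replace the pair-count cost with measured or predicted transfer volumes and would account for file sizes and per-site storage limits.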
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Vamosi, R., Schikuta, E. (2022). Data Allocation with Neural Similarity Estimation for Data-Intensive Computing. In: Groen, D., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2022. ICCS 2022. Lecture Notes in Computer Science, vol 13352. Springer, Cham. https://doi.org/10.1007/978-3-031-08757-8_45
DOI: https://doi.org/10.1007/978-3-031-08757-8_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08756-1
Online ISBN: 978-3-031-08757-8
eBook Packages: Computer Science, Computer Science (R0)