Data Allocation with Neural Similarity Estimation for Data-Intensive Computing

Vamosi, Ralf; Schikuta, Erich

doi:10.1007/978-3-031-08757-8_45

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13352)

Included in the following conference series:

International Conference on Computational Science

Abstract

Science collaborations such as ATLAS at the high-energy particle accelerator at CERN use a computer grid to run expensive computational tasks on massive, distributed data sets.

Dealing with big data on a grid demands workload management and data allocation to maintain a continuous workflow. Data allocation in a computer grid necessitates some data placement policy that is conditioned on the resources of the system and the usage of data.

Automatic and manual data policies both aim, in part, at a short time-to-result, and there are ongoing efforts to improve them. Data placement (or allocation) is vital for coping with the increasing amount of data processed in different data centers: a data placement policy decides at which locations subsets of data are to be placed.

This paper presents a novel approach to the bottleneck caused by wide-area file transfers between data centers and by large, distributed data sets with high dimensionality. The model first estimates which data are similar, using a neural network on sparse and uncertain observations, and then proceeds with the allocation process. The allocation process uses evolutionary data allocation to find near-optimal solutions and improves network transfers by more than 5% for the given data centers.
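
The allocation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no details of the cost function, the operators, or the constraints, so everything here is an assumption made for illustration. The sketch assumes that a pairwise similarity matrix is already available (standing in for the output of the neural estimator), that placing similar data sets at different sites is a proxy for wide-area transfers, and that each site holds a fixed number of data sets. The names `transfer_cost`, `swap_mutation`, and `evolve_allocation` are hypothetical.

```python
import random


def transfer_cost(assignment, similarity):
    """Proxy cost: total similarity of dataset pairs placed at different sites."""
    n = len(assignment)
    return sum(
        similarity[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if assignment[i] != assignment[j]
    )


def swap_mutation(assignment, rng):
    """Exchange the sites of two datasets; the per-site counts stay unchanged."""
    child = list(assignment)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def evolve_allocation(similarity, n_sites, generations=200, pop_size=20, seed=0):
    """Simple elitist evolutionary search over dataset-to-site assignments."""
    rng = random.Random(seed)
    n = len(similarity)
    # Assumed starting point: round-robin placement across the sites.
    start = [i % n_sites for i in range(n)]
    population = [list(start) for _ in range(pop_size)]
    for _ in range(generations):
        children = [swap_mutation(parent, rng) for parent in population]
        # Keep the cheapest assignments among parents and children.
        pool = population + children
        pool.sort(key=lambda a: transfer_cost(a, similarity))
        population = pool[:pop_size]
    return population[0]


# Toy input (assumed): datasets 0-2 are mutually similar, as are datasets 3-5.
similarity = [
    [1.0 if (i < 3) == (j < 3) else 0.1 for j in range(6)]
    for i in range(6)
]
start = [i % 2 for i in range(6)]
best = evolve_allocation(similarity, n_sites=2)
print("start cost:", transfer_cost(start, similarity))
print("best assignment:", best, "cost:", transfer_cost(best, similarity))
```

The swap mutation is chosen here only because it keeps every site at its initial occupancy, which gives a crude capacity constraint without an explicit feasibility check; a realistic policy would also have to account for data set sizes, replicas, and link bandwidths.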

Author information

Authors and Affiliations

Faculty of Computer Science, University of Vienna, Vienna, Austria
Ralf Vamosi & Erich Schikuta

Corresponding author

Correspondence to Erich Schikuta.

Editor information

Editors and Affiliations

Brunel University London, London, UK
Derek Groen
University of Amsterdam, Amsterdam, The Netherlands
Clélia de Mulatier
AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M. A. Sloot

About this paper

Cite this paper

Vamosi, R., Schikuta, E. (2022). Data Allocation with Neural Similarity Estimation for Data-Intensive Computing. In: Groen, D., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2022. ICCS 2022. Lecture Notes in Computer Science, vol 13352. Springer, Cham. https://doi.org/10.1007/978-3-031-08757-8_45

DOI: https://doi.org/10.1007/978-3-031-08757-8_45
Published: 15 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08756-1
Online ISBN: 978-3-031-08757-8
eBook Packages: Computer Science, Computer Science (R0)