Abstract
To execute scientific applications and simulations of enormous scale, the computing paradigm is evolving toward cluster computing and cloud computing, which can exploit the large number of available computing resources. To maximize the utilization of these resources, a company or research center needs a scheduler engine and an associated data space to construct a cluster computing environment. However, if a data space is shared, problems related to node security, network traffic imbalance between nodes, and data protection can arise. To solve these issues, a manager that synchronizes the shared data space for the nodes constituting a cluster computing environment is designed. The synchronization manager shares data in two ways. First, within the cluster environment, the full synchronization group mounts a specific directory space of the master node via NFS; this is used for data that can be referenced globally. Second, the partial synchronization group delivers data to assigned workers through rsync; this can be used to share data locally for isolation. The partial synchronization group is superior to the full synchronization group in security and efficiency because data are shared separately for each group of workers. By applying the appropriate data-sharing method, the designed manager efficiently mediates data sharing as intended.
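The two sharing modes described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, host names, and paths are hypothetical, and the sketch only builds the `mount` and `rsync` command lines that a manager of this kind might issue, without running them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FullSyncGroup:
    """Hypothetical group whose workers all mount one master directory via NFS."""
    master: str        # host name of the master node (assumed)
    export_dir: str    # directory exported by the master (assumed)
    mount_point: str   # where each worker mounts it (assumed)


@dataclass(frozen=True)
class PartialSyncGroup:
    """Hypothetical group whose data is copied only to its assigned workers."""
    source_dir: str            # directory on the master to deliver
    dest_dir: str              # target directory on each worker
    workers: Tuple[str, ...]   # workers assigned to this group


def nfs_mount_command(group: FullSyncGroup) -> List[str]:
    """Build the command a worker would run to mount the globally shared space."""
    remote = f"{group.master}:{group.export_dir}"
    return ["mount", "-t", "nfs", remote, group.mount_point]


def rsync_commands(group: PartialSyncGroup) -> List[List[str]]:
    """Build one rsync command per assigned worker.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    src = group.source_dir.rstrip("/") + "/"
    return [
        ["rsync", "-az", src, f"{worker}:{group.dest_dir}"]
        for worker in group.workers
    ]
```

Each returned list could be passed to `subprocess.run` on a node with the required privileges. Because only the listed workers receive a copy in the partial case, data stays isolated from the rest of the cluster, which is the property the abstract attributes to the partial synchronization group.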
Acknowledgements
This work was supported by the Korea Institute of Science and Technology Information (KISTI) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (NRF-2016R1C1B1008330).
Cite this article
Jung, D., Lee, D., Kim, M. et al. Efficient data synchronization method on integrated computing environment. J Supercomput 75, 4252–4266 (2019). https://doi.org/10.1007/s11227-018-2445-z
DOI: https://doi.org/10.1007/s11227-018-2445-z