Abstract
Large-scale communication systems and computer networks provide flexible, efficient, and highly available services to their users. However, practical large-scale systems, even fault-tolerant ones, can produce unpredictable and often detrimental outcomes. This motivates the design and development of analytical models that help to understand and predict complex system behaviour in order to ensure the availability of large-scale systems. In this paper, analytical modelling and optimization analysis are presented for large-scale systems. The key contribution of this paper is twofold. First, a generic approximate solution approach is adapted and developed for performability modelling, which considers the performance and availability issues of a large number of nodes served by multiple repairmen. The analytical model and solution presented here can handle large numbers of nodes, up to thousands, and can incorporate the availability issues of the system. Second and foremost, the relationship between the number of nodes and the number of repairmen is presented with an optimization analysis for large-scale systems. To show the efficacy and accuracy of the proposed approach, the results obtained from the analytical model are validated against results obtained from simulations.
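
As a rough illustration of how the number of nodes and the number of repairmen interact, the sketch below solves the classic machine-repair birth-death chain for the number of operative nodes. It is not the approximate solution approach of the paper and it ignores job queueing entirely. The function names, the exponential failure and repair assumption, and the "mean operative fraction" target are all assumptions made for this example.

```python
import math


def operative_distribution(n_nodes, n_repairmen, failure_rate, repair_rate):
    """Steady-state probabilities p[k] that exactly k nodes are operative.

    Assumed model (hypothetical, for illustration only):
      k -> k-1 at rate k * failure_rate (operative nodes fail independently)
      k -> k+1 at rate min(n_nodes - k, n_repairmen) * repair_rate
    """
    # Work with log-weights so that thousands of nodes do not overflow.
    log_w = [0.0]
    for k in range(n_nodes):
        up = min(n_nodes - k, n_repairmen) * repair_rate
        down = (k + 1) * failure_rate
        log_w.append(log_w[-1] + math.log(up) - math.log(down))
    peak = max(log_w)
    weights = [math.exp(v - peak) for v in log_w]
    total = math.fsum(weights)
    return [w / total for w in weights]


def mean_operative(dist):
    """Expected number of operative nodes under the distribution dist."""
    return math.fsum(k * p for k, p in enumerate(dist))


def min_repairmen(n_nodes, failure_rate, repair_rate, target_fraction):
    """Smallest repair crew whose mean operative fraction meets the target.

    Returns None if even one repairman per node is not enough.
    """
    for r in range(1, n_nodes + 1):
        dist = operative_distribution(n_nodes, r, failure_rate, repair_rate)
        if mean_operative(dist) / n_nodes >= target_fraction:
            return r
    return None


if __name__ == "__main__":
    # Hypothetical rates: 1000 nodes, failure rate 0.001/h, repair rate 0.5/h.
    print(min_repairmen(1000, 0.001, 0.5, 0.99))
```

The linear search over the crew size is deliberately simple. In this chain, adding a repairman does not lower the mean number of operative nodes, so the first crew size that meets the target is the smallest one.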













Cite this article
Kirsal, Y. Analytical modelling and optimization analysis of large-scale communication systems and networks with repairmen policy. Computing 100, 503–527 (2018). https://doi.org/10.1007/s00607-017-0580-7
DOI: https://doi.org/10.1007/s00607-017-0580-7
Keywords
- Optimization analysis
- Performability modelling
- Large-scale systems
- Fault-tolerant systems
- Queuing analysis
- Quality of service