Discrete Optimization
Service system design for managing interruption risks: A backup-service risk-mitigation strategy

https://doi.org/10.1016/j.ejor.2018.03.028

Highlights

  • This paper considers the design of a service system with interruption risks.

  • The backup-service strategy is used to mitigate interruption risks.

  • Risk mitigation is enhanced by optimizing location, allocation, and service capacities.

  • The problem is formulated as an integer non-linear program.

  • A Lagrangian-relaxation algorithm is developed to solve large-size problem instances.

Abstract

This paper considers a system of immobile service facilities that provide service to a set of customers, which can be demand points, population zones, etc. The customers create congestion at facilities because of stochastic demands and service times. Each service facility is interrupted frequently, and the recovery process for each interruption starts after an assessment period. Both recovery and assessment processes last for uncertain periods. The backup-service strategy, together with appropriate adjustment of facility locations and service capacities, is used to mitigate interruption risks; that is, each customer is assigned to a backup facility to get service when the primary facility is interrupted. The goal is to determine open service facilities and their service capacities, and to assign customers to primary and backup facilities in order to maximize an aggregated performance measure, which is a balanced sum of the customers’ and the system owner's criteria. The problem is formulated as an integer non-linear optimization model and solved by a Lagrangian-relaxation algorithm. The numerical experiments illustrate the high efficiency of the algorithm. Several managerial implications are also provided.

Introduction

A service system is considered as a network of customers (demand points or population zones) and Service Facilities (SFs). SFs in a service system can be classified as immobile or mobile. An immobile SF is fixed and the clients must travel to its location to get service, while a mobile SF travels to the customers’ locations. To design an immobile service system, one must select the immobile SFs to be established, determine their service capacities, and assign the customers to the open SFs. Such network design problems are here called Service System Design (SSD) problems. The interested reader is referred to Boffey, Galvao, and Espejo (2007), and Berman and Krass (2015) for surveys on SSD problems.

Congestion in immobile SFs is an undesirable system state, which is controlled in SSD problems. Two main approaches are used to manage congestion. The first imposes constraints that prevent the expectation or tail probability of the waiting time (or queue length) from exceeding a predefined threshold (e.g., Aboolian et al., 2012, Baron et al., 2008, Rajagopalan and Yu, 2001, Silva and Serra, 2008). In the second approach, which is also applied in the current work, the waiting cost is considered as a penalty term in the objective function (e.g., Elhedhli, 2006, Wang et al., 2002, Wang et al., 2004, Aboolian et al., 2008, Berman and Drezner, 2007, Kim, 2013, Vidyarthi and Jayaswal, 2014). The second approach results in complicated non-linear optimization models, which are often solved heuristically. Recently, Ahmadi-Javid and Hoseinpour (2017) made significant progress toward optimally solving a broad class of discrete optimization problems involving queuing formulas, which includes, as a special case, the generic SSD problem formulated under the second approach.
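As a rough illustration of the second approach, the following Python sketch computes an expected waiting-cost penalty for one facility. It assumes, for illustration only, a single M/M/1 queue without interruptions; the function names and the cost parameter are hypothetical and are not taken from the paper's model.

```python
def mm1_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Expected time a customer spends in a stable M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative (arrival) and positive (service)")
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def congestion_penalty(arrival_rate: float, service_rate: float,
                       waiting_cost_per_unit_time: float) -> float:
    """Expected waiting cost accrued per unit time at one facility.

    By Little's law, the mean number of customers in the system is
    L = lambda * W, so the cost rate is (cost per customer per unit time) * L.
    """
    mean_in_system = arrival_rate * mm1_time_in_system(arrival_rate, service_rate)
    return waiting_cost_per_unit_time * mean_in_system


if __name__ == "__main__":
    # Hypothetical numbers: 2 arrivals/hour, capacity 5 services/hour, cost 3 per customer-hour.
    print(congestion_penalty(2.0, 5.0, 3.0))
```

The penalty grows without bound as the arrival rate approaches the service rate, which is the source of the non-linearity mentioned above: both the allocation (which sets the arrival rate) and the capacity (which sets the service rate) appear inside the fraction.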

The service process at each SF may be interrupted randomly. This fact has been thoroughly studied in the queuing literature; see Krishnamoorthy, Pramod, and Chakravarthy (2014) for a survey on queues with interruptions. Papers published since this survey are briefly reviewed in the following. Boualem (2014) analyzes a single-server queue with interruptions where there is no waiting room for customers. In such a situation, a customer who finds the server busy or down retries at a later time. The author provides bounds for the stationary distribution of the Markov chain. Tadj and Ke (2014) consider a single-server queue with interruptions and a service center that serves the customers one-by-one or in batches depending on the queue size. They investigate the optimal value of the batch size. Yang and Ke (2014) consider an M/G/1 queue in which the server is randomly activated and deactivated (goes on vacation) when the number of customers exceeds a limit. The server is also subject to random breakdowns. They calculate the system size distribution at a stationary point of time. Krishnamoorthy, Sivadasan, and Lakshmy (2015a) consider an M/M/1 queuing system with interruptions that are caused by some environmental factors. The system (customer/server) is not aware of the interruption for a random amount of time after the interruption occurs. Moreover, the customer leaves the system if the number of interruptions exceeds a finite limit (k) and/or the interruption time exceeds a limit. The probability of each interruption, its recovery time, and the service rate after the recovery process depend on the factor type that causes the interruption. The stability condition and the response time are computed and the optimal value of k is investigated. Krishnamoorthy, Sivadasan, and Lakshmy (2015b) consider an M/G/1 queuing system with two types of vacations, namely normal and working. 
At the end of a busy period, the server takes one of these vacations depending on the environment. On the completion of any vacation, if the server is empty, only a working vacation starts again. When the server is in the normal vacation, some of the customers leave the system. The server may interrupt its working vacation to return to normal service if it finds a customer in the system. The distributions for the queue length and the waiting time are investigated. Krishnamoorthy, Nair, and Narayanan (2015) consider a production-inventory system in which the service process may be interrupted, but no produced item is lost due to interruptions. An explicit expression for the stability condition is obtained, and some performance measures are provided. Dudin, Jacob, and Krishnamoorthy (2015) consider a multi-server system with a phase-type service-time distribution in which an interruption may occur considering partial service protection and service repetition. A multi-dimensional Markov chain is constructed to describe the system behavior. The stationary distribution and some performance measures are derived. Atencia (2015) considers a discrete-time queueing system with breakdowns where the server lifetime is geometrically distributed. Jacob and Krishnamoorthy (2015) investigate queues with phase-type service-time distributions where interruptions are caused by customers. An interrupted customer retries to get service after completion of the interruption or leaves the system. An illustration of the system with several performance measures is provided. Yang, Chang, and Ke (2016) consider a single-server queue with batch arrivals. When the server is busy, it is subject to random interruptions, and the server may take an additional vacation after the first essential vacation. They analyze the system and develop a variety of system performance measures. 
Hoseinpour and Ahmadi-Javid (2016 and 2017) incorporate a queue model with interruptions into a stochastic facility location problem where no backup strategy is used, and where service capacities are modeled as discrete and continuous decision variables, respectively. Kumar, Rukmani, Thanikachalam, and Kanakasabapathi (2018) study a queuing system with interruptions in which the breakdowns and repairs depend on whether the server is busy or idle. Different analytical results with a performance analysis are presented under the steady-state condition. Talen and Aissani (2016) study an unreliable M/G/1 queue system with two kinds of customers, persistent and impatient, where the preventive maintenance is postponed in busy periods.

Strategies for mitigating interruption risks in an SSD problem can be categorized into three groups, namely, service-capacity, backup-service, and location-allocation. The first strategy hedges against interruption risks by increasing the service capacities at open facilities, while under the backup-service strategy, customers experiencing interruptions are transferred to other SFs that are preplanned as backup facilities. The location-allocation strategy is used to appropriately establish SFs and allocate customers to them in a way that a balanced level of diversity is reached to limit the negative effects of interruptions. One should note that the backup-service strategy has previously been used in location problems with limited workload capacities (see e.g., Hogan and ReVelle, 1986, Narasimhan et al., 1992, Pirkul and Schilling, 1988, Pirkul and Schilling, 1989, Amiri, 1998), but it has not yet been considered by any work to mitigate interruption risks in SSD problems.

Several papers have recently studied network design problems in supply chains with disruptions; see the review papers by Snyder et al. (2016) and Govindan, Fattahi, and Keyvanshokooh (2017). The above three strategies are similar to the strategies used for mitigating disruption risks in non-competitive supply-chain networks. Snyder et al. (2016) categorize the mitigation strategies for supply-chain disruptions into four main groups: (1) mitigating through inventories, (2) sourcing and demand flexibility, (3) facility location, and (4) interaction with external parties. Except for the last group, which is particularly used in competitive environments, the groups are related to the strategies for mitigating interruptions in service systems. Indeed, one can see that the service-capacity strategy is similar to the first group, which elevates inventory levels in supply chain problems (note that service cannot be inventoried, but the service capacity can be increased). Moreover, the backup-service strategy is similar to the second group, where sourcing and demand flexibility is strengthened by providing backup suppliers and/or assigning demands to some operating facilities. Finally, the location-allocation strategy is similar to the third group used in supply chains.

When an interruption occurs, the customers who face the interruption can move or be guided to another queue in the backup facility. Some papers study other types of customer transfers that are rooted in factors other than interruptions. For example, He and Neuts (2002) consider two M/M/1 queues with transfer of customers where a batch of customers is transferred to the shorter queue when either queue reaches a specified length. Down and Lewis (2006) study a dynamic load balancing problem in a parallel queue system in which a decision-maker decides on moving customers among queues at each period. Deepak, Krishnamoorthy, Narayanan, and Vineetha (2008) investigate a parallel two-server inventoried queuing system in which customers move from the longer queue to the shorter one whenever the length difference of the two queues reaches a given limit. In all of these studies, the customer transfer mainly depends on queue characteristics and both servers are assumed to work as before. However, in the presence of interruptions, one of the servers stops functioning until it is recovered, which means that the customer transfer depends on the servers’ states. This shows that the transfer type considered for the interruption case is fundamentally different from the previously studied transfer types.

This paper studies, for the first time, an SSD problem that simultaneously incorporates the above-mentioned three main risk-mitigation strategies (i.e., service-capacity, backup-service, and location-allocation) into the design of a service network that is subject to interruption risks. The proposed problem determines the locations and service capacities of open SFs, and allocates customers to primary and backup SFs. The clients are guided to queues in backup SFs when primary SFs are interrupted, which indicates a new type of customer transfer between queues. The interruption at each SF and streams of customer demands are assumed to evolve according to independent Poisson processes. Moreover, the transition from the interruption phase to the recovery phase starts after an uncertain assessment period is completed, which has not yet been studied in the literature on queues with interruptions. The problem is formulated as an integer non-linear program, which is solved by a powerful Lagrangian-Relaxation (LR) algorithm. Briefly, the main contributions of our paper can be listed as follows:

  • This paper develops a model that considers service-capacity, backup-service, and location-allocation strategies for mitigating interruption risks in a service network. Our numerical study shows that our approach successfully reduces the negative effects of interruption risks.

  • In this paper, for the first time, a random assessment time is considered before starting each recovery process. To model the interruption in each SF, a three-state Markovian process is fully analyzed. In the studies on queues with interruptions, it is typically assumed that the recovery process of each interruption starts immediately after the breakdown, which is a simplifying assumption.

  • A new type of customer transfer is considered between queues, which originates from interruptions (a server-dependent factor), while the previous studies consider customer transfers that are caused by queue-dependent factors such as queue length.

  • The proposed model, which is an integer non-linear optimization model, is efficiently solved at large scale using an LR algorithm.
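The three-state interruption process mentioned in the contributions can be sketched as follows. This is a minimal Python sketch under stated assumptions: a cyclic continuous-time Markov chain (operating, then assessment, then recovery, then operating again) with exponential holding times. The function and parameter names are hypothetical, and the paper's exact chain and its coupling with the queue are not reproduced here.

```python
from typing import Tuple


def three_state_stationary(interruption_rate: float,
                           assessment_rate: float,
                           recovery_rate: float) -> Tuple[float, float, float]:
    """Long-run fractions of time spent (operating, in assessment, in recovery).

    Assumed cyclic chain: operating -> assessment -> recovery -> operating.
    Each state is visited once per cycle, so the stationary probability of a
    state is its mean holding time divided by the mean cycle length.
    """
    rates = (interruption_rate, assessment_rate, recovery_rate)
    if min(rates) <= 0:
        raise ValueError("all rates must be positive")
    mean_times = [1.0 / r for r in rates]   # mean holding time in each state
    cycle = sum(mean_times)                 # mean length of one full cycle
    p_up, p_assess, p_recover = (t / cycle for t in mean_times)
    return p_up, p_assess, p_recover


if __name__ == "__main__":
    # Hypothetical rates: one interruption per 10 time units on average,
    # mean assessment time 1, mean recovery time 2.
    print(three_state_stationary(0.1, 1.0, 0.5))
```

Setting the assessment rate very high approximates the common simplifying assumption that recovery starts immediately after the breakdown, which makes it easy to see how much availability the assessment period costs under these assumptions.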

Our results can be used in designing service networks in different business areas. For example, they can be used for locating healthcare facilities such as emergency SFs (Syam, 2008) and preventive healthcare facilities (Vidyarthi and Kuzgunkaya, 2014, Zhang et al., 2009, Zhang et al., 2012) where the facilities’ functionality may be interrupted due to different human- or equipment-dependent failures (see Ahmadi-Javid, Seyedi, and Syam (2017) for a survey on location models for different types of healthcare facilities). Other application areas include the design of telecommunication and call-center networks (LeBlanc & Simmons, 1989), walk-in health clinics (Jan et al., 2000, Newman, 1984), and refuse collection and disposal systems (Riccio, 1984), in which service interruptions are expected. Moreover, the results can potentially be used in designing networks that provide after-sales services for manufacturing systems (Saccani, Johansson, & Perona, 2007) or collection/recycling services in closed-loop supply chains (Keyvanshokooh, Ryan, & Kabir, 2016).

The rest of the paper is organized as follows. Section 2 states and formulates our SSD problem as an integer non-linear programming model, and then Section 3 presents an LR algorithm to tackle the model. Section 4 reports the computational experiment, and Section 5 provides interesting managerial insights. Section 6 concludes the paper and puts forward suggestions for areas of future study.

Section snippets

Problem formulation

This section states an SSD problem that designs an immobile service system under the risk of random interruptions. The customers accessing an SF during an interruption are diverted to a backup SF and served at a higher service cost for the system. Some of the transferred customers go to the backup SFs, and the others leave the service system.

Consider a single-server immobile SF where the arrivals from the allocated demand points for both primary and backup services follow independent Poisson

Solution algorithm

This section develops an algorithm for the proposed model (6), (9)–(16) based on the LR method (e.g., Fisher, 2004, Geoffrion, 2010). After relaxing constraints (15) and (16), the relaxed model can be decomposed into |I| subproblems, each of which can be solved much more easily than the original. This fact motivates the development of an LR algorithm for solving the problem. In this algorithm, constraints (15) and (16) are dualized with Lagrangian multipliers $u \in \mathbb{R}^{|J|}$ and $v \in \mathbb{R}^{|J|}$, respectively;
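To make the decomposition idea concrete, here is a minimal Python sketch of Lagrangian relaxation with a subgradient update. It is not the paper's model: as an assumed stand-in, it uses a toy uncapacitated facility location problem in minimization form, relaxes the "each customer is assigned exactly once" constraints with one multiplier per customer, and solves one small subproblem per facility. The diminishing step rule and all names are illustrative assumptions; the multiplier update used in the paper is not shown in this excerpt.

```python
from math import inf
from typing import List, Sequence, Tuple


def relaxed_bound(fixed_cost: Sequence[float],
                  assign_cost: Sequence[Sequence[float]],
                  u: Sequence[float]) -> Tuple[float, List[int]]:
    """Solve the relaxed problem for multipliers u.

    Returns a lower bound on the optimal cost and a subgradient with one
    entry per customer (1 minus the number of facilities serving it).
    """
    bound = float(sum(u))
    times_served = [0] * len(u)
    for f_i, costs_i in zip(fixed_cost, assign_cost):
        # Independent subproblem for facility i: serve only customers whose
        # reduced cost is negative, and open only if the total is negative.
        reduced = [c - u_j for c, u_j in zip(costs_i, u)]
        value_i = f_i + sum(r for r in reduced if r < 0)
        if value_i < 0:
            bound += value_i
            for j, r in enumerate(reduced):
                if r < 0:
                    times_served[j] += 1
    return bound, [1 - s for s in times_served]


def subgradient_ascent(fixed_cost: Sequence[float],
                       assign_cost: Sequence[Sequence[float]],
                       iterations: int = 100,
                       step0: float = 1.0) -> Tuple[float, List[float]]:
    """Maximize the Lagrangian bound with a diminishing step size step0 / k."""
    u = [0.0] * len(assign_cost[0])
    best_bound, best_u = -inf, list(u)
    for k in range(1, iterations + 1):
        bound, g = relaxed_bound(fixed_cost, assign_cost, u)
        if bound > best_bound:
            best_bound, best_u = bound, list(u)
        if not any(g):  # relaxed solution satisfies every assignment constraint
            break
        u = [u_j + (step0 / k) * g_j for u_j, g_j in zip(u, g)]
    return best_bound, best_u


if __name__ == "__main__":
    # Toy instance: 2 facilities, 3 customers; the optimal cost is 9
    # (open only the second facility: 3 + 3 + 2 + 1).
    print(subgradient_ascent([4, 3], [[1, 2, 3], [3, 2, 1]]))
```

Every value returned by `relaxed_bound` is a valid lower bound on the toy problem's optimal cost, so the gap between the best bound and any feasible solution gives a quality certificate. A maximization model such as the one in this paper would use the mirrored construction, with the relaxed problem yielding upper bounds.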

Computational results

This section summarizes the numerical experiments used to evaluate the performance of the LR algorithm developed in Section 3 for solving our SSD problem.

The proposed algorithm was run on a PC with a dual-core 2.2 GHz processor and 4 GB of RAM, running 32-bit Microsoft Windows 8. The algorithm was implemented in C++ linked with the object-oriented GAMS API, version 24.1.3, which called the solver DICOPT (DIscrete and Continuous OPTimizer) for solving subproblems LR(i)r, i ∈ I, r ∈ R, given in

Managerial insights

This section explains more about the applications of our model in business and provides observations that can be useful in practice.

The results are given for an instance with 10 potential SFs, 10 service-capacity levels, and 100 customers; however, the observations hold similarly for the other instances, which are not reported here for the sake of brevity. In Figs. 5, 6, 7 and 10, the expected recovery time and expected assessment time are fixed and the interruption rate varies, but it can be

Conclusions

This paper considers service interruption risks in the design of a service system. Each service facility operates as an M/M/1 system where each facility's service process faces interruptions according to a Poisson process. The assessment and recovery times are identically, exponentially distributed. Three risk-mitigation strategies are considered: location-allocation, service-capacity, and backup-service. Whenever an interruption occurs, arriving customers are diverted to a backup-service

References (62)

  • H.D. Sherali et al.

    A variable target value method for nondifferentiable optimization

    Operations Research Letters

    (2000)
  • S.S. Syam

    A multiple server location–allocation model for service system design

    Computers & Operations Research

    (2008)
  • N. Vidyarthi et al.

    Efficient solution of a class of location–allocation problems with stochastic demand and congestion

    Computers & Operations Research

    (2014)
  • D.Y. Yang et al.

    Cost optimization of a repairable M/G/1 queue with a randomized policy and single vacation

    Applied Mathematical Modelling

    (2014)
  • D.Y. Yang et al.

    On an unreliable retrial queue with general repeated attempts and J optional vacations

    Applied Mathematical Modelling

    (2016)
  • Y. Zhang et al.

    Incorporating congestion in preventive healthcare facility network design

    European Journal of Operational Research

    (2009)
  • R. Aboolian et al.

    Location and allocation of service units on a congested network

    IIE Transactions

    (2008)
  • R. Aboolian et al.

    Profit maximizing distributed service system design with congestion and elastic demand

    Transportation Science

    (2012)
  • Ahmadi-Javid, A., Berman, O., & Hoseinpour, P. (2018). Location and capacity planning of facilities with general...
  • Ahmadi-Javid, A., & Ramshe, N. (2018). Linear formulations and valid inequalities for a classic location problem with...
  • Ahmadi-Javid, A., & Hoseinpour, P. (2017). Convexification of queueing formulas by mixed-integer second-order cone...
  • I. Atencia

    A discrete-time queueing system with server breakdowns and changes in the repair times

    Annals of Operations Research

    (2015)
  • O. Baron et al.

    Facility location with stochastic demand and constraints on waiting time

    Manufacturing & Service Operations Management

    (2008)
  • P. Belotti et al.

    Mixed-integer nonlinear optimization

    Acta Numerica

    (2013)
  • O. Berman et al.

    The multiple server location problem

    Journal of the Operational Research Society

    (2007)
  • O. Berman et al.

    Stochastic location models with congestion

  • O. Berman et al.

    Locating service facilities to reduce lost demand

    IIE Transactions

    (2006)
  • D.P. Bertsekas

    Nonlinear programming

    (1999)
  • M. Boualem

    Insensitive bounds for the stationary distribution of a single server retrial queue with server subject to active breakdowns

    Advances in Operations Research

    (2014)
  • M.R. Bussieck et al.

    MINLP solver software

    Wiley encyclopedia of operations research and management science

    (2010)
  • T.G. Deepak et al.

    Inventory with service time and transfer of customers and/inventory

    Annals of Operations Research

    (2008)