Selecting and weighting features using a genetic algorithm in a case-based reasoning approach to personnel rostering

doi:10.1016/j.ejor.2004.12.028

European Journal of Operational Research

Volume 175, Issue 2, 1 December 2006, Pages 649-671

https://doi.org/10.1016/j.ejor.2004.12.028

Abstract

Personnel rostering problems are highly constrained resource allocation problems. Human rostering experts have many years of experience in making rostering decisions which reflect their individual goals and objectives. We present a novel method for capturing nurse rostering decisions and adapting them to solve new problems using the Case-Based Reasoning (CBR) paradigm. This method stores examples of previously encountered constraint violations and the operations that were used to repair them. The violations are represented as vectors of feature values. We investigate the problem of selecting and weighting features so as to improve the performance of the case-based reasoning approach. A genetic algorithm is developed for off-line feature selection and weighting using the complex data types needed to represent real-world nurse rostering problems. This approach significantly improves the accuracy of the CBR method and reduces the number of features that need to be stored for each problem. The relative importance of different features is also determined, providing an insight into the nature of expert decision making in personnel rostering.

Introduction

Nurse rostering can be defined as the problem of placing resources (nurses), subject to constraints, into slots in a pattern, where the pattern denotes a set of legal shifts defined in terms of the work that needs to be done [30]. A wide variety of constraints can be imposed on rosters depending on the legal, management, and staffing requirements of individual organisations. Definitions of roster quality and optimality are highly subjective and therefore difficult to represent systematically using utility functions or rule bases. Human rostering experts have many years of experience in making rostering decisions which reflect their individual goals and objectives.

Nurse rostering problems have been solved using a variety of different mathematical and artificial intelligence methods. They are usually modelled as optimisation problems but the objective functions used vary considerably between problems. Bailey [3], Beaumont [6], and Warner [28] use mathematical programming techniques to generate nurse rosters optimised with respect to staffing costs, under-staffing costs, and shift pattern penalties. Constraint satisfaction techniques have been developed by Abdennadher and Schlenker [1], Cheng et al. [11], and Meyer auf’m Hofe [20] which allow the definition of many different types of constraint. A number of meta-heuristic approaches have been explored including genetic algorithms [13], simulated annealing [4], tabu search [8], [12], and hyper-heuristics [10]. A CBR approach by Scott and Simpson [26] combined case-based reasoning with constraint logic programming by storing shift patterns used for the construction of nurse rosters.

Case-based repair generation (CBRG) is a technique developed by the authors to solve nurse rostering problems [7] which uses case-based reasoning (CBR). CBR is a reasoning paradigm in which new problems are solved using the solutions to similar problems that have previously been encountered [18]. Previous problems and their corresponding solutions are stored as cases in a database called a case-base. New problems are compared to the cases in the case-base and the most similar is retrieved. The solution to the problem from the retrieved case is then adapted to the context of the new problem. If the new solution could be useful for future problem solving then it is stored in the case-base, thus increasing the total knowledge held.

The CBRG method considers each constraint violation in a roster as a separate problem. The case-base contains a history of previous constraint violations and the operations that were used to repair them. Cases are retrieved from the case-base using a two-stage retrieval process [23]. The first stage retrieves those cases containing violations of the same type as the current problem. The second stage calculates the similarity of these cases to the current problem using the weighted nearest neighbour method. The violations are represented by a set of characteristic features and can be interpreted as points in a feature space. Weights are assigned to the features to represent their relative importance. The most similar case is then defined as the one with the smallest weighted distance from the feature vector representing the current problem. It is vital for the retrieval process that appropriate features are selected to represent the violations and that these features are carefully weighted.
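The two-stage retrieval described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes every feature is numeric and already scaled to [0, 1] (the paper works with more complex data types), it assumes a weighted Euclidean distance, and the names `Case`, `retrieve`, the violation types and the repair labels are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional


@dataclass
class Case:
    violation_type: str      # hypothetical label used by the first stage
    features: List[float]    # assumed numeric features scaled to [0, 1]
    repair: str              # hypothetical label for the stored repair


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (an assumed form of the distance)."""
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def retrieve(case_base, violation_type, features, weights) -> Optional[Case]:
    # Stage 1: keep only cases whose violation has the same type.
    candidates = [c for c in case_base if c.violation_type == violation_type]
    if not candidates:
        return None
    # Stage 2: weighted nearest neighbour among the remaining cases.
    return min(candidates,
               key=lambda c: weighted_distance(c.features, features, weights))


case_base = [
    Case("cover", [0.1, 0.9], "swap"),
    Case("cover", [0.8, 0.2], "reassign"),
    Case("hours", [0.1, 0.9], "remove"),
]
best = retrieve(case_base, "cover", [0.2, 0.8], [1.0, 1.0])
print(best.repair)
```

Changing the weights changes which case is retrieved: for the query `[0.8, 0.85]`, weighting only the first feature favours the second stored case, whereas weighting only the second feature favours the first.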

One of the most common ways to determine the accuracy of a case-base is to measure its classification accuracy. The CBRG method can be seen as a classifier which determines the type and parameters of a repair for a given violation. Its classification accuracy can be measured by repeatedly removing a case from the case-base, performing a retrieval to determine the nearest case to the removed case, and then comparing the repairs. In the literature, nearest neighbour classification algorithms [14] have been used successfully to solve a number of different classification problems. They allow complex relationships between input parameters to be captured without the need to model them explicitly. However, they can be sensitive to noise in the data sets and to erroneous or irrelevant features [2]. These effects can be reduced by selecting only relevant features from the feature set and assigning a weight to each feature representing its relative importance. A number of different feature weighting and selection methods have been developed, including Salzberg’s [25] feature weighting algorithm based on a heuristic approach for his EACH classification method, a random mutation hill climbing approach for feature selection by Skalak [27], and a genetic algorithm by Kuncheva and Jain [19]. Many more algorithms are described in a review by Wettschereck et al. [29]. We investigate an approach to automated weighting and feature selection based on the GA-WKNN genetic algorithm developed by Kelly and Davis [17] and a dimensionality reduction algorithm developed by Raymer et al. [24]. These approaches are adapted so that they can handle the types of data used in the CBRG method to model the nurse rostering problem.
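The leave-one-out accuracy measure described above can be illustrated with a short sketch. It assumes numeric features, a weighted Euclidean distance, and repairs reduced to a single label that is compared for equality; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from math import sqrt


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (an assumed form of the distance)."""
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def leave_one_out_accuracy(cases, weights):
    """cases: list of (features, repair_label) pairs of one violation type."""
    if len(cases) < 2:
        return 0.0
    correct = 0
    for i, (features, repair) in enumerate(cases):
        # Remove case i, then retrieve its nearest neighbour from the rest.
        rest = cases[:i] + cases[i + 1:]
        _, nearest_repair = min(
            rest, key=lambda c: weighted_distance(c[0], features, weights))
        # The retrieval counts as correct if the repairs match.
        correct += nearest_repair == repair
    return correct / len(cases)


# Toy data: feature 0 determines the repair, feature 1 is noise.
cases = [
    ([0.1, 0.9], "swap"),
    ([0.2, 0.1], "swap"),
    ([0.8, 0.8], "reassign"),
    ([0.9, 0.2], "reassign"),
]
print(leave_one_out_accuracy(cases, [1.0, 1.0]))  # equal weights
print(leave_one_out_accuracy(cases, [1.0, 0.0]))  # noisy feature switched off
```

On this toy data the noisy feature misleads the retrieval for half of the cases under equal weights, while setting its weight to zero lets every removed case find a neighbour with the same repair.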

In this paper we present an adaptation of a feature weighting and selection algorithm to a complex real life nurse rostering problem. This algorithm allows us to learn which features are important when making rostering decisions and which features are irrelevant, thus increasing our understanding of the nurse rostering problem. The accuracy of the CBRG method is increased by weighting the features and the search time is decreased by reducing the number of features that it is necessary to store in each case. Furthermore, the flexibility and adaptability of the case-based approach is enhanced because its behaviour can be tuned more precisely to the decision making style of the expert who trained it. The data used for the experiments in this paper has been derived from rosters provided by the ophthalmology ward at the Queen’s Medical Centre University Hospital Trust (QMC) in Nottingham, United Kingdom.

The nurse rostering problem is introduced in Section 2 and the CBRG method is described in Section 3. Section 4 introduces the different types of features used to describe the violations. The modified genetic algorithm for feature weighting and selection is presented in Section 5. The results obtained by applying the algorithm to a case-base of real life rostering decisions are presented in Section 6. Section 7 concludes the paper.

Section snippets

The nurse rostering problem

The nurse rostering problem is represented by the ordered pair $R = \langle N, C \rangle$, where $N = \{\mathit{nurse}_i : 0 \leq i < I\}$ is the set of $I$ nurses to be rostered, and $C = \{\mathit{constraint}_k : 0 \leq k < K\}$ is the set of $K$ constraints. The set $N$ contains information about the nurses to be rostered, the shifts they have been assigned and the shifts that they would prefer to work over the rostering period. The set $C$ imposes constraints on the shift assignments in $N$.

Each nurse is denoted by a 4-tuple, $\mathit{nurse}_i = \langle \mathit{NurseType}_i, \mathit{hours}_i, \mathit{NR}_i, \mathit{NP}_i \rangle$, where NurseType

The case-based repair generation method

The case-based repair generation (CBRG) method was developed by the authors to capture examples of individual constraint violations and the repairs that were used by human experts to solve them [23]. The violations and repairs are stored as cases in a case-base and are used to solve new violations in new rosters. When a new violation is identified in a roster the case containing the most similar violation in the case-base is retrieved. The repair from the retrieved case is used to generate a

Violation features

The first stage of the retrieval process chooses cases that are structurally the same as the focus violation. A large number of such cases can exist within a case-base and therefore it is necessary to rank them according to their violation features. The violation features are statistical characteristics of the roster and the violation. They can be seen as a ‘snap-shot’ of the state of the roster at the time the violation was repaired. They are considered to be important when making rostering

Genetic algorithm for feature weighting and selection

The nearest neighbour distance function used in the retrieval process requires a good selection of features and an appropriate set of feature weights. Increasing the weight of a particular feature increases the influence that the feature has on the selection process. Conversely, when the weights of irrelevant features are decreased, those features exert less influence on the calculation of the distance between cases, which increases the accuracy of the system.
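To make the idea concrete, the following is a generic sketch of a genetic algorithm that searches for feature weights and a feature subset. It is not the modified algorithm developed in this paper. The chromosome layout (one real-valued weight plus one selection bit per feature), the fitness function (leave-one-out accuracy minus a small assumed penalty per selected feature), the operators, and all parameter values are illustrative assumptions, and features are assumed to be numeric.

```python
import random
from math import sqrt


def distance(a, b, weights):
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def loo_accuracy(cases, weights):
    """Leave-one-out accuracy of nearest neighbour retrieval."""
    correct = 0
    for i, (features, label) in enumerate(cases):
        rest = cases[:i] + cases[i + 1:]
        nearest = min(rest, key=lambda c: distance(c[0], features, weights))
        correct += nearest[1] == label
    return correct / len(cases)


def effective_weights(chrom):
    weights, mask = chrom
    # A deselected feature contributes nothing to the distance.
    return [w if m else 0.0 for w, m in zip(weights, mask)]


def fitness(chrom, cases, penalty=0.01):
    # Assumed fitness: accuracy minus a small cost per selected feature.
    return loo_accuracy(cases, effective_weights(chrom)) - penalty * sum(chrom[1])


def evolve(cases, n_features, pop_size=20, generations=30, p_mut=0.1, seed=0):
    rng = random.Random(seed)

    def random_chrom():
        return ([rng.random() for _ in range(n_features)],
                [rng.random() < 0.5 for _ in range(n_features)])

    def tournament(pop):
        # Binary tournament: the fitter of two random individuals.
        return max(rng.sample(pop, 2), key=lambda c: fitness(c, cases))

    def crossover(p, q):
        # Uniform crossover applied gene by gene to weights and mask.
        picks = [rng.random() < 0.5 for _ in range(n_features)]
        return ([a if t else b for a, b, t in zip(p[0], q[0], picks)],
                [a if t else b for a, b, t in zip(p[1], q[1], picks)])

    def mutate(c):
        weights = [rng.random() if rng.random() < p_mut else w for w in c[0]]
        mask = [(not m) if rng.random() < p_mut else m for m in c[1]]
        return (weights, mask)

    pop = [random_chrom() for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=lambda c: fitness(c, cases))  # elitism
        pop = [best] + [mutate(crossover(tournament(pop), tournament(pop)))
                        for _ in range(pop_size - 1)]
    return max(pop, key=lambda c: fitness(c, cases))


# Toy data: feature 0 determines the repair, feature 1 is noise.
cases = [([0.1, 0.9], "swap"), ([0.2, 0.1], "swap"),
         ([0.8, 0.8], "reassign"), ([0.9, 0.2], "reassign")]
best = evolve(cases, n_features=2)
print(effective_weights(best), fitness(best, cases))
```

Because the best individual is copied unchanged into each new generation, the best fitness in the population never decreases from one generation to the next. The per-feature penalty is one simple way to favour smaller feature subsets when accuracy is equal.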

It is not always the case that

Results

The algorithm was used to select features and feature weights based on a case-base trained using the expert rostering knowledge of nurses at the QMC. It was trained over two months on rosters involving 12 different constraints:

1. Cover: EARLY shifts require 4 Qualified Nurses.
2. Cover: EARLY shifts require 1 Registered Nurse.
3. Cover: EARLY shifts require 1 Eye-Trained Nurse.
4. Cover: LATE shifts require 3 Qualified Nurses.
5. Cover: LATE shifts require 1 Registered Nurse.
6. Cover: LATE shifts require 1

Conclusion

This paper has described a method for the automated selection and weighting of features for a case-based reasoning approach to nurse rostering. A genetic algorithm is used to find a subset of weighted features by searching for combinations of features and corresponding feature weights that increase the overall classification accuracy of the case-base retrieval method. The increase in classification accuracy improves the quality of the repairs that are generated by the CBRG method by ensuring

Acknowledgements

This research is supported by the Engineering and Physical Sciences Research Council (EPSRC) in the UK (grant number GR/N35205/01) and by the Queen’s Medical Centre University Hospital Trust, Nottingham.

References (30)

D.W. Aha, Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms, International Journal of Man-Machine Studies (1992)
J. Bailey, Integrated days off and shift personnel scheduling, Computers and Industrial Engineering (1985)
N. Beaumont, Scheduling staff using mixed integer programming, European Journal of Operational Research (1997)
K. Dowsland, Nurse scheduling with tabu search and strategic oscillation, European Journal of Operational Research (1998)
L.I. Kuncheva et al., Nearest neighbor classifier: Simultaneous editing and feature selection, Pattern Recognition Letters (1999)
S. Abdennadher, H. Schlenker, INTERDIP—an interactive constraint based nurse scheduler, in: Proceedings of the First...
R.N. Bailey et al., Using simulated annealing and genetic algorithms to solve staff scheduling problems, Asia-Pacific Journal of Operational Research (1997)
D. Beasley et al., An overview of genetic algorithms: Part 1, fundamentals, University Computing (1993)
G.R. Beddoe, S. Petrovic, A novel approach to finding feasible solutions to personnel rostering problems, in:...
E.K. Burke et al., A hybrid tabu search algorithm for the nurse rostering problem
E.K. Burke, P. De Causmaecker, S. Petrovic, G. Vanden Berghe, Fitness evaluation for nurse scheduling problems, in:...
E.K. Burke et al., A tabu-search hyperheuristic for timetabling and rostering, Journal of Heuristics (2003)
B.M.W. Cheng, J.H.M. Lee, J.C.K. Wu, A constraint-based nurse rostering system using a redundant modeling approach....
A. Duenas, N. Mort, C. Reeves, D. Petrovic, Handling preferences using genetic algorithms for the nurse scheduling...

