Discrete Optimization
Selecting and weighting features using a genetic algorithm in a case-based reasoning approach to personnel rostering
Introduction
Nurse rostering can be defined as the problem of placing resources (nurses), subject to constraints, into slots in a pattern, where the pattern denotes a set of legal shifts defined in terms of work that needs to be done [30]. A wide variety of constraints can be imposed on rosters depending on the legal, management, and staffing requirements of individual organisations. Definitions of roster quality and optimality are highly subjective and therefore difficult to represent systematically using utility functions or rule bases. Human rostering experts have many years of experience in making rostering decisions which reflect their individual goals and objectives.
Nurse rostering problems have been solved using a variety of different mathematical and artificial intelligence methods. They are usually modelled as optimisation problems but the objective functions used vary considerably between problems. Bailey [3], Beaumont [6], and Warner [28] use mathematical programming techniques to generate nurse rosters optimised with respect to staffing costs, under-staffing costs, and shift pattern penalties. Constraint satisfaction techniques have been developed by Abdennadher and Schlenker [1], Cheng et al. [11], and Meyer auf’m Hofe [20] which allow the definition of many different types of constraint. A number of meta-heuristic approaches have been explored including genetic algorithms [13], simulated annealing [4], tabu search [8], [12], and hyper-heuristics [10]. A CBR approach by Scott and Simpson [26] combined case-based reasoning with constraint logic programming by storing shift patterns used for the construction of nurse rosters.
Case-based repair generation (CBRG) is a technique developed by the authors to solve nurse rostering problems [7] which uses case-based reasoning (CBR). CBR is a reasoning paradigm in which new problems are solved using the solutions to similar problems that have previously been encountered [18]. Previous problems and their corresponding solutions are stored as cases in a database called a case-base. New problems are compared to the cases in the case-base and the most similar is retrieved. The solution to the problem from the retrieved case is then adapted to the context of the new problem. If the new solution could be useful for future problem solving then it is stored in the case-base, thus increasing the total knowledge held.
The CBRG method considers each constraint violation in a roster as a separate problem. The case-base contains a history of previous constraint violations and the operations that were used to repair them. Cases are retrieved from the case-base using a two-stage retrieval process [23]. The first stage retrieves those cases containing violations of the same type as the current problem. The second stage calculates the similarity of these cases to the current problem using the weighted nearest neighbour method. The violations are represented by a set of characteristic features and can be interpreted as points in a feature space. Weights are assigned to the features to represent their relative importance. The most similar case is then defined as the one with the smallest weighted distance from the feature vector representing the current problem. The retrieval process therefore depends on selecting appropriate features to represent the violations and on weighting those features carefully.
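The two retrieval stages described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Case` structure, the violation type and repair labels, the restriction to numeric features scaled to [0, 1], and the weighted Euclidean form of the distance are all assumptions, since the excerpt does not give the actual distance function.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional


@dataclass
class Case:
    violation_type: str    # hypothetical label, e.g. "cover"
    features: List[float]  # numeric violation features, assumed scaled to [0, 1]
    repair: str            # hypothetical label for the repair the expert used


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (an assumed form of the distance function)."""
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def retrieve(case_base, violation_type, features, weights) -> Optional[Case]:
    # Stage 1: keep only cases whose violation has the same type.
    candidates = [c for c in case_base if c.violation_type == violation_type]
    if not candidates:
        return None
    # Stage 2: return the candidate with the smallest weighted distance.
    return min(candidates,
               key=lambda c: weighted_distance(c.features, features, weights))


case_base = [
    Case("cover", [0.2, 0.9], "swap"),
    Case("cover", [0.8, 0.1], "reassign"),
    Case("preference", [0.2, 0.9], "swap_days"),
]

# Equal weights: the second feature dominates and "reassign" is retrieved.
print(retrieve(case_base, "cover", [0.3, 0.2], [1.0, 1.0]).repair)
# Zero weight on the second feature: "swap" is now the nearest case.
print(retrieve(case_base, "cover", [0.3, 0.2], [1.0, 0.0]).repair)
```

The two calls use the same query and case-base and differ only in the weights, which shows how the weighting alone can change which case is retrieved.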
One of the most common ways to assess the quality of a case-base is to measure its classification accuracy. The CBRG method can be seen as a classifier which determines the type and parameters of a repair for a given violation. Its classification accuracy can be measured by repeatedly removing a case from the case-base, performing a retrieval to determine the nearest case to the removed case, and then comparing the repairs. In the literature, nearest neighbour classification algorithms [14] have been used successfully to solve a number of different classification problems. They allow complex relationships between input parameters to be captured without the need to model them explicitly. However, they can be sensitive to noise in the data sets and to erroneous or irrelevant features [2]. These effects can be reduced by selecting only relevant features from the feature set and assigning a weight to each feature representing its relative importance. A number of different feature weighting and selection methods have been developed, including Salzberg’s [25] heuristic feature weighting algorithm for his EACH classification method, a random mutation hill climbing approach for feature selection by Skalak [27], and a genetic algorithm by Kuncheva and Jain [19]. Many more algorithms are described in a review by Wettschereck et al. [29]. We investigate an approach to automated feature weighting and selection based on GA-WKNN, a genetic algorithm developed by Kelly and Davis [17], and on a dimensionality reduction algorithm developed by Raymer et al. [24]. These approaches are adapted so that they can handle the types of data used in the CBRG method to model the nurse rostering problem.
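The accuracy measurement described above (remove a case, retrieve its nearest neighbour from the remaining cases, compare the repairs) can be sketched as a leave-one-out loop. The sketch below is illustrative only: cases are reduced to hypothetical `(features, repair)` pairs, the repairs are compared by simple label equality rather than by type and parameters, and the weighted Euclidean distance is an assumption.

```python
from math import sqrt


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (assumed form)."""
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def leave_one_out_accuracy(cases, weights):
    """Fraction of cases whose nearest remaining case has the same repair.

    `cases` is a list of (features, repair) pairs.
    """
    if len(cases) < 2:
        return 0.0
    correct = 0
    for i, (features, repair) in enumerate(cases):
        rest = cases[:i] + cases[i + 1:]  # the case-base without case i
        nearest = min(rest,
                      key=lambda c: weighted_distance(c[0], features, weights))
        if nearest[1] == repair:
            correct += 1
    return correct / len(cases)


# Toy data: the first feature separates the repairs, the second is noise.
cases = [
    ([0.1, 0.9], "swap"),
    ([0.2, 0.1], "swap"),
    ([0.8, 0.8], "reassign"),
    ([0.9, 0.2], "reassign"),
]

print(leave_one_out_accuracy(cases, [1.0, 1.0]))  # 0.5
print(leave_one_out_accuracy(cases, [1.0, 0.0]))  # 1.0
```

On this toy data the noisy second feature misleads the unweighted retrieval for half of the cases, and setting its weight to zero removes the effect.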
In this paper we present an adaptation of a feature weighting and selection algorithm to a complex real life nurse rostering problem. This algorithm allows us to learn which features are important when making rostering decisions and which features are irrelevant, thus increasing our understanding of the nurse rostering problem. The accuracy of the CBRG method is increased by weighting the features, and the search time is decreased by reducing the number of features that must be stored in each case. Furthermore, the flexibility and adaptability of the case-based approach are enhanced because its behaviour can be tuned more precisely to the decision making style of the expert who trained it. The data used for the experiments in this paper has been derived from rosters provided by the ophthalmology ward at the Queen’s Medical Centre University Hospital Trust (QMC) in Nottingham, United Kingdom.
The nurse rostering problem is introduced in Section 2 and the CBRG method is described in Section 3. Section 4 introduces the different types of features used to describe the violations. The modified genetic algorithm for feature weighting and selection is presented in Section 5. The results obtained by applying the algorithm to a case-base of real life rostering decisions are presented in Section 6. Section 7 concludes the paper.
The nurse rostering problem
The nurse rostering problem is represented by the ordered pair $\langle N, C \rangle$, where $N$ is the set of $I$ nurses to be rostered, and $C$ is the set of $K$ constraints. The set $N$ contains information about the nurses to be rostered, the shifts they have been assigned and the shifts that they would prefer to work over the rostering period. The set $C$ imposes constraints on the shift assignments in $N$.
Each nurse is denoted by a 4-tuple,where NurseType
The case-based repair generation method
The case-based repair generation (CBRG) method was developed by the authors to capture examples of individual constraint violations and the repairs that were used by human experts to solve them [23]. The violations and repairs are stored as cases in a case-base and are used to solve new violations in new rosters. When a new violation is identified in a roster the case containing the most similar violation in the case-base is retrieved. The repair from the retrieved case is used to generate a
Violation features
The first stage of the retrieval process chooses cases that are structurally the same as the focus violation. A large number of such cases can exist within a case-base and therefore it is necessary to rank them according to their violation features. The violation features are statistical characteristics of the roster and the violation. They can be seen as a ‘snap-shot’ of the state of the roster at the time the violation was repaired. They are considered to be important when making rostering
Genetic algorithm for feature weighting and selection
The nearest neighbour distance function which is used in the retrieval process requires a good selection of features and an appropriate set of feature weights. Increasing the weight of a particular feature increases the influence that the feature has on the selection process. Conversely, when the weights of irrelevant features are decreased, those features exert less influence on the calculation of the distance between cases, thus increasing the accuracy of the system.
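To make the idea of searching for feature weights concrete, the following Python sketch evolves a weight vector with a generic genetic algorithm whose fitness is the leave-one-out retrieval accuracy. It is not the modified algorithm of this paper, whose details are not given in this excerpt. The real-valued chromosome, the `threshold` below which a weight is set to zero (deselecting the feature), the tournament selection, uniform crossover, mutation rate, and toy data are all assumptions made for illustration.

```python
import random
from math import sqrt


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (assumed form)."""
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def loo_accuracy(cases, weights):
    """Leave-one-out accuracy over (features, repair) pairs."""
    correct = 0
    for i, (features, repair) in enumerate(cases):
        rest = cases[:i] + cases[i + 1:]
        nearest = min(rest,
                      key=lambda c: weighted_distance(c[0], features, weights))
        correct += nearest[1] == repair
    return correct / len(cases)


def decode(chromosome, threshold=0.2):
    """Weights below the (assumed) threshold become 0, deselecting the feature."""
    return [w if w >= threshold else 0.0 for w in chromosome]


def evolve(cases, n_features, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    rng = random.Random(seed)

    def fitness(chromosome):
        return loo_accuracy(cases, decode(chromosome))

    # Include the unweighted baseline so that elitism never falls below it.
    population = [[1.0] * n_features] + [
        [rng.random() for _ in range(n_features)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: carry over the two best unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            # Uniform crossover, then per-gene random-reset mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            child = [rng.random() if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return decode(best), fitness(best)


# Toy data: the first feature separates the repairs, the second is noise.
cases = [
    ([0.1, 0.9], "swap"),
    ([0.2, 0.1], "swap"),
    ([0.8, 0.8], "reassign"),
    ([0.9, 0.2], "reassign"),
]
weights, accuracy = evolve(cases, n_features=2)
print(weights, accuracy)
```

Because the initial population contains the all-ones weight vector and the best individuals are carried over unchanged, the returned accuracy can never be lower than that of unweighted retrieval on the same cases.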
It is not always the case that
Results
The algorithm was used to select features and feature weights based on a case-base trained using the expert rostering knowledge of nurses at the QMC. It was trained over two months on rosters involving 12 different constraints:
1. Cover: EARLY shifts require 4 Qualified Nurses.
2. Cover: EARLY shifts require 1 Registered Nurse.
3. Cover: EARLY shifts require 1 Eye-Trained Nurse.
4. Cover: LATE shifts require 3 Qualified Nurses.
5. Cover: LATE shifts require 1 Registered Nurse.
6. Cover: LATE shifts require 1
Conclusion
This paper has described a method for the automated selection and weighting of features for a case-based reasoning approach to nurse rostering. A genetic algorithm is used to find a subset of weighted features by searching for combinations of features and corresponding feature weights that increase the overall classification accuracy of the case-base retrieval method. The increase in classification accuracy improves the quality of the repairs that are generated by the CBRG method by ensuring
Acknowledgements
This research is supported by the Engineering and Physical Sciences Research Council (EPSRC) in the UK (grant number GR/N35205/01) and by the Queen’s Medical Centre University Hospital Trust, Nottingham.
References (30)
- Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies (1992)
- Integrated days off and shift personnel scheduling. Computing and Industrial Engineering (1985)
- Scheduling staff using mixed integer programming. European Journal of Operational Research (1997)
- Nurse scheduling with tabu search and strategic oscillation. European Journal of Operational Research (1998)
- et al. Nearest neighbor classifier: Simultaneous editing and feature selection. Pattern Recognition Letters (1999)
- S. Abdennadher, H. Schlenker, INTERDIP—an interactive constraint based nurse scheduler, in: Proceedings of the First...
- et al. Using simulated annealing and genetic algorithms to solve staff scheduling problems. Asia-Pacific Journal of Operational Research (1997)
- et al. An overview of genetic algorithms: Part 1, fundamentals. University Computing (1993)
- G.R. Beddoe, S. Petrovic, A novel approach to finding feasible solutions to personnel rostering problems, in:...
- et al. A hybrid tabu search algorithm for the nurse rostering problem
- A tabu-search hyperheuristic for timetabling and rostering. Journal of Heuristics