Keywords

1 Introduction

The demand for emergency medical services has increased in recent years, and the emergency department (ED) has become the most important and busiest unit within most hospitals, providing emergency care and treatment to patients in need of immediate medical attention. However, EDs in Taiwan have not been able to accommodate the rapidly increasing patient demand in the past 14 years. This demand is attributed to the growth in the aging population and to influenza outbreaks caused by new variants of the virus. According to statistics of the Department of Health in Taiwan, the total number of people visiting EDs surged from 4,379,568 in 1998 to 6,730,946 in 2012. Figure 1 illustrates the growth rates of 53.7 % in total visits. Such increases cause an imbalance or a mismatch between supply and demand, ultimately creating long-term overcrowding in hospital EDs, also known as “ED crowding”.

Fig. 1.
figure 1

Total and average daily number of patient visits to the emergency department at Taiwan from 1998 to 2012

The Institute of Medicine recently noted that the problem of ED crowding is an obstacle to the safe and timely delivery of health care. Because EDs must always be available to provide emergency medical care to patients, ED resource management can be extremely complex and uncertain. Accordingly, ED overcrowding is a national problem that requires a promising approach to identify a solution or solutions. Hospital ED crowding has led to an increase in patients’ waiting times; thus, solving this problem requires a better understanding of a hospital’s patient flow and the behaviors of patients [10]. Existing research on ED crowding is sparse and has tended to focus on the present crowding state. Ding et al. [5] addressed the importance of analyzing the length of stay (LOS) to understand the behaviors of patients in the ED. Azari et al. [2] proposed a multi-tiered data-mining approach to predict patients’ LOS by clustering and classification methods. This research provides good departure points for understanding patient behaviors based on the LOS factor.

In this work, we analyzed patients’ lengths of stay (LOS) from the perspective of the behaviors of patients in the ED. That is, we extracted rules based on set of patient attributes and treatment processes (e.g., arrival to admission, examination pattern, transfer or discharge, etc.). Notably, the examination pattern denotes the frequency and usage order of medical resources such as X-rays, CTs, and the Lab, during the patient’s stay in the hospital’s ED. Then, we built up the model to predict LOS based on the rules. Accordingly, we proposed a novel rule-based data mining approach to investigate the issue of LOS prediction to tackle the problem of ED-crowding. Our aim was to find what kinds of patients and their associated behaviors would cause the overcrowding problem.

The objectives with the associate procedures of this research are briefly addressed, as follows:

  1. 1.

    We conducted data preprocessing and then applied association rule mining to identify both frequent and infrequent ED behaviors of patients. Accordingly, two types of patient behavior (PB)—i.e., the regular and exceptional behaviors of patients—were identified, after which the rules for partitioning the dataset were extracted. In this research, we further partitioned data based on a five level triage scale, i.e. from level 1 (most urgent) to level 5 (least resource intensive) under different PBs, and LOS to analyze the correlation between various triage levels and ED crowding.

  2. 2.

    We constructed a prediction model with an explanation capability to assist the ED of a hospital in decreasing its overcrowding conditions. Then extracted rules from different groups of different LOS which may cause ED crowding and validated the accuracy of those rules.

  3. 3.

    We built a decision support system based on the proposed data-mining process in the ED and visualized the extracting rules and the statistical data in the proposed rule-based medical decision support (R-MDS) visualization portal.

The contributions of this research are, (1) to our best of knowledge, the idea of partitioning the unbalanced dataset by Apriori algorithm [1] to investigate different users’ behaviors in the medical domain has not been reported; (2) as we know, the variants of patients’ lengths of stay (LOS) in the ED are typically diverse; thus, collecting numerous patient records to conduct empirical study is another challenge in this work. We have collected 43885 medical records from Mackay Memorial Hospital representing the period from January 1, 2010 to June 30, 2010 to do this research; and (3) Yao and Kumar [9] proposed a novel CONFlexFlow (Clinical cONtext-based Flexible workFlow) approach to represent medical knowledge by a better understanding of clinical context through ontologies, and using them to activate the right rules for a certain activity. The study emphasized a proposed implementation framework and presented a concept prototype using multiple open source tools to realize flexible clinical pathways and to build the clinical decision support system. The optimal objective of this long-term research is to build an interactive decision support system (DSS) for Mackay Memorial Hospital to help doctors, nurses and clerks make just-in-time decisions.

2 The R-MDS Visualization Portal

To assist interactive decision support, we built a visualization portal. Doctors, nurses, or clerks can input patient attributes, i.e., triage, age, treatment, etc., via the developed R-MDS dashboard application, which can then graphically present the statistical results, rules, and other ED-related information for further interaction with the systems. The related patient data and the rules extracted based on the proposed ED data-mining process are stored in the ED rule knowledge base. For this research, we built the visualization portal using Silverlight. Then, using web services via a Silverlight application, we could elicit results from the ED rule knowledge base and the ED database. Figure 2 shows the system framework. We briefly introduce the deployment system as follows.

Fig. 2.
figure 2

The system framework of medical data mining and visualization

The EDDS Visualization Portal: We built a front-end interface using Rich Interactive Application (RIA) services with Silverlight and used the C# and the .NET framework to implement the back-end services. Silverlight is an application framework for writing and running RIAs that provides an interface that is faster and more responsive than traditional applications. The Microsoft .NET RIA services simplify n-tier application patterns and provide data validation and authentication by integrating with Silverlight components on the client and ASP.NET on the mid-tier. Several Silverlight visualization controllers can present data, e.g., the custom controller, dashboard controller, and graph generation controller. The user can then refer to the method on the domain context that corresponds to the domain service query method to retrieve the data. Technically, user interfaces are declared in Extensible Application Markup Language (XAML), a declarative XML-based language, to manipulate the controllers easily and efficiently.

Domain-Driven Kernel Service: Web or mobile applications developed by Silverlight are run on the client side; thus, we implement a domain-driven kernel service that adopts Windows Communication Foundation (WCF) RIA services to create a Silverlight client that uses simplified data validation and is aware of the application logic on the middle tier when interacting with data. That is, Silverlight controls will utilize classes that are automatically generated from codes in the middle tier. We can then use web services via a Silverlight application, i.e., the developed R-MDS visualization portal. In this work, our Domain-Driven Kernel Service includes three layers as follows: a service layer, an ADO.NET entity framework layer, and a database layer. The service layer is composed of several ED-related service modules that have corresponding interfaces in the proposed portal. The ADO.NET entity framework supports the development of data-driven applications. The entity framework will transform a LINQ or SQL entity to the relational SQL. The database layer includes an ED rule knowledge base and the ED database. When the R-MDS visualization portal receives the parameters from the portal, it will call the corresponding interface, and then pass the results to the portal for helping users provide just-in-time services or decision-making.

ED Data-Mining Process: The data set is extracted from the healthcare information system (HIS) of the ED of the Mackay Memorial Hospital in Taipei. We adopted the CRISP-DM process to build the model and then extract the users’ behavior rules. In this work, we proposed a multi-tiered ED crowding data-mining and visualization framework to predict patients’ LOS and extracted rules from LOS that may cause ED crowding and then visualize and validate the rules. The validating rules will be stored in the knowledge base in the proposed system for further interactive decision support. We will introduce our proposed ED data-mining process in the next section.

3 The Views, and the Services of the R-MDS Visualization Portal

In our preliminary design, there are three views (interfaces) provided in the developed portal: a report view, diagnostic view, and maintenance view. We introduce the three views briefly in the following. More details will be addressed in our future work.

Report View: The interface shows the results based on the user’s input data. There are three kinds of charts used: line, column, and pie charts which show the statistical data of the ED. Table 1 shows an example of statistical information on patient attributes (in this case, age) as an information visualization (IV) chart. In addition, we designed the block to display global messages that can show real-time emergency information to the staff.

Table 1. Examples of attribute statistical information with IV charts

Diagnostic View: This view allows users to append a new instance via the diagnostic view. In addition, the interface shows the matched inference rules based on the criteria set by users via the interface, as shown in Fig. 3. The rules will be ranked based on their scores, i.e., to what degree they match the user’s setting criteria. The user can also check the statistical data in the diagnostic interface or switch to the report view to have further explorations, as shown at the bottom of Fig. 3.

Fig. 3.
figure 3

Diagnostic view (interface)

Maintenance View: The maintenance view includes the role authority management, user management, global message management, and rule management. It helps system administrators to conduct basic but important system management tasks.

4 Three-Phase Medical Data Mining Process for the ED

4.1 The Process

The emergency department (ED) is the most important and the busiest unit within a hospital, providing emergency care and treatment to patients needing immediate medical attention. However, EDs in Taiwan have not been able to accommodate the rapidly increasing patient demand in the past 14 years. Hoot and Aronky [7] and Chan et al. [3] postulated three solutions to address ED overcrowding: (1) increasing supply resources by adding manpower, number of beds, equipment, and space; (2) controlling demand sources by implementing feasible strategies such as referrals to other departments, clinics, or hospitals; (3) exploiting management skills and operational research models for the efficient allocation of medical resources to reduce patient waiting time and alleviate ED overcrowding. In our ongoing work, we try to tackle the ED overcrowding problem by adopting a domain-driven decision support approach and by exploiting management skills for the efficient application of medical resources [4]. Basically, the main objectives toward building the medical data mining model are addressed as follows.

  1. 1.

    We conducted data preprocessing and then applied association rule mining to identify frequent ED behaviors of patients and infrequent ED behaviors of patients. Accordingly, two types of patient behavior (PB)—i.e., regular and exceptional behaviors of patients—were identified.

  2. 2.

    We adopted a k-means clustering approach to classify the two types of PB based on different LOS and then labeled the cluster results by linguistic terms—i.e., long, medium or short LOS. Furthermore, we conducted correlation analysis between various groups of LOS and the ED crowding condition based on a five level triage scale, i.e. from level 1 (most urgent) to level 5 (least resource intensive).

  3. 3.

    We extracted rules from clusters of LOS of two types of PBs which have a positive correlation with ED crowding. We then used another data set to verify the rules in terms of precision and accuracy. Due to the limitation of pages, the validation results will be explained in our on-going work [4].

The procedure for three-phase data mining is introduced below briefly.

Phase 1. Data Preprocessing and Partitioning: First, we conducted data cleaning for dealing with noisy, incomplete, and inconsistent data Han and Kamber [6]. We then did data discretization for numerical data, e.g., age, numbers of varying medical equipment such as X-ray and CT machines, number of lab technicians, etc. Notably, the frequency for using medical resources will be combined into a treatment pattern and expressed as P(# of XRAYLevel, # of CTLevel, # of LABLevel) which form as a derived attribute, i.e., pattern. Accordingly, the attributes for data preprocessing includ: mode of arrival, triage, age, arrival day, arrival time, temperature, pattern (# of XRAY Level, # of CT Level, # of LAB Level), and disposition. After applying discretizing for the usage of medical resources, the pattern will be transformed into linguistic terms, i.e., high (H), medium (M) and low (L). Table 2 lists the basic demographic attributes of ED patients with descriptive statistical data from January 1, 2010 to July 31, 2010. The attributes are selected from the hospital’s ED database treated as input variables of the association rule mining algorithm. We then adopted the Apriori algorithm to partition the dataset into frequent ED behaviors of patients and infrequent ED behaviors of patients. Accordingly, two types of patient behavior (PB)—i.e., the regular and exceptional behaviors of patients—were identified.

Table 2. Basic demographic attributes with descriptive statistical data of ED patients

Phase 2. Data Clustering and Labeling: In this phase, we used k-means to cluster the separated dataset, i.e., regular PBs and exceptional into three groups based on the LOS and we extracted rules from each cluster in the next phase. We evaluated and compare the clustering results of three groups of two PBs. Then, analyzing the correlation between various ranges of LOS and ED crowding conditions based on the five- level triage scale respond to our research question: Which target group(s) caused the problem of ED crowding?

Phase 3. Rule Extractions and Validations: In the remaining phase, we adopted classification methods to build models and then extract the rules for future predictions. We evaluated the accuracy of the models to identify the one with the best predictive capabilities that can explain the two types of PB based on triage. We will consult with the clinicians of the cooperating hospital to validate the rules. Finally, we will learn whether the rules can help the ED to decrease the problem of ED crowding.

4.2 The Preliminary Analysis Results

We first analyzed the correlation of patients with different LOS and ED crowding under various levels of triages, which based on the 5 level triage scale, are “Resuscitation”, “Emergency”, “Urgent”, “Semi-Urgent”, and “Nonurgent” (Level 1 to Level 5, respectively). Figure 4(a) and 4(b) show the results of Pearson’s correlation analysis at different levels of triages in Mackay Memorial Hospital in the first six months of 2010. Apparently, no matter for regular PBs or exceptional PB, there is a positive relationship between each level of triage and ED crowding under the shortest LOS, i.e., cluster 0. On the other hand, for groups of medium or long LOS, there may be positive or negative relationship between different levels of triage and ED crowding conditions. In our ongoing work, we have consulted the attending physician to preliminarily confirm the value of the extracted rules. The attending physician has stated that the rules belonging to regular PB have higher value than those for exceptional PB. Accordingly, we will evaluate the prediction accuracy of the group with the shortest LOS for patients with regular behaviors.

Fig. 4(a).
figure 4

Correlation of patients with different LOS and ED crowding under various levels of triage for regular PB

Fig. 4(b).
figure 5

Correlation of patients with different LOS and ED crowding under various levels triage for exceptional PBs

5 Experimental Setup and Preliminary Results

As mentioned earlier, we adopted classification methods to build the model and then extracted the rules for future predictions. Basically, J48, CART and JRip are selected as methods in this research. We conducted an experiment to investigate the effects of making prediction of patients’ LOS by considering triage levels. First, the prediction is based on patients’ behaviors without considering triage, named as either Regular or Exceptional. Then, the prediction takes into consideration the triage of the patients of regular behaviors, named as T n _R. In this research, we adopted 10-fold cross validation to evaluate the accuracy of the prediction results.

In this research, we make a comparison between prediction results with or without considering triage levels for patients of regular behaviors. Note that we only focus on patients with the shortest LOS, due to its higher correlation with ED crowding as compared to a medium or long LOS. Figure 5 shows the prediction results with or without considering triage for patients with regular behaviors.

Fig. 5.
figure 6

Prediction results with or without considering triage for Regular PB

The results show that the three decision-tree methods can achieve similar results for each condition. If we partitioned data based on treatment behavior and triage, we find that we can achieve better prediction results for the regular-behavior patients who belong to triage 5 (T5) or triage 4 (T4) compared to the other conditions. Figure 5(a) shows regular behavior patients who belong to T4 with a positive correlation with ED crowding and pass statistical test. Other ongoing work also shows that patients belonging to the nonurgent group lead to ED crowding. Thus, the results will be useful for analyzing the behaviors of patients who cause ED crowding and will be practical for medical decision support.

6 Conclusions and Future Works

In our work, we propose a novel framework for mining patients’ behaviors in the ED related to different LOS, i.e., a multi-tiered ED crowding data mining and visualization framework for medical decision support. Accordingly, a rule-based medical decision support (R-MDS) visualization portal is presented in this work. We expect the future contributions of this research will be (1) to build a predictive model with the explanation capability to assist the ED of a hospital in decreasing its overcrowding conditions and improving the quality of health care; and (2) to accomplish the deployment of an interactive medical decision support system with information visualization tools to help doctors and clinical staff conduct analysis and make timely decisions. Keim and Kriegel [8] point out that developing visualization techniques that are adequate for exploring large amounts of multi-dimensional data is one of the research challenges. We would help users explore the multi-dimensional data by employing interactive techniques. Finally, we also believe that the proposed framework can be generalized to give rule-based decision support and include domain-driven knowledge discovery in the database.