Processing online analytics with classification and association rule mining

https://doi.org/10.1016/j.knosys.2010.01.006

Abstract

Business performance measurements, decision support systems (DSS) and online analytical processing (OLAP) share a common goal, i.e., to assist decision-makers during the decision-making process. Integrating DSS and OLAP into existing business performance measurements is intended to improve the accuracy of analysis and to provide an in-depth, multi-angle view of data. This paper describes a decision support system containing our methodology, Weighted and Layered workflow evaluation (WaLwFA), extended to incorporate business intelligence using the C4.5 and association rule algorithms. C4.5 produces more comprehensible decision trees by showing only important attributes, and its trees can be transformed into IF-THEN rules. However, association rules are preferred because they can describe data through rules of multiple granularities. Sorting rules by their complexity permits OLAP to navigate through layers of complexity, to extract rules of relevant sizes and to view data from multidimensional perspectives in each layer. Experimental results in an airline domain are presented.

Introduction

Business has a life of its own, and being able to make decisions at the right place and at the right time is crucial to the survival of a business in today’s global market environment. However, uncertain customer demands, the comprehensiveness of data and complex businesses [16] have made it difficult to collect, process and analyze data quickly and at low cost. As a result, the quality of decisions is sometimes compromised. This scenario is reflected in a recent survey conducted among 6500 managers at Harbridge House, which revealed that only 20% of managers performed well in making definite decisions and in identifying the root of the problem.¹ Hence, re-engineering business processes takes time and requires the participation of various parts of the workforce in order to create new processes or remedy previous solutions.

Two categories of problems affecting the performance of different groups in the organization have been identified in prior research. The first category affects the technical/operations personnel.

The technical/operations group of an organization usually faces three problems. First, most workflow management systems codify and categorize workflows according to function, but these workflow processes are usually not evaluated to determine which of them represent best workflow practices. Hence, it is difficult to formulate best practices for each workflow category in comparison with workflows from other information systems [35]. Second, once the workflows of both local and external systems have been evaluated and best practices identified, they need to be integrated as meta-models, which subsequently form the reference meta-model essential for guiding implementation. Third, decision support systems rely on attributes (evaluation criteria) and their values to advise or predict intelligently, as in cost-benefit analyses and net value propositions, yet evaluation criteria are currently not reused across organizational functions.

We have addressed the first category of problems with the WaLwFA methodology. WaLwFA [22] adopts the concept of model-driven architecture (MDA) with the aim of capturing best practices that can be adopted and instantiated in many domains or information systems. In WaLwFA [22], business process models are evaluated using a set of weighted criteria and sub-criteria whose weights are derived by averaging the weights assigned by a group of experts.
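To make the weighting step concrete, the following Python sketch averages each criterion's weight over a group of experts and uses the averaged weights to score one business process model. It is a minimal illustration, not the WaLwFA implementation: the criterion names, the assumed 1-5 rating scale and all numbers are hypothetical, and sub-criteria are omitted for brevity.

```python
from statistics import mean

# Hypothetical criteria; each list holds the weights assigned by three experts.
expert_weights = {
    "usability":   [0.30, 0.40, 0.35],
    "reliability": [0.50, 0.40, 0.45],
    "cost":        [0.20, 0.20, 0.20],
}

def average_weights(weights_by_criterion):
    """Average the weights that a group of experts assigned to each criterion."""
    return {c: mean(ws) for c, ws in weights_by_criterion.items()}

def weighted_score(ratings, weights):
    """Weighted score of one business process model.

    ratings maps each criterion to a rating on a common scale (assumed 1-5 here).
    Dividing by the total weight keeps the score on that same scale.
    """
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total_weight

weights = average_weights(expert_weights)
model_a = {"usability": 4, "reliability": 3, "cost": 5}
print(round(weighted_score(model_a, weights), 3))
```

Averaging is exactly the step the extended WaLwFA questions: experts who disagree strongly end up with a middling weight, which motivates mining the expert data instead.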

A business process model consists of a collection of business processes/workflows within an organization. Several business models are evaluated, and those with higher scores are kept in the repository to form business meta-models and, from these, a reference model. The repository is updated whenever a new “good” business model emerges, which keeps it current with the latest best practices. Our focus in this paper is the second category: the problems faced by the strategic management group.
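The repository update described above might look roughly like the sketch below. The paper only states that higher-scoring models are kept; the fixed threshold `SCORE_THRESHOLD`, the function `update_repository` and the per-category lists are hypothetical choices made for illustration, and the construction of meta-models and the reference model is not shown.

```python
# Hypothetical acceptance threshold; the source only says higher-scoring models are kept.
SCORE_THRESHOLD = 3.5

def update_repository(repository, category, model_name, score, threshold=SCORE_THRESHOLD):
    """Add an evaluated business model to the repository if it scores well enough.

    repository maps a workflow category to a list of (score, model_name) pairs,
    kept sorted from best to worst so the first entry is the current best practice.
    Returns True if the model was added, False if it was rejected.
    """
    if score < threshold:
        return False
    entries = repository.setdefault(category, [])
    entries.append((score, model_name))
    entries.sort(reverse=True)
    return True

repo = {}
update_repository(repo, "booking", "model_a", 3.75)
update_repository(repo, "booking", "model_b", 3.10)  # rejected: below the threshold
update_repository(repo, "booking", "model_c", 4.20)
print(repo["booking"])
```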

In dynamic and competitive business environments, an organization’s top management is constantly challenged to sustain or improve the organization’s long-term and short-term growth. According to Porter’s Five Forces Model [7], rivalry among competing firms is the most powerful force, compelling management to find ways to remain competitive in the market, including adopting best practices to formulate the best strategy. For example, in the airline industry, where several airlines offer attractive travel deals to similar destinations, the management group has to decide which organizational strategy to implement in order to compete with existing competitors for market coverage. Deciding on the best strategy requires evaluation and, consequently, the use of the right performance measurement.

Many business performance measurements and approaches [18], [22], [30], [31] have been introduced to complement one another in evaluating the status of organizations. A detailed list of works related to business performance measurements can be found in Section 2.1. Relying purely on subjective evaluation based on expertise and experience may result in an inaccurate evaluation of the organization’s status, which in turn may lead to choosing the wrong strategy. So how can we use decision support systems (DSS) and online analytical processing (OLAP) to enhance well-known business performance measurements and approaches and thereby improve the process of decision-making? Two questions arise:

  • Currently, an individual expert or a group of experts can prioritise attributes through consensus. Reaching consensus, or relying on an individual expert, puts a strain on the organization’s financial and time resources. So how can we mine data that contains experts’ feedback in order to identify the key attribute(s) that have a high impact on the overall performance score of the organization?

  • Ever-changing business environments can result in different demands and expectations of employees within the organization. Therefore, current business performance measurements should be flexible enough to allow management to manipulate and view the criteria according to their needs at different times. So how can we mine the data to identify the different degrees of correlation among attributes contributing towards a decision from multiple perspectives?

To address the above problems, we aim to:

  • Introduce a decision support system architecture consisting of our business performance model, namely WaLwFA [22], extended to incorporate business intelligence capabilities that assist decision-making. The initial version of WaLwFA [22] relies on an averaging approach to determine the weight of each criterion. This may not be optimal, as averaging tends to normalize the weights assigned by a group of experts. The heuristic functions within business intelligence algorithms can help to identify optimal or near-optimal solutions to the problem. The two business intelligence algorithms incorporated within the extended WaLwFA are the C4.5 decision tree and association rule algorithms.

  • Use a business intelligence (BI) technique, namely the C4.5 decision tree algorithm, to discover significant attributes useful for building a decision tree. Using C4.5, we can identify the root of the decision tree and, subsequently, the significant attributes contributing to decision-making.

  • Use an association rule algorithm, namely the Apriori algorithm [1], to derive simple and complex association rules, and perform correlation analysis to calculate the dependency among attributes. This allows us to identify attributes and the cause-and-effect relationships between them and other attributes.
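C4.5 chooses the test at each node by its gain ratio (information gain divided by split information), so the attribute with the highest gain ratio on the full data set becomes the root. The Python sketch below shows this selection for categorical attributes only. It is a simplification: it omits C4.5's handling of continuous attributes and missing values, its restriction to attributes with at least average gain, and pruning. The evaluation records and attribute names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, target):
    """Gain ratio of a categorical attribute: information gain / split information."""
    n = len(rows)
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

def best_root(rows, attributes, target):
    """Pick the attribute with the highest gain ratio as the root of the tree."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))

# Hypothetical, already-discretized evaluation records.
rows = [
    {"usability": "high", "reliability": "high", "cost": "low",  "decision": "adopt"},
    {"usability": "high", "reliability": "low",  "cost": "low",  "decision": "reject"},
    {"usability": "low",  "reliability": "high", "cost": "high", "decision": "adopt"},
    {"usability": "low",  "reliability": "low",  "cost": "high", "decision": "reject"},
    {"usability": "high", "reliability": "high", "cost": "high", "decision": "adopt"},
    {"usability": "low",  "reliability": "low",  "cost": "low",  "decision": "reject"},
]
attributes = ["usability", "reliability", "cost"]
for a in attributes:
    print(a, round(gain_ratio(rows, a, "decision"), 3))
print("root:", best_root(rows, attributes, "decision"))
```

In this toy data "reliability" separates the decisions perfectly, so it becomes the root and would be reported as the most significant attribute.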

Why C4.5 and Apriori? Many researchers have compared the performance of different decision tree algorithms on the same benchmark data [11], [29], [39]. Although computer scientists have notably preferred Quinlan’s decision tree algorithms [34] to build decision tree models, the statistical community prefers Classification and Regression Trees (CART) [2]. A study by Breiman et al. [2] on predicting the “long-term survival chances of patients admitted to intensive care after heart attack” shows that the decision tree produced by CART is very small, having “three nodes and four leaves”. Breiman et al. [2] point out that generating such simple decision trees raises doubts about CART’s statistical ability to produce an accurate decision tree. They further note that, when the decision tree is converted into classification rules, the rules become more complex and less precise. Overall, C4.5 [34] can handle both continuous and discrete attributes as well as missing attribute values in the training data, and it applies pruning so that Occam’s Razor is satisfied.
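Converting a decision tree into classification (IF-THEN) rules can be sketched as reading off one rule per root-to-leaf path, as in the Python example below. The nested-dictionary tree format and the attribute values are assumptions for illustration; C4.5's own rule generator additionally simplifies and prunes the conditions of each rule, which this sketch does not do.

```python
def tree_to_rules(tree, conditions=()):
    """Turn a decision tree into IF-THEN rules, one rule per root-to-leaf path.

    A tree is either a class label (a leaf, here a str) or a dict of the form
    {"attribute": name, "branches": {value: subtree, ...}}.
    """
    if not isinstance(tree, dict):
        lhs = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {lhs} THEN decision = {tree}"]
    rules = []
    for value, subtree in tree["branches"].items():
        rules.extend(tree_to_rules(subtree, conditions + ((tree["attribute"], value),)))
    return rules

# Hypothetical tree rooted at the attribute with the highest gain ratio.
tree = {
    "attribute": "reliability",
    "branches": {
        "high": "adopt",
        "low": {"attribute": "cost", "branches": {"low": "review", "high": "reject"}},
    },
}
for rule in tree_to_rules(tree):
    print(rule)
```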

The Apriori algorithm [1] is a well-known association rule learning algorithm in computer science. Typically, mining association rules involves discovering the frequent itemsets that meet a minimum support threshold and generating association rules from them [26]. The Apriori algorithm exploits the downward closure property: if a particular set of items meets the minimum support threshold, then every subset of that set also meets the minimum support threshold [26]. The advantage of Apriori is its use of large (i.e., frequent) itemsets [37] from the initial stage. The downward closure property allows subsets of items to be generated easily from a large itemset, provided that the items fulfill the minimum support threshold. Because its implementation concentrates on the processes of “discovering and generating”, the Apriori algorithm is easily implemented in any environment, including parallel systems. Many applications and research findings show that the Apriori algorithm has been successfully applied to various problem domains such as manufacturing, marketing, logistics, medicine and many others [3], [4], [5], [8], [38].
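The level-wise search and the downward-closure pruning described above can be sketched in Python as follows. This is a compact, unoptimized rendering of the Apriori idea for illustration, assuming each transaction is a set of hypothetical `attribute=value` items derived from evaluation data and that support is measured as a fraction of transactions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Support is the fraction of transactions that contain the itemset. Candidate
    k-itemsets are built by joining frequent (k-1)-itemsets, then pruned using the
    downward-closure property: a candidate with an infrequent (k-1)-subset cannot
    be frequent, so it is discarded before its support is counted.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    scored = {s: support(s) for s in items}
    current = {s for s, sup in scored.items() if sup >= min_support}
    frequent = {s: scored[s] for s in current}

    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        scored = {c: support(c) for c in candidates}
        current = {c for c, sup in scored.items() if sup >= min_support}
        frequent.update((c, scored[c]) for c in current)
        k += 1
    return frequent

# Hypothetical discretized evaluation records, one transaction per response.
transactions = [
    {"usability=high", "reliability=high", "decision=adopt"},
    {"usability=high", "reliability=high", "decision=adopt"},
    {"usability=low", "reliability=high", "decision=adopt"},
    {"usability=high", "reliability=low", "decision=reject"},
]
frequent = apriori(transactions, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print(sorted(itemset), sup)
```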

  • By embedding BI in business processes and OLAP, we can mine data at different levels of granularity during strategic and management decision-making.

  • By incorporating data mining techniques, we can discover patterns that describe the relationships among criteria and allow a multidimensional view of data at different layers of abstraction.
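One way to organize rules into layers of complexity for OLAP-style navigation is to group them by the number of items they contain and rank them within each layer. The Python sketch below does this and reports support, confidence and lift for each rule, with lift standing in for the correlation analysis mentioned earlier. It is a hypothetical illustration: the layering key, the thresholds and the data are assumptions, and brute-force itemset enumeration replaces Apriori to keep the block self-contained.

```python
from collections import defaultdict
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def layered_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Group rules X -> Y into layers keyed by rule size |X| + |Y|.

    Each rule is (lhs, rhs, support, confidence, lift), where
    confidence = s(X u Y) / s(X) and lift = confidence / s(Y).
    Lift above 1 suggests X and Y occur together more often than if independent.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    layers = defaultdict(list)
    for size in range(2, len(items) + 1):
        for itemset in map(frozenset, combinations(items, size)):
            s_xy = support(itemset, transactions)
            if s_xy < min_support:
                continue  # only frequent itemsets produce rules
            for k in range(1, size):
                for lhs in map(frozenset, combinations(sorted(itemset), k)):
                    rhs = itemset - lhs
                    conf = s_xy / support(lhs, transactions)
                    if conf >= min_confidence:
                        lift = conf / support(rhs, transactions)
                        layers[size].append((sorted(lhs), sorted(rhs), s_xy, conf, lift))
    # Within each layer, list the most confident (then highest-lift) rules first.
    for rules in layers.values():
        rules.sort(key=lambda r: (r[3], r[4]), reverse=True)
    return dict(layers)

# Hypothetical discretized evaluation records.
transactions = [
    {"usability=high", "reliability=high", "decision=adopt"},
    {"usability=high", "reliability=high", "decision=adopt"},
    {"usability=low", "reliability=high", "decision=adopt"},
    {"usability=high", "reliability=low", "decision=reject"},
]
layers = layered_rules(transactions)
for size in sorted(layers):
    print(f"Layer {size} ({len(layers[size])} rules)")
    for lhs, rhs, sup, conf, lift in layers[size]:
        print(f"  {lhs} -> {rhs}  support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Simple two-item rules sit in the first layer and more complex rules in higher layers, so an analyst can drill down from one layer to the next.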

Business performance measurement

Business performance measurements were introduced as a means of tracking the execution of business strategies by comparing current results with business goals. Research in this area dates back to the early twentieth century, with the establishment of the Du Pont Powder Company’s Pyramid of Financial Ratios [31]. Focusing on the financial aspect, the pyramid arranges its financial criteria hierarchically, so that financial measures at higher levels depend on computational results at lower levels.
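The hierarchical dependency in the pyramid of financial ratios can be illustrated with the classic two-level DuPont decomposition, in which return on investment (ROI) at the top is the product of profit margin and asset turnover at the level below. The section does not list the pyramid's ratios, so this decomposition, the function name `dupont_roi` and the figures used are assumptions for illustration.

```python
def dupont_roi(net_income, sales, total_assets):
    """Two-level DuPont decomposition: ROI = profit margin x asset turnover.

    The top-level ratio is computed only from the lower-level ratios, mirroring
    the hierarchical arrangement of the pyramid of financial ratios.
    """
    profit_margin = net_income / sales       # lower level
    asset_turnover = sales / total_assets    # lower level
    return {
        "profit_margin": profit_margin,
        "asset_turnover": asset_turnover,
        "roi": profit_margin * asset_turnover,  # top level, equals net_income / total_assets
    }

# Illustrative figures only.
ratios = dupont_roi(net_income=50.0, sales=500.0, total_assets=1000.0)
print(ratios)
```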

Proposed methodology (extended WaLwFA)

This section describes an extension to WaLwFA [22]. Fig. 1 shows the graphical representation of the proposed extension to WaLwFA. The red circle² highlights the position and applicability of WaLwFA in the general DSS framework.

As Fig. 1 shows, the input to and output from the extended WaLwFA depend on the data organization component of the DSS, which stores all relevant raw and processed data. The business

Input data stage

Business transaction and evaluation data are often confidential to the company. Hence, as seed input for a proof of concept, we obtained input data from a survey of 33 experts in information technology. The experts (IT professionals in various industries, such as banking and software houses, with at least one year of working experience in IT) are asked to review an airline information system based on the criteria established within WaLwFA. They are also asked to assign

Conclusion

We have proposed an extension to WaLwFA that embeds well-known business performance measurements and approaches in DSS and OLAP to improve the process of decision-making. We investigated data mining as a means to differentiate between the degrees of significance of attributes and to identify the different degrees of correlation among attributes contributing towards a decision from multiple perspectives.

We have presented a DSS consisting of our business performance model, namely WaLwFA,

References (40)

  • F.R. David, Strategic Management Concepts and Cases, Pearson International Edition, 2009, p....
  • S. Doddi et al., Discovery of association rules in medical data, Informatics for Health and Social Care (2001).
  • W. Eckerson, BI case study: international truck and engine corporation, Business Intelligence Journal (2004).
  • P. Ewing, L. Lundahl, The balanced scorecard at ABB Sweden – the EVITA Project, in: The International Workshop on Cost...
  • D.H. Fisher, K.B. McKusick, An empirical comparison of ID3 and back-propagation, in: Proceedings of the Eleventh...
  • L. Fitzgerald, R. Johnston, S. Brignall, R. Silvestro, C. Voss, Performance Measurement in Service Business, CIMA,...
  • A.M. Ghalayini et al., The changing basis of performance measurement, Journal of Operations and Production Management (1996).
  • J.R. Hauser et al., The house of quality, Harvard Business Review (1988).
  • G. Huber, The Necessary Nature of Future Firms: Attributes of Survivors in the Changing World (2003).
  • H.T. Johnson, R.S. Kaplan, Relevance Lost – The Rise and Fall of Management Accounting, Harvard Business School Press,...