EasyMiner.eu: Web framework for interpretable machine learning based on rules and frequent itemsets

doi:10.1016/j.knosys.2018.03.006

Knowledge-Based Systems

Volume 150, 15 June 2018, Pages 111-115

https://doi.org/10.1016/j.knosys.2018.03.006

Abstract

EasyMiner (http://www.easyminer.eu) is a web-based system for interpretable machine learning based on frequent itemsets. It currently offers association rule learning (apriori, FP-Growth) and classification (CBA). EasyMiner offers a visual interface designed for interactivity, allowing the user to define a constraining pattern for the mining task. The CBA algorithm can also be used to prune the rule set, thus addressing the common problem of “too many rules” in the output, and the implementation supports automatic tuning of confidence and support thresholds. The development version additionally supports anomaly detection (FPI and its variations) and linked data mining (AMIE+). EasyMiner is dockerized, and some of its components are available as open source R packages.

Introduction

Rules are one of the most accessible forms of knowledge that can be derived from data, and can thus serve as a basis for a machine learning framework focused on the generation of interpretable models. To ensure scalability, the presented system relies on association rule learning, which uses efficient algorithms for frequent itemset mining proven to work on large datasets [1]. While association rules were originally devised for exploratory data mining, they can also be turned into a classifier and serve as a basis for interpretable anomaly detection [2]. The EasyMiner framework contains a carefully curated selection of algorithms based on association rules and their “building blocks”, the frequent itemsets. These cover some of the most common machine learning problems while fostering interpretability by adhering to one type of symbolic knowledge representation.

Association rule learning can be informally described as the task of finding all rules of the form antecedent ⇒ consequent in the input dataset that meet predefined thresholds on statistical measures of interest. When the input for association rule learning is a transaction database, as originally expected by the apriori algorithm, the first approach for mining association rules [3], the discovered association rules are composed of items. An example of such a rule is: onion, potato ⇒ meat. In EasyMiner, the input for association rule learning is a flat file containing multinominal attributes, as in the standard classification task. This corresponds to output association rules such as district=Prague ∧ salary=Low ⇒ rating=C. Each rule is associated with interest measures, such as support, defined as the number of data rows (instances) matching the entire rule, and confidence, which expresses the percentage of instances matching the antecedent that also match the consequent.
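The two interest measures defined above can be illustrated with a minimal Python sketch. The toy table, the helper names `matches` and `support_and_confidence`, and all values are assumptions made for illustration only; they are not part of EasyMiner.

```python
# Toy flat table; attribute names follow the example rule in the text,
# but the rows themselves are invented for illustration.
rows = [
    {"district": "Prague", "salary": "Low", "rating": "C"},
    {"district": "Prague", "salary": "Low", "rating": "C"},
    {"district": "Prague", "salary": "Low", "rating": "A"},
    {"district": "Prague", "salary": "High", "rating": "A"},
    {"district": "Brno", "salary": "Low", "rating": "C"},
]


def matches(row, condition):
    """True if the row contains every attribute=value pair of the condition."""
    return all(row.get(attr) == value for attr, value in condition.items())


def support_and_confidence(rows, antecedent, consequent):
    """Return (absolute support, confidence) of antecedent => consequent."""
    antecedent_count = sum(matches(r, antecedent) for r in rows)
    rule_count = sum(
        matches(r, antecedent) and matches(r, consequent) for r in rows
    )
    confidence = rule_count / antecedent_count if antecedent_count else 0.0
    return rule_count, confidence


support, confidence = support_and_confidence(
    rows,
    antecedent={"district": "Prague", "salary": "Low"},
    consequent={"rating": "C"},
)
print(support, round(confidence, 3))  # prints: 2 0.667
```

Here three rows match the antecedent and two of them also match the consequent, so the rule has support 2 and confidence 2/3.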

Algorithms for classification that are based on association rules take the list of rules output by association rule learning as their input and process it into a rule-based classifier. The Classification Based on Associations (CBA) algorithm proposed by Liu et al. [4] is considered the reference algorithm for this group of classification algorithms. The main steps in CBA are the removal of redundant rules and the inclusion of a default rule, which ensures that every test instance is covered. Although CBA, proposed in 1998, is a relatively old algorithm, we included it in EasyMiner. The output of CBA is more user friendly than that of its successors, while the difference between the accuracy of CBA and that of state-of-the-art association rule classification algorithms is very small [5]. CBA also helps to address one of the main problems with association rule learning (as an exploratory data mining task), which is the high number of rules that can be generated. Since CBA only removes rules from the original list, it can be used for pruning the set of association rules.
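The two steps named above, removal of redundant rules and inclusion of a default rule, can be sketched in Python as follows. This is a simplified illustration and not the EasyMiner or reference CBA implementation: it ranks rules by confidence, support and antecedent length, keeps a rule only if it correctly classifies at least one still-uncovered training row, and omits the final error-based cut-off of the full algorithm. The rule representation, the function names, and the fallback to the overall majority class when no rows remain are assumptions.

```python
from collections import Counter


def covers(antecedent, row):
    """True if the row satisfies every attribute=value pair of the antecedent."""
    return all(row.get(attr) == value for attr, value in antecedent.items())


def build_classifier(rules, rows, target):
    """Prune a rule list by data coverage and pick a default label."""
    ranked = sorted(
        rules,
        key=lambda r: (-r["confidence"], -r["support"], len(r["antecedent"])),
    )
    remaining = list(rows)
    kept = []
    for rule in ranked:
        covered = [row for row in remaining if covers(rule["antecedent"], row)]
        # Keep the rule only if it classifies some uncovered row correctly.
        if any(row[target] == rule["label"] for row in covered):
            kept.append(rule)
            remaining = [
                row for row in remaining if not covers(rule["antecedent"], row)
            ]
    # Default rule: majority class of uncovered rows (assumption: fall back
    # to all rows when every row is already covered).
    pool = remaining or rows
    default_label = Counter(row[target] for row in pool).most_common(1)[0][0]
    return kept, default_label


def predict(kept, default_label, row):
    """Return the label of the first matching rule, else the default label."""
    for rule in kept:
        if covers(rule["antecedent"], row):
            return rule["label"]
    return default_label


# Invented training rows and candidate rules.
rows = [
    {"district": "Prague", "salary": "Low", "rating": "C"},
    {"district": "Prague", "salary": "Low", "rating": "C"},
    {"district": "Prague", "salary": "High", "rating": "A"},
    {"district": "Brno", "salary": "High", "rating": "A"},
    {"district": "Brno", "salary": "Low", "rating": "C"},
]
rules = [
    {"antecedent": {"salary": "Low"}, "label": "C", "confidence": 1.0, "support": 3},
    {"antecedent": {"district": "Prague", "salary": "Low"}, "label": "C",
     "confidence": 1.0, "support": 2},
    {"antecedent": {"salary": "High"}, "label": "A", "confidence": 1.0, "support": 2},
    {"antecedent": {"district": "Prague"}, "label": "C",
     "confidence": 2 / 3, "support": 2},
]

kept, default_label = build_classifier(rules, rows, target="rating")
print(len(rules), "rules in,", len(kept), "rules kept, default:", default_label)
```

In this toy run the two rules that add no new coverage are dropped. Because the procedure only removes rules from the input list, the same routine doubles as a pruning step for an exploratory rule set.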

As two additional types of task, the development version of EasyMiner integrates anomaly detection and extraction of association rules from linked data. Anomaly detection is based on the frequent itemset-based outlier detection approach [6], [7], which assumes that a data instance covered by multiple frequent itemsets is unlikely to be an anomaly. The linked data support is based on an implementation of the AMIE+ algorithm for rule mining in ontological knowledge bases [8].
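The assumption behind the frequent itemset-based approach can be sketched in Python. The score below sums the supports of the frequent itemsets contained in an instance and divides by the number of frequent itemsets, in the spirit of the frequent pattern outlier factor of [6]; it is a simplified illustration and not the FPI implementation in EasyMiner. The brute-force itemset enumeration, the function names, the thresholds and the data are all assumptions.

```python
from itertools import combinations


def frequent_itemsets(rows, min_support, max_len=2):
    """Brute-force search for itemsets with relative support >= min_support."""
    transactions = [frozenset(row.items()) for row in rows]
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_len + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(itemset <= t for t in transactions) / len(transactions)
            if support >= min_support:
                result[itemset] = support
    return result


def coverage_score(row, itemsets):
    """Average support mass of the frequent itemsets contained in the row.

    A low score means few frequent itemsets cover the row, which makes it
    a more likely anomaly under the stated assumption.
    """
    if not itemsets:
        return 0.0
    items = frozenset(row.items())
    covered = sum(s for itemset, s in itemsets.items() if itemset <= items)
    return covered / len(itemsets)


# Invented data: the last row combines two infrequent values.
rows = [
    {"district": "Prague", "salary": "Low"},
    {"district": "Prague", "salary": "Low"},
    {"district": "Prague", "salary": "Low"},
    {"district": "Prague", "salary": "High"},
    {"district": "Brno", "salary": "High"},
]

itemsets = frequent_itemsets(rows, min_support=0.3)
scores = [coverage_score(row, itemsets) for row in rows]
most_anomalous = min(range(len(rows)), key=scores.__getitem__)
print(rows[most_anomalous])
```

The instance with the lowest score is reported as the most anomalous, and the frequent itemsets that fail to cover it offer an interpretable explanation of why.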

Section snippets

Problems and background

The presented system offers an open source web-based framework for machine learning. The main functionality covered in this article is association rule learning and the building of classifiers composed of association rules. The difference between EasyMiner and mainstream open source machine learning toolboxes, such as Scikit-learn or R (http://scikit.ml/, https://www.r-project.org/), or specialized toolboxes such as spmf (http://www.philippe-fournier-viger.com/spmf/) that also contain some of the

Software framework

EasyMiner is composed of several microservices, which communicate via REST APIs (see Fig. 1). The application has two logical layers. The frontend provides the user interface, management of users and tasks, and integration of backend services. The backend handles the data processing, which is composed of three independent components: data storage, data preprocessing and data mining. Fig. 1 gives a more detailed view of the core data mining layer. EasyMiner historically supports

Software architecture

This paper describes EasyMiner/R, which uses the R framework for performing machine learning tasks. The association rule learning step in CBA is performed by an implementation of the apriori algorithm in C introduced in [13], which is wrapped in R’s arules package [11]. The pruning has been partly implemented in Java and wrapped as a standalone R package.⁴ The use of R facilitates further extensions of the system, for example with additional

Illustrative examples

In this section, we cover the entire workflow of analyzing a dataset with EasyMiner. A video file demonstrating this process on a particular dataset is contained in the supplementary material.

First, the user has to log in using a local account or a social network account. After authentication, the user uploads the dataset as a CSV or zipped CSV file; there is also the option to reuse an already uploaded data file. Once the data are uploaded, the user selects which data fields will be used by

Conclusions

EasyMiner/R, available at http://www.easyminer.eu, is an open source framework for interpretable machine learning. For association rule learning and classification, it offers an interactive web-based interface. The user visually constructs a “query” in the browser by defining a rule template. Since the mining proceeds incrementally from shorter to longer rules, the user is served the shortest, and typically most satisfying, rules first, before the mining has finished. If there are too few or too

Required metadata

(Tables 2 and 3)

Acknowledgments

We would like to thank Jan Rauch and Milan Šimůnek for helpful discussions. The research and development of EasyMiner was supported by the European Union via the LinkedTV project (no. FP7-287911) and the OpenBudgets.eu project (no. H2020-645833) and by the University of Economics, Prague (grants IGA 15/2010, 26/2011, 21/2013 and 29/2016 and institutional support for long term research of SV and TK). The development of the CBA component was supported by CESNET grant no. 540/2014.

The

References (21)

U. Yun et al.
An efficient algorithm for mining high utility patterns from incremental databases with one database scan
Knowl. Based Syst.
(2017)
U. Yun et al.
Damped window based high average utility pattern mining over data streams
Knowl. Based Syst.
(2018)
T. Hastie et al.
The Elements of Statistical Learning
(2001)
M. Kopp et al.
Evaluation of association rules extracted during anomaly explanation
ITAT
(2015)
J. Fürnkranz et al.
A brief overview of rule learning
International Symposium on Rules and Rule Markup Languages for the Semantic Web. RuleML 2015
(2015)
B. Liu et al.
Integrating classification and association rule mining
KDD’98: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining
(1998)
T. Kliegr
Quantitative CBA: small and comprehensible association rule classification models
arXiv:1711.10166...
Z. He et al.
Fp-outlier: frequent pattern based outlier detection
Comput. Sci. Inf. Syst.
(2005)
J. Kuchař et al.
Spotlighting anomalies using frequent patterns
Proceedings of the KDD 2017 Workshop on Anomaly Detection in Finance. Halifax: PMLR
(2018)
L. Galárraga et al.
Fast rule mining in ontological knowledge bases with AMIE+
VLDB J.
(2015)

There are more references available in the full text version of this article.

