Elsevier

Knowledge-Based Systems

Volume 150, 15 June 2018, Pages 111-115

EasyMiner.eu: Web framework for interpretable machine learning based on rules and frequent itemsets

https://doi.org/10.1016/j.knosys.2018.03.006

Abstract

EasyMiner (http://www.easyminer.eu) is a web-based system for interpretable machine learning based on frequent itemsets. It currently offers association rule learning (apriori, FP-Growth) and classification (CBA). EasyMiner offers a visual interface designed for interactivity, allowing the user to define a constraining pattern for the mining task. The CBA algorithm can also be used for pruning of the rule set, thus addressing the common problem of “too many rules” in the output, and the implementation supports automatic tuning of confidence and support thresholds. The development version additionally supports anomaly detection (FPI and its variations) and linked data mining (AMIE+). EasyMiner is dockerized, and some of its components are available as open source R packages.

Introduction

Rules are one of the most accessible forms of knowledge that can be derived from data, and can thus serve as a basis for a machine learning framework focused on the generation of interpretable models. To ensure scalability, the presented system relies on association rule learning, which uses efficient algorithms for frequent itemset mining proven to work on large datasets [1]. While association rules were originally devised for exploratory data mining, they can also be turned into a classifier and can serve as a basis for interpretable anomaly detection [2]. The EasyMiner framework contains a carefully curated selection of algorithms based on association rules and their “building blocks”, the frequent itemsets. These cover some of the most common machine learning problems while fostering interpretability by adhering to one type of symbolic knowledge representation.

Association rule learning can be informally described as the task of finding all rules in the input dataset of the form antecedent → consequent that meet predefined thresholds on statistical measures of interest. When the input for association rule learning is a transaction database, as originally expected by the apriori algorithm, the first approach for mining association rules [3], the discovered association rules are composed of items. An example of such a rule is: onion, potato → meat. In EasyMiner, the input for association rule learning is a flat file containing multinominal attributes, as in the standard classification task. This corresponds to output association rules such as district=Prague ∧ salary=Low → rating=C. Each rule is associated with interest measures, such as support, defined as the number of data rows (instances) matching the entire rule, and confidence, which expresses the percentage of instances matching the antecedent that also match the consequent.
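
The two interest measures can be illustrated with a minimal Python sketch. It assumes the rule district=Prague ∧ salary=Low → rating=C and a made-up five-row table; the data values and the helper names (`matches`, `support_and_confidence`) are hypothetical and are not part of EasyMiner.

```python
# Toy data: each row maps attribute -> value (all values are invented).
rows = [
    {"district": "Prague", "salary": "Low", "rating": "C"},
    {"district": "Prague", "salary": "Low", "rating": "C"},
    {"district": "Prague", "salary": "Low", "rating": "A"},
    {"district": "Prague", "salary": "High", "rating": "A"},
    {"district": "Brno", "salary": "Low", "rating": "C"},
]


def matches(row, items):
    """Return True if the row contains every attribute=value pair in `items`."""
    return all(row.get(attr) == value for attr, value in items.items())


def support_and_confidence(rows, antecedent, consequent):
    """Return (support as an absolute row count, confidence as a fraction)."""
    n_antecedent = sum(matches(r, antecedent) for r in rows)
    n_rule = sum(
        matches(r, antecedent) and matches(r, consequent) for r in rows
    )
    confidence = n_rule / n_antecedent if n_antecedent else 0.0
    return n_rule, confidence


support, confidence = support_and_confidence(
    rows,
    antecedent={"district": "Prague", "salary": "Low"},
    consequent={"rating": "C"},
)
print(support, round(confidence, 3))  # prints: 2 0.667
```

Here two rows match the entire rule (support 2) and three rows match the antecedent, so the confidence is 2/3, or roughly 67 percent.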

Algorithms for classification that are based on association rules take the list of rules output by association rule learning as input and process it into a rule-based classifier. The Classification Based on Associations (CBA) algorithm proposed by Liu et al. [4] is considered the reference algorithm for this group of classification algorithms. The main steps in CBA are removal of redundant rules and inclusion of a default rule, which ensures that every test instance is covered. Although CBA, proposed in 1998, is a relatively old algorithm, we included it in EasyMiner because its output is more user friendly than that of its successors, while the difference between CBA's accuracy and the accuracy of state-of-the-art association rule classification algorithms is very small [5]. CBA also helps to address one of the main problems with association rule learning (as an exploratory data mining task), which is the high number of rules that can be generated. Since CBA only removes rules from the original list, it can be used for pruning the set of association rules.
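
The two steps named above, pruning of redundant rules and the addition of a default rule, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the EasyMiner implementation and not the complete published algorithm: rules are ranked by confidence and then support, a rule is kept only if it correctly classifies at least one training row not yet covered, and the default class is the majority class of the rows left uncovered. Further steps of CBA, such as truncating the rule list by total error, are omitted. The rule dictionaries, the toy data and the function names are hypothetical.

```python
from collections import Counter


def covers(rule, row):
    """True if the row satisfies every condition in the rule's antecedent."""
    return all(row.get(a) == v for a, v in rule["antecedent"].items())


def build_classifier(rules, rows, target):
    """Simplified coverage-based pruning plus a default rule."""
    # Higher confidence first, then higher support; Python's stable sort
    # keeps the input order for ties.
    ranked = sorted(rules, key=lambda r: (-r["confidence"], -r["support"]))
    remaining = list(rows)
    kept = []
    for rule in ranked:
        covered = [row for row in remaining if covers(rule, row)]
        # Keep the rule only if it gets at least one remaining row right.
        if any(row[target] == rule["label"] for row in covered):
            kept.append(rule)
            remaining = [row for row in remaining if not covers(rule, row)]
    # Default rule: majority class of uncovered rows (all rows if none left).
    pool = remaining or rows
    default = Counter(row[target] for row in pool).most_common(1)[0][0]
    return kept, default


def predict(kept, default, row):
    """Apply the first matching rule; fall back to the default class."""
    for rule in kept:
        if covers(rule, row):
            return rule["label"]
    return default


rows = [
    {"district": "Prague", "salary": "Low", "rating": "C"},
    {"district": "Prague", "salary": "Low", "rating": "C"},
    {"district": "Prague", "salary": "Low", "rating": "A"},
    {"district": "Prague", "salary": "High", "rating": "A"},
    {"district": "Brno", "salary": "Low", "rating": "C"},
]
rules = [
    {"antecedent": {"district": "Prague", "salary": "Low"}, "label": "C",
     "support": 2, "confidence": 2 / 3},
    {"antecedent": {"salary": "High"}, "label": "A",
     "support": 1, "confidence": 1.0},
    {"antecedent": {"salary": "Low"}, "label": "C",
     "support": 3, "confidence": 0.75},
    {"antecedent": {"district": "Prague"}, "label": "A",
     "support": 2, "confidence": 0.5},
]

kept, default = build_classifier(rules, rows, target="rating")
print(len(rules), "rules reduced to", len(kept), "plus default class", default)
```

Because the sketch only discards rules from the mined list and never creates new ones, the kept rules are always a subset of the input, which is the property that makes this kind of procedure usable as a pruning step.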

As two additional types of task, the development version of EasyMiner integrates anomaly detection and extraction of association rules from linked data. Anomaly detection is based on the frequent itemset-based outlier detection approach [6], [7], which assumes that a data instance covered by multiple frequent itemsets is unlikely to be an anomaly. The linked data support is based on an implementation of the AMIE+ algorithm for rule mining in ontological knowledge bases [8].
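
The coverage assumption behind frequent itemset-based anomaly detection can be illustrated with a minimal Python sketch. It assumes a score in the spirit of the frequent pattern outlier factor: for each row, the supports of all frequent itemsets contained in the row are summed and divided by the number of frequent itemsets, so rows covered by few frequent itemsets receive low scores. The brute-force itemset enumeration, the toy data and the function names are hypothetical and do not reflect the FPI implementation in EasyMiner.

```python
from itertools import combinations


def frequent_itemsets(rows, min_support, max_len=2):
    """Brute-force enumeration of frequent itemsets (toy data only)."""
    n = len(rows)
    transactions = [frozenset(row.items()) for row in rows]
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_len + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                result[candidate] = support
    return result


def coverage_scores(rows, itemsets):
    """Per-row score: summed support of contained frequent itemsets,
    divided by the total number of frequent itemsets."""
    total = len(itemsets)
    scores = []
    for row in rows:
        t = frozenset(row.items())
        covered = sum(s for iset, s in itemsets.items() if iset <= t)
        scores.append(covered / total if total else 0.0)
    return scores


rows = [
    {"district": "Prague", "salary": "Low", "rating": "C"},
    {"district": "Prague", "salary": "Low", "rating": "C"},
    {"district": "Prague", "salary": "Low", "rating": "C"},
    {"district": "Prague", "salary": "Low", "rating": "C"},
    {"district": "Brno", "salary": "High", "rating": "A"},
]

itemsets = frequent_itemsets(rows, min_support=0.5)
scores = coverage_scores(rows, itemsets)
most_anomalous = scores.index(min(scores))
print("lowest-scoring row:", rows[most_anomalous])
```

In this toy table the last row shares no attribute value with the others, so no frequent itemset covers it and it receives the lowest score. The itemsets that do or do not cover a row also give a readable explanation of why it was flagged.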

Section snippets

Problems and background

The presented system offers an open source web-based framework for machine learning. Its main functionality covered in this article is association rule learning and building of classifiers composed of association rules. The difference between EasyMiner and mainstream open source machine learning toolboxes, such as Scikit-learn or R (http://scikit.ml/, https://www.r-project.org/), or specialized toolboxes such as spmf (http://www.philippe-fournier-viger.com/spmf/) that also contain some of the

Software framework

EasyMiner is composed of several microservices, which communicate via REST APIs (see Fig. 1). The application has two logical layers. The frontend provides the user interface, management of users and tasks, and integration of backend services. The backend handles the data processing, which is composed of three independent components: data storage, data preprocessing and data mining. Fig. 1 gives a more detailed view of the core data mining layer. EasyMiner historically supports

Software architecture

This paper describes EasyMiner/R, which uses the R framework for performing machine learning tasks. The association rule learning step in CBA is performed by an implementation of the apriori algorithm in C introduced in [13], which is wrapped into R’s arules package [11]. The pruning has been partly implemented in Java and wrapped as a standalone R package. The use of R facilitates further extensions of the system, for example with additional

Illustrative examples

In this section, we cover the entire workflow of analyzing a dataset with EasyMiner. A video file demonstrating this process on a particular dataset is contained in the supplementary material.

First, the user has to log in using a local account or a social network account. After authentication, the user uploads the dataset as a CSV or zipped CSV file; there is also the option to reuse an already uploaded data file. Once the data are uploaded, the user selects which data fields will be used by

Conclusions

EasyMiner/R, available at http://www.easyminer.eu, is an open source framework for interpretable machine learning. For association rule learning and classification, it offers an interactive web-based interface. The user visually constructs a “query” in the browser by defining a rule template. Since the mining proceeds incrementally from shorter to longer rules, the user is served the shortest, and typically most satisfying, rules first, before the mining has finished. If there are too few or too

Required metadata

(Tables 2 and 3)

Acknowledgments

We would like to thank Jan Rauch and Milan Šimůnek for helpful discussions. The research and development of EasyMiner was supported by the European Union via the LinkedTV project (no. FP7-287911) and the OpenBudgets.eu project (no. H2020-645833) and by the University of Economics, Prague (grants IGA 15/2010, 26/2011, 21/2013 and 29/2016 and institutional support for long term research of SV and TK). The development of the CBA component was supported by CESNET grant no. 540/2014.


References (21)

  • U. Yun et al., An efficient algorithm for mining high utility patterns from incremental databases with one database scan, Knowl. Based Syst. (2017)
  • U. Yun et al., Damped window based high average utility pattern mining over data streams, Knowl. Based Syst. (2018)
  • T. Hastie et al., The Elements of Statistical Learning (2001)
  • M. Kopp et al., Evaluation of association rules extracted during anomaly explanation, ITAT (2015)
  • J. Fürnkranz et al., A brief overview of rule learning, International Symposium on Rules and Rule Markup Languages for the Semantic Web. RuleML 2015 (2015)
  • B. Liu et al., Integrating classification and association rule mining, KDD’98: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (1998)
  • T. Kliegr, Quantitative CBA: small and comprehensible association rule classification models, arXiv:1711.10166...
  • Z. He et al., Fp-outlier: frequent pattern based outlier detection, Comput. Sci. Inf. Syst. (2005)
  • J. Kuchař et al., Spotlighting anomalies using frequent patterns, Proceedings of the KDD 2017 Workshop on Anomaly Detection in Finance. Halifax: PMLR (2018)
  • L. Galárraga et al., Fast rule mining in ontological knowledge bases with AMIE+, VLDB J. (2015)
There are more references available in the full text version of this article.
