Elsevier

Information Sciences

Volumes 355–356, 10 August 2016, Pages 58-73

QualityCover: Efficient binary relation coverage guided by induced knowledge quality

https://doi.org/10.1016/j.ins.2016.03.009

Abstract

Formal Concept Analysis, as a mathematical tool, has been successfully applied in diverse fields such as data mining, conceptual modeling, social networks, software engineering, and the semantic web, to cite but a few. One of the major shortcomings of Formal Concept Analysis, however, is the large number of formal concepts extracted from even reasonably sized formal contexts. This overwhelming number has been a key hindrance to a wider adoption of the technique (FCA). To overcome this shortcoming, extracting only a minimal coverage of formal concepts is a possible remedy. Even though this task has been shown to be NP-hard, it has attracted the attention of many researchers. In this paper, we introduce a new gain-function-based approach, called QualityCover, for the extraction of a pertinent coverage of a formal context. The algorithm operates in a greedy fashion and relies on the assessment of a correlation measure for the selection of the formal concepts to be retained in the final coverage. Extensive experiments show that QualityCover obtains very encouraging results compared with those obtained by pioneering approaches in the literature.

Introduction

Formal Concept Analysis (FCA) is a mathematical tool for analyzing data and formally representing conceptual knowledge [16]. FCA forms conceptual structures from data. Such structures consist of units, which are formal abstractions of concepts of human thought allowing meaningful and comprehensible interpretation [26]. Interestingly enough, a distinguishing feature of FCA is an inherent integration of components of conceptual processing of data and knowledge [4]. Through the integration of these components, FCA’s mathematical settings have recently been shown to act as a powerful tool by providing a theoretical framework for the efficient resolution of many practical problems from data mining, software engineering and information retrieval, to cite but a few [14], [15], [18], [21].
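To make these notions concrete, the following minimal Python sketch (our own toy example, not taken from the paper) implements the two derivation operators of FCA on a small formal context and checks whether a pair (extent, intent) forms a formal concept:

```python
# Toy formal context: object -> set of attributes it has.
context = {
    "o1": {"a", "b"},
    "o2": {"a", "c"},
    "o3": {"a", "b", "c"},
}
attributes = {"a", "b", "c"}

def intent(objs):
    """Derivation operator: attributes shared by every object in objs."""
    return set(attributes) if not objs else set.intersection(*(context[o] for o in objs))

def extent(attrs):
    """Derivation operator: objects possessing every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

# A pair (A, B) is a formal concept iff intent(A) == B and extent(B) == A.
A, B = {"o1", "o3"}, {"a", "b"}
print(intent(A) == B and extent(B) == A)  # True: ({o1, o3}, {a, b}) is a concept
```

The two operators form a Galois connection; a formal concept is exactly a fixed point of their composition.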

Nevertheless, the overwhelming number of formal concepts that may be drawn from even reasonably sized contexts [1] was a hindrance towards a larger utilization of FCA. An interesting way to tackle this issue is to find a coverage of a formal context by a minimal number of formal concepts. This problem instantiates the famous algorithmic set cover problem, which involves finding the smallest sub-collection of sets that covers some universe. Although finding an optimal solution to this problem is NP-hard, a greedy algorithm is widely used and typically finds solutions that are close to optimal [36].
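The classical greedy strategy the paragraph refers to can be sketched as follows (an illustrative implementation with our own names and data, not the paper's algorithm): at each step, pick the set covering the most still-uncovered elements.

```python
def greedy_set_cover(universe, subsets):
    """Classical greedy set cover: repeatedly take the subset that covers
    the largest number of still-uncovered elements (ln(n)-approximation)."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(subsets, key=lambda s: len(s & uncovered))
        if not (best & uncovered):
            break  # remaining elements cannot be covered by any subset
        cover.append(best)
        uncovered -= best
    return cover

universe = {1, 2, 3, 4, 5}
subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(universe, subsets))  # [{1, 2, 3}, {4, 5}]
```

In the FCA setting, the universe is the set of (object, attribute) pairs of the context and each candidate set is the rectangle induced by a formal concept.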

In this respect, the dedicated literature has witnessed a large number of works [10]. Belkhiter et al. [3] introduced a pertinent rectangular decomposition of a formal context, with an application to documentary databases. Their decomposition is based on the selection of "optimal" formal concepts, where optimality is assessed through the maximization of a function that computes the storage space of a formal concept. Later, Khcherif et al. [22] introduced a rectangular decomposition approach based on Riguet's difunctional relation [32]. The computation of this difunctional is reduced to the localization of a set of key points called isolated points, which have been shown to determine a minimal set of formal concepts covering a given formal context. Belohlavek and Vychodil [7] introduced the GreCond approach, and later Belohlavek and Trnecka [6] addressed the same issue with the GreEss approach; both proposed new methods for decomposing a binary matrix into a Boolean product of factors. Worthy of mention, both latter approaches have a close connection with the isolated points, as they shed light on "mandatory" formal concepts in the coverage.

In this paper, we introduce a new approach for the extraction of a pertinent coverage of a formal context. The driving idea is that the chosen formal concepts should convey added value in terms of the quality of the knowledge that may be drawn from them. Indeed, the intent part of a formal concept has been shown to play a key role in rule set construction. This rule set is at the origin of a variety of compact subsets of the implication/association rule1 sets of a context, called generic bases [8].

Interestingly enough, as stressed in [17], only informative (generic) association rules can be derived from highly correlated patterns. This fact motivates our criterion for selecting the formal concepts kept in the coverage, namely the assessment of the correlation of their respective intent parts. By doing so, our aim is to improve the informativeness as well as the strength of the derived association rules.
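The intuition behind this claim can be seen on a toy transaction set (data and numbers are illustrative only, not from the paper): a rule drawn from a highly correlated pattern has high confidence, while one drawn from a weakly correlated pattern does not.

```python
transactions = [
    {"a", "b"}, {"a", "b"}, {"a", "b"}, {"a"},  # a and b co-occur often
    {"c", "d"}, {"c"}, {"d"}, {"c"},            # c and d rarely co-occur
]

def support(itemset):
    """Fraction of transactions containing all items of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Confidence of the association rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(confidence({"a"}, {"b"}))  # 0.75: correlated pattern -> strong rule
print(confidence({"c"}, {"d"}))  # ~0.33: weakly correlated -> weak rule
```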

It is important to note that, to the best of our knowledge, the introduced approach is the first to tackle this issue from a data mining point of view. Indeed, pioneering approaches in the literature focused only on minimizing, respectively, the storage space of a documentary database and the number of factors within the Boolean factor analysis framework.

To show the benefits of our approach, extensive comparisons were carried out against pioneering approaches in the literature, with very encouraging results. The validation protocol relies heavily on the criterion of the coverage's compactness as well as on common quality metrics, e.g., coupling, cohesion, stability, separation, and distance.

The remainder of the paper is organized as follows: The next section recalls the key notions used throughout this paper. Section 3 reviews related work. Then, we thoroughly describe, in Section 4, our algorithm for the extraction of a pertinent coverage from a binary relation, called QualityCover. Section 5 describes the experimental study and the results we obtained. Section 6 concludes the paper and identifies avenues for future work.


Key notions

In this section, we briefly sketch the key notions used in the remainder of this paper.

Related work

The costly computational complexity of extracting the whole set of formal concepts was a main impediment to the wide-scale use of the FCA battery of results for large datasets. To overcome this drawback, the issue of extracting a compact coverage of formal concepts attracted the interest of the research community. At a glance, the dedicated literature has witnessed two main streams for addressing this task: (i) gain-function-based approaches; and (ii) approaches based on the localization of key points. In the

Extracting a pertinent coverage from a binary relation

In this section, we introduce a new approach, based on a greedy algorithm, for the extraction of a pertinent coverage of a binary relation. The guiding idea of our approach is that the extraction process is mainly driven by the quality of the knowledge that may be drawn from each formal concept. In fact, our gain function is based on the assessment of the correlation of the intent part of pertinent formal concepts.

In the following, we thoroughly describe a new algorithm, called QualityCover, for
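The paper's exact gain function is not reproduced in this excerpt; the sketch below is therefore a hypothetical reading of a QualityCover-style greedy loop. As a stand-in correlation measure for an intent, it uses the standard "bond" measure (conjunctive support divided by disjunctive support); the names `quality_cover` and `bond` are ours, not the authors'.

```python
def bond(intent, context):
    """Correlation of an attribute set: |conjunctive support| / |disjunctive support|."""
    conj = [o for o, attrs in context.items() if intent <= attrs]
    disj = [o for o, attrs in context.items() if intent & attrs]
    return len(conj) / len(disj) if disj else 0.0

def quality_cover(context, concepts):
    """Greedily pick concepts (extent, intent) until every (object, attribute)
    pair of the context is covered, preferring highly correlated intents."""
    uncovered = {(o, a) for o, attrs in context.items() for a in attrs}
    cover = []
    while uncovered:
        def gain(concept):
            ext, intn = concept
            covered = {(o, a) for o in ext for a in intn} & uncovered
            # Primary criterion: new cells covered; tie-break: intent correlation.
            return (len(covered), bond(intn, context))
        best = max(concepts, key=gain)
        newly = {(o, a) for o in best[0] for a in best[1]} & uncovered
        if not newly:
            break  # nothing left that the candidate concepts can cover
        cover.append(best)
        uncovered -= newly
    return cover

context = {"o1": {"a", "b"}, "o2": {"a", "b"}, "o3": {"c"}}
concepts = [({"o1", "o2"}, {"a", "b"}), ({"o3"}, {"c"})]
print(quality_cover(context, concepts))
```

The design choice mirrors the text: coverage progress drives the greedy loop, while the correlation of intents steers the selection towards concepts whose intents should yield informative generic rules.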

Experimental results

In this section, we present our results showing the efficiency of our proposed QualityCover algorithm. The solution was implemented and executed on a Core i7 PC with a 2.4 GHz CPU, 16 GB of RAM, and an Ubuntu Linux distribution. The experiments mainly concern the compactness as well as the quality of the formal concepts composing the coverage. The quality is assessed through the coupling, cohesion, stability, separation, and distance metrics. At first, we conducted a series of experiments to assess the

Conclusion

In this article, we presented a new gain-function-based approach, called QualityCover and built on a greedy algorithm, for the extraction of a pertinent coverage of a binary relation. The main thrust of the approach is that the obtained coverage relies on the assessment of a measure of correlation of a set of items to select the formal concepts to be included in the coverage. Extensive experimental work showed that QualityCover obtains very encouraging results versus those obtained by pioneering approaches of the literature.

Acknowledgments

The authors are thankful to the anonymous reviewers and to Prof. Peter Eklund (Head of PhD School, IT University of Copenhagen), who accepted to proofread this paper. We also thank the authors who accepted to provide us with the source code of their algorithms, namely Martin Trnecka for the GreEss algorithm, Vilem Vychodil for the GreCond algorithm, and Fethi Ferjani for the GenCoverage algorithm.

References (41)

  • M. Barbut et al.

    Ordre et Classification. Algèbre et Combinatoire

    (1970)
  • N. Belkhiter et al.

    Décomposition rectangulaire optimale d’une relation binaire: application aux bases de données documentaires

    INFOR

    (1994)
  • R. Belohlavek

    Introduction to Formal Concept Analysis

    (2008). https://phoenix.inf.upol.cz/esf/ucebni/formal.pdf
  • R. Belohlavek et al.

    Basic level of concepts in formal concept analysis

    Proceedings of the 10th International Conference on Formal Concept Analysis (ICFCA2012), LNCS 7278, Leuven, Belgium

    (2012)
  • R. Belohlavek et al.

    Discovery of optimal factors in binary data via a novel method of matrix decomposition

    J. Comput. Syst. Sci.

    (2009)
  • S. Ben Yahia et al.

    A new generic basis of factual and implicative association rules

    Intell. Data Anal.

    (2009)
  • G. Choquet

    Theory of capacities

    Ann. l’Inst. Fourier

    (1953)
  • I. Dimassi et al.

    Dfsp: A new algorithm for a swift computation of formal concept set stability

    Proceedings of the 11th International Conference on Concept Lattices and Their Applications (CLA’2014)

    (2014)
  • L. Duan

    Effective and Efficient Correlation Analysis with Application to Market Basket Analysis and Network Community Detection, (Ph.D. Thesis)

    (2012)
  • P.W. Eklund et al.

    Concept similarity and related categories in information retrieval using formal concept analysis

    Int. J. Gen. Syst.

    (2012)