Relation-based granules to represent relational data and patterns

doi:10.1016/j.asoc.2015.08.045

Applied Soft Computing

Volume 37, December 2015, Pages 467-478

https://doi.org/10.1016/j.asoc.2015.08.045

Highlights

• Formula-based relations defined in a granular computing framework are proposed.
• The relations are used to construct more informative granules.
• The granules are used to represent relational data and patterns.
• Thanks to this approach, the generation of patterns can be speeded up.
• The approach makes it possible to discover richer knowledge from relational data.

Abstract

The complex structure of relational data makes knowledge discovery a more challenging task than it is for data stored in a single table. The usefulness of granular computing based approaches to mining single-table data is a driving force for adapting them to relational data. This paper proposes relation-based granules, defined within a granular computing based approach to mining relational data. The relations on which these granules are built are used to represent both the relational data and the patterns to be discovered. Thanks to this representation, the generation of patterns can be speeded up. The representation also makes it possible to discover richer knowledge from relational data.


Introduction

Granular computing provides a general framework for problem solving. It covers theories, methodologies, techniques, and tools that make use of granules [1]. Granules are, in general, understood as collections formed by a semantically meaningful grouping of elements based on their indistinguishability, similarity, proximity or functionality [2].

The main idea of granular computing is to make it possible to view the same problem at many levels of granularity. Switching between levels makes it possible to choose the representation that best matches the problem. A more specific level of granularity may reveal more detailed information, whereas a more abstract level may improve a problem solution by omitting irrelevant details.

Granular computing approaches have been successfully applied to data mining (e.g. [3], [4], [5], [6], [7]). Some attempts have also been made to adapt the idea of granulation to mining data stored in a relational structure (e.g. [8], [9], [10]).

Granulation tools can provide an alternative representation of the data to be mined. The primitives in this case are defined not by attribute values, but by granules of entities. Using granules, one can form collections of objects that share the same features (e.g. attribute values). A granular representation of the data facilitates the generation of patterns. Since elementary granules reveal basic features hidden in data, they are used as atoms in the construction of patterns.

A granular computing based approach for data mining can be defined using a description language for information granules [11]. The data is primarily stored in an information system. However, it can also be represented in a granular form that is constructed using atomic formulas of the language. Each information granule is characterized by a pair of syntax and semantics. The former is defined by a formula constructed using attributes and values that describe objects, whereas the latter is understood as the set of objects that satisfy the formula. Information granules can be used to express patterns hidden in the data.
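To make the syntax/semantics pair concrete, the sketch below encodes an information system as a mapping from objects to attribute values, represents formulas as small syntax trees, and computes the semantics of a formula as the set of objects satisfying it. It is a minimal illustration under stated assumptions, not the description language of [11]: only atomic formulas (attribute = value) and conjunction are modelled, and the class and function names (`Atom`, `And`, `semantics`, `granule`) are hypothetical. The data are the Age and Gender values of the customers in Example 1.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple, Union

# Information system: object id -> {attribute: value}
InfoSystem = Dict[int, Dict[str, object]]

@dataclass(frozen=True)
class Atom:
    """Hypothetical atomic formula (attribute = value)."""
    attribute: str
    value: object

@dataclass(frozen=True)
class And:
    """Hypothetical conjunction of two formulas."""
    left: "Formula"
    right: "Formula"

Formula = Union[Atom, And]

def semantics(formula: Formula, system: InfoSystem) -> FrozenSet[int]:
    """Return the set of objects that satisfy the formula."""
    if isinstance(formula, Atom):
        return frozenset(obj for obj, row in system.items()
                         if row.get(formula.attribute) == formula.value)
    # Conjunction: objects satisfying both subformulas.
    return semantics(formula.left, system) & semantics(formula.right, system)

def granule(formula: Formula, system: InfoSystem) -> Tuple[Formula, FrozenSet[int]]:
    """An information granule as a (syntax, semantics) pair."""
    return formula, semantics(formula, system)

# Age and Gender of the four customers in Example 1.
system: InfoSystem = {
    1: {"Age": 30, "Gender": "Male"},
    2: {"Age": 33, "Gender": "Female"},
    3: {"Age": 30, "Gender": "Female"},
    4: {"Age": 30, "Gender": "Female"},
}

syntax, objects = granule(And(Atom("Age", 30), Atom("Gender", "Female")), system)
print(sorted(objects))  # [3, 4]
```

Keeping the formula itself as the first component of the pair is what lets granules double as patterns: the syntax is the human-readable description, the semantics is its support in the data.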

The above approach can be upgraded to the relational case by expanding the description language with additional atomic formulas that identify pairs of joinable objects from different database tables [12], [13]. The data in such an approach is represented by a compound information system that is a combination of particular information systems (each corresponding to one database table). These systems are combined according to the connections that occur among database tables.
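A rough sketch of the idea: two information systems (one per table) are kept side by side, and the connection between them is stored as the set of joinable object pairs, which plays the role of the semantics of a join atom. The purchase table, its `customer_id` attribute, and the function name `joinable_pairs` are hypothetical; the customers are the first two rows of Example 1. This illustrates the general idea only and is not the exact definition used in [12], [13].

```python
from typing import Dict, Set, Tuple

InfoSystem = Dict[int, Dict[str, object]]

# Two customers from Example 1 and a hypothetical purchase table that
# refers to customers through a hypothetical "customer_id" attribute.
customer: InfoSystem = {
    1: {"Age": 30, "Gender": "Male"},
    2: {"Age": 33, "Gender": "Female"},
}
purchase: InfoSystem = {
    10: {"customer_id": 1, "product": "bread"},
    11: {"customer_id": 1, "product": "milk"},
    12: {"customer_id": 2, "product": "bread"},
}

def joinable_pairs(left: InfoSystem, right: InfoSystem, fk: str) -> Set[Tuple[int, int]]:
    """Semantics of a join atom: pairs (x, y) such that y refers to x via fk."""
    return {(x, y) for x in left for y, row in right.items() if row.get(fk) == x}

# Compound system: the component systems plus the connections between them.
compound = {
    "systems": {"customer": customer, "purchase": purchase},
    "connections": {
        ("customer", "purchase"): joinable_pairs(customer, purchase, "customer_id"),
    },
}

# A relational pattern spanning both tables: customers who bought bread.
pairs = compound["connections"][("customer", "purchase")]
bread_buyers = {x for x, y in pairs if purchase[y]["product"] == "bread"}
print(sorted(bread_buyers))  # [1, 2]
```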

The compound information system can be mined directly or first transformed into a granular form. The former facilitates the construction of patterns over many tables, since the connections among tables are included in the system; however, it does not contain elementary granules that show objects sharing the same features. The latter, in turn, includes elementary granules (each associated with one table, or with two tables to show the connection between them), but constructing relational patterns over the description language requires granules to be defined so that each of them is associated with all tables under consideration.

To construct a relational data representation that is more coherent and useful for pattern discovery, this study proposes relation-based granules. They are formed using relations that join relational information granules with their features, and they are used to represent both the data and patterns. Relation-based granules are more informative than the granules from which they are constructed: they include information about how a given granule can alternatively be joined with another granule from a different information system. The relations used to represent data are fundamental components of patterns. Since the relations express basic features of objects, the generation of patterns can be speeded up. Furthermore, the structure of relation-based granules facilitates the formation of more advanced conditions, corresponding to those that can be formed using aggregation functions in relational databases. Therefore, patterns constructed from such relations show richer knowledge than standard relational patterns.
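To illustrate what an aggregation-like condition looks like once the joins of each object are kept together, the sketch below counts, for each object of one table, how many objects of another table it joins with, and keeps those reaching a threshold. This is analogous to SQL `GROUP BY ... HAVING COUNT(*) >= k`. It is only an assumption-based illustration of the kind of condition the text refers to, not the paper's construction; the function name `count_condition` and the (customer, purchase) pairs are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

def count_condition(pairs: Iterable[Tuple[int, int]], min_count: int) -> Set[int]:
    """Left objects joined with at least min_count right objects,
    analogous to GROUP BY left HAVING COUNT(*) >= min_count."""
    counts = Counter(x for x, _ in pairs)
    return {x for x, c in counts.items() if c >= min_count}

# Hypothetical (customer, purchase) joinable pairs.
pairs = {(1, 10), (1, 11), (2, 12)}
print(count_condition(pairs, 2))  # {1}
```

A standard relational pattern can say "the customer made a purchase"; a condition of this kind can say "the customer made at least two purchases", which is the sort of extra expressiveness the paragraph attributes to relation-based granules.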

The organization of the remaining part of the paper is as follows. Section 2 introduces compound information systems and description languages defined for relational information granules. Sections 3 and 4 propose relation-based granules and show their application to the representation of relational data and patterns. In Section 5, the approach is evaluated by analyzing its time complexity. Section 6 compares the approach with other related approaches. Finally, Section 7 concludes the study.

Section snippets

Description languages for relational information granules

This section introduces the definitions of information systems and their description languages defined for relational data. The languages are expansions of the description language defined for data stored in the standard information system [11].

Throughout this paper the following running example will be used.

Example 1

Consider a database of the customers of a grocery store.

customer
Id	Name	Age	Gender	Income	Class
1	Adam Smith	30	Male	1500	Yes
2	Tina Jackson	33	Female	2500	Yes
3	Ann Thompson	30	Female	1800	No
4	Susan Clark	30	Female
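The customer table can be read as an information system whose objects are identified by Id. The sketch below encodes it and groups the objects that share a value of a given attribute, i.e. the elementary granules that serve as atoms for patterns. The Name column is omitted, and because the fourth row is cut off in this excerpt only its Age and Gender are encoded; the function name `elementary_granules` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

# Customer table from Example 1 (Name omitted); row 4 is truncated
# in this excerpt, so only its Age and Gender are included.
customer = {
    1: {"Age": 30, "Gender": "Male", "Income": 1500, "Class": "Yes"},
    2: {"Age": 33, "Gender": "Female", "Income": 2500, "Class": "Yes"},
    3: {"Age": 30, "Gender": "Female", "Income": 1800, "Class": "No"},
    4: {"Age": 30, "Gender": "Female"},
}

def elementary_granules(system: Dict[int, dict], attribute: str) -> Dict[object, Set[int]]:
    """Map each value of `attribute` to the set of objects having that value;
    objects with no value for the attribute are skipped."""
    granules = defaultdict(set)
    for obj, row in system.items():
        if attribute in row:
            granules[row[attribute]].add(obj)
    return dict(granules)

print(elementary_granules(customer, "Age"))     # {30: {1, 3, 4}, 33: {2}}
print(elementary_granules(customer, "Gender"))  # {'Male': {1}, 'Female': {2, 3, 4}}
```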

Relation-based granules

This section proposes an expansion of the description languages by defining relation-based granules. To distinguish them from those defined in the previous section, we will call the latter formula-based granules.

A relation is constructed based on a formula to show not only the objects that satisfy the formula but also the values of attributes that characterize the objects. More precisely, attribute values and an object are in the relation if and only if the object satisfies the formula
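One possible reading of this definition, sketched below under stated assumptions: given a formula and a list of attributes, the relation contains a tuple (v_1, ..., v_k, x) exactly when object x satisfies the formula and v_i is its value on the i-th attribute. Projecting the relation onto its last component recovers the semantics of the underlying formula-based granule, so the relation carries strictly more information. The formula is passed as a plain Python predicate for brevity; this encoding and the name `formula_relation` are hypothetical, and the data are taken from Example 1.

```python
from typing import Callable, Dict, Sequence, Set, Tuple

Row = Dict[str, object]

def formula_relation(system: Dict[int, Row], formula: Callable[[Row], bool],
                     attributes: Sequence[str]) -> Set[Tuple]:
    """Tuples (v_1, ..., v_k, obj) such that obj satisfies the formula and
    v_i is obj's value on attributes[i] (objects lacking a value are skipped)."""
    return {
        tuple(row[a] for a in attributes) + (obj,)
        for obj, row in system.items()
        if formula(row) and all(a in row for a in attributes)
    }

# Age and Gender of the customers in Example 1.
customer = {
    1: {"Age": 30, "Gender": "Male"},
    2: {"Age": 33, "Gender": "Female"},
    3: {"Age": 30, "Gender": "Female"},
    4: {"Age": 30, "Gender": "Female"},
}

# Relation for the formula Gender = Female, characterized by Age.
rel = formula_relation(customer, lambda r: r.get("Gender") == "Female", ["Age"])
print(sorted(rel))                       # [(30, 3), (30, 4), (33, 2)]
print(sorted(t[-1] for t in rel))        # [2, 3, 4], the formula's semantics
```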

Relational data and patterns represented by relation-based granules

This section shows how relation-based granules can be used for representing both the relational data and patterns.

The approach's complexity

This section evaluates the cost of constructing representations of relational data and patterns using the proposed relations. It also compares the approach proposed in this paper with a standard one in terms of complexity. The latter is understood as an approach where the database tables are mined directly, i.e. the data is not transformed into an alternative representation (except for typical data mining transformation techniques such as discretization).

Table 1 includes the cost

Related works

This section compares the proposed approach with others that use a granular computing environment to handle relational data.

The information system proposed in [15], called a sum of information systems, is the pair of the universe (the Cartesian product of the universes of the information systems, each corresponding to one table) and the attribute set (the collection of attributes from the attribute sets of the information systems). A constrained version of this system allows only tuples of
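A minimal sketch of the unconstrained sum of information systems as described above: the universe is the Cartesian product of the component universes, the attribute set collects the attributes of all components, and an attribute of a product tuple is evaluated on the component it comes from. Qualifying each attribute by its table name (to avoid name clashes) is an assumption, as are the function name `sum_of_systems` and the purchase data; only the unconstrained system is sketched.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

InfoSystem = Dict[int, Dict[str, object]]

def sum_of_systems(systems: Dict[str, InfoSystem]):
    """Universe = Cartesian product of the component universes;
    attributes = (table, attribute) pairs collected from all components."""
    names = list(systems)
    universe: List[Tuple[int, ...]] = list(product(*(systems[n].keys() for n in names)))
    attributes = [(n, a) for n in names
                  for a in sorted({a for row in systems[n].values() for a in row})]

    def value(obj: Tuple[int, ...], attr: Tuple[str, str]) -> object:
        # Read the attribute from the component object of the right table.
        table, a = attr
        return systems[table][obj[names.index(table)]].get(a)

    return universe, attributes, value

customer = {1: {"Age": 30}, 2: {"Age": 33}}
purchase = {10: {"product": "bread"}, 11: {"product": "milk"}, 12: {"product": "bread"}}
universe, attributes, value = sum_of_systems({"customer": customer, "purchase": purchase})
print(len(universe))                             # 6
print(attributes)                                # [('customer', 'Age'), ('purchase', 'product')]
print(value((1, 11), ("purchase", "product")))   # milk
```

The size of the product universe grows multiplicatively with the number of tables, which is why restricting it to meaningful tuples matters in practice.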

Conclusions

This paper has extended description languages defined for relational information granules. The extension includes formula-based relations designed for representing relational databases and patterns to be discovered. The main advantages of the approach can be summarized as follows.

1.
The cost of generating relational patterns can be lower than when the patterns are generated directly from the database. In fact, relations that represent the database consist of atomic formulas to

Acknowledgements

The author thanks anonymous reviewers for their valuable suggestions which have helped to improve the paper.

The project was funded by the National Science Center, awarded on the basis of decision number DEC-2011/01/D/ST6/07225.

References (17)

T.Y. Lin, Introduction to special issues on data mining and granular computing, Int. J. Approx. Reason. (2005).

P. Hońko, Association discovery from relational data via granular computing, Inform. Sci. (2013).

Y.Y. Yao, Granular computing: basic issues and possible solutions.

A. Bargiela et al., Toward a theory of granular computing for human-centered information processing, IEEE Trans. Fuzzy Syst. (2008).

A. Bargiela et al., Granular Computing: An Introduction (2003).

T.Y. Lin et al., Special issue on granular computing and data mining, Int. J. Intell. Syst. (2004).

W. Pedrycz et al., Handbook of Granular Computing (2008).

S.K. Pal, Granular mining and rough-fuzzy pattern recognition: a way to natural computation, IEEE Intell. Inform. Bull. (2012).


