Relation-based granules to represent relational data and patterns
Introduction
Granular computing provides a general framework for problem solving. It covers theories, methodologies, techniques, and tools that make use of granules [1]. They are, in general, understood as collections formed in the process of a semantically meaningful grouping of elements based on their indistinguishability, similarity, proximity or functionality [2].
The main idea of granular computing is to make it possible to view the same problem at many levels of granularity. Switching between levels makes it possible to choose the representation that best matches the problem. A more specific level of granularity may reveal more detailed information, whereas a more abstract level may improve a solution by omitting irrelevant details.
A granular computing approach has successfully found application in data mining (e.g. [3], [4], [5], [6], [7]). Some attempts have also been made to adapt the idea of granulation to mining data stored in a relational structure (e.g. [8], [9], [10]).
Granulation tools can provide an alternative representation of the data to be mined. The primitives in this case are defined not by attribute values, but by granules of entities. Using granules, one can form collections of objects that share the same features (e.g. attribute values). A granular representation of the data facilitates the generation of patterns. Since elementary granules reveal basic features hidden in data, they are used as atoms in the construction of patterns.
A granular-computing-based approach to data mining can be defined using a description language for information granules [11]. The data is primarily stored in an information system. However, it can also be represented in a granular form that is constructed using atomic formulas of the language. Each information granule is characterized by a pair consisting of its syntax and its semantics. The former is a formula constructed using attributes and values that describe objects, whereas the latter is the set of objects that satisfy the formula. Information granules can be used to express patterns hidden in the data.
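The pairing of syntax and semantics can be illustrated with a minimal Python sketch. The toy table, the attribute names, and the helpers `atom` and `semantics` are assumptions made for illustration only; they are not the notation of [11].

```python
# Toy information system: objects (by id) described by attribute values.
# The data and all names below are illustrative assumptions.
customers = {
    1: {"age": 30, "gender": "Male"},
    2: {"age": 33, "gender": "Female"},
    3: {"age": 30, "gender": "Female"},
}


def atom(attribute, value):
    """Syntax of an atomic formula, kept as a plain (attribute, value) pair."""
    return (attribute, value)


def semantics(formula, system):
    """Set of objects that satisfy a conjunction of atomic formulas."""
    return {
        obj
        for obj, row in system.items()
        if all(row[a] == v for a, v in formula)
    }


# Elementary granule: one atomic formula paired with the objects satisfying it.
age_30 = [atom("age", 30)]
granule_age_30 = (age_30, semantics(age_30, customers))

# A pattern built from elementary granules: a conjunction of two atoms.
age_30_female = [atom("age", 30), atom("gender", "Female")]
granule_age_30_female = (age_30_female, semantics(age_30_female, customers))
```

In this sketch a granule is simply a tuple `(formula, object_set)`, so a conjunctive pattern narrows the object set of the elementary granules it is built from.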
The above approach can be extended to the relational case by expanding the description language with additional atomic formulas that identify pairs of joinable objects from different database tables [12], [13]. The data in such an approach is represented by a compound information system that combines the particular information systems (each corresponding to one database table). These systems are combined according to the connections that occur among the database tables.
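One way to picture an atomic formula that identifies joinable objects is the following Python sketch. The `purchases` table, the `cust_id` key, and the function `joinable_pairs` are hypothetical and serve only as an illustration; the formal definitions are those of [12], [13].

```python
# Two toy information systems, one per database table (illustrative data).
customers = {1: {"age": 30}, 2: {"age": 33}}
purchases = {
    10: {"cust_id": 1, "amount": 20},
    11: {"cust_id": 1, "amount": 5},
    12: {"cust_id": 2, "amount": 40},
}


def joinable_pairs(left, right, foreign_key):
    """Semantics of a join atom: pairs (l, r) where r's foreign key refers to l."""
    return {
        (l, r)
        for l in left
        for r, row in right.items()
        if row[foreign_key] == l
    }


# Pairs of joinable objects connecting the two systems.
pairs = joinable_pairs(customers, purchases, "cust_id")

# A pattern over both tables: customers aged 30 joined with a purchase of amount 20.
matches = {
    (c, p)
    for c, p in pairs
    if customers[c]["age"] == 30 and purchases[p]["amount"] == 20
}
```

Here the semantics of the join atom is a set of object pairs rather than a set of single objects, which is what allows a pattern to range over more than one table.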
The compound information system can be mined directly or first transformed into a granular form. Direct mining facilitates the construction of patterns over many tables, since the connections among tables are included in the system; however, elementary granules that show objects sharing the same features are not available. The granular form, in turn, includes elementary granules (each associated with one table, or with two tables to show the connection between them), but constructing relational patterns over the description language requires granules to be defined so that each of them is associated with all tables under consideration.
To construct a relational data representation that is more coherent and useful for pattern discovery, relation-based granules are proposed in this study. They are formed using relations that join relational information granules with their features. They are used to represent both the data and patterns. Relation-based granules are more informative than the granules on which they are based. They include information about how a given granule can alternatively be joined with another from a different information system. The relations used to represent data are fundamental components of patterns. Since the relations express basic features of objects, the generation of patterns can be sped up. Furthermore, the structure of relation-based granules facilitates the formation of more advanced conditions. These correspond to conditions that can be formed by using aggregation functions in relational databases. Therefore, patterns constructed from such relations express richer knowledge than standard relational patterns.
The organization of the remaining part of the paper is as follows. Section 2 introduces compound information systems and description languages defined for relational information granules. Sections 3 and 4 propose relation-based granules and show their application to the representation of relational data and patterns. In Section 5, the approach is evaluated by analyzing its time complexity. Section 6 compares the approach with other related approaches. Finally, Section 7 concludes the study.
Description languages for relational information granules
This section introduces the definitions of information systems and their description languages defined for relational data. The languages are expansions of the description language defined for data stored in the standard information system [11].
Throughout this paper the following running example will be used.
Example 1. Given a database for the customers of a grocery store.

customer
Id | Name | Age | Gender | Income | Class
1 | Adam Smith | 30 | Male | 1500 | Yes
2 | Tina Jackson | 33 | Female | 2500 | Yes
3 | Ann Thompson | 30 | Female | 1800 | No
4 | Susan Clark | 30 | Female
Relation-based granules
This section proposes an expansion of the description languages by defining relation-based granules. To distinguish them from those defined in the previous section, we will call the latter formula-based granules.
A relation is constructed based on a formula to show not only the objects that satisfy the formula but also the values of attributes that characterize the objects. More precisely, attribute values and an object are in the relation if and only if the object satisfies the formula
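A minimal Python sketch of one possible reading of this construction follows. It uses the Age and Gender values of the four customers from the running example; the functions `relation` and `granules` and the tuple layout (attribute values followed by the object) are assumptions for illustration, not the paper's formal definition.

```python
# Age and Gender of the four customers from the running example.
customers = {
    1: {"age": 30, "gender": "Male"},
    2: {"age": 33, "gender": "Female"},
    3: {"age": 30, "gender": "Female"},
    4: {"age": 30, "gender": "Female"},
}


def relation(system, attributes):
    """Tuples (v1, ..., vk, obj) such that obj satisfies a1 = v1 and ... and ak = vk."""
    return {
        tuple(row[a] for a in attributes) + (obj,)
        for obj, row in system.items()
    }


def granules(rel):
    """Group a relation by its attribute values: values -> set of objects."""
    grouped = {}
    for *values, obj in rel:
        grouped.setdefault(tuple(values), set()).add(obj)
    return grouped


age_gender = relation(customers, ["age", "gender"])
by_values = granules(age_gender)

# An aggregate-style condition: value combinations shared by at least two objects.
frequent = {vals for vals, objs in by_values.items() if len(objs) >= 2}
```

Because the attribute values are kept alongside the objects, conditions that resemble aggregation (here, a count per value combination) can be evaluated on the relation itself without returning to the original table.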
Relational data and patterns represented by relation-based granules
This section shows how relation-based granules can be used for representing both the relational data and patterns.
The approach's complexity
This section evaluates the cost of constructing representations of relational data and patterns using the proposed relations. It also compares the approach proposed in this paper with a standard one in terms of complexity. The latter is understood as an approach where the database tables are mined directly, i.e. the data is not transformed into an alternative representation (except for typical data mining transformation techniques such as discretization).
Table 1 includes the cost
Related works
This section compares the proposed approach with others that use a granular computing environment to handle relational data.
The information system proposed in [15], called a sum of information systems, is the pair of the universe (the Cartesian product of the universes of the information systems, each corresponding to one table) and the attribute set (the collection of attributes from the attribute sets of the information systems). A constrained version of this system allows only tuples of
Conclusions
This paper has expanded description languages defined for relational information granules. The expansion includes formula-based relations designed for representing relational databases and patterns to be discovered. The main advantages of the approach can be summarized as follows.
1. The cost of generation of relational patterns can be decreased compared with that when the patterns are generated directly from the database. In fact, relations that represent the database consist of atomic formulas to
Acknowledgements
The author thanks anonymous reviewers for their valuable suggestions which have helped to improve the paper.
The project was funded by the National Science Center, awarded on the basis of decision number DEC-2011/01/D/ST6/07225.
References (17)
Introduction to special issues on data mining and granular computing. Int. J. Approx. Reason. (2005)
Association discovery from relational data via granular computing. Inform. Sci. (2013)
Granular computing: basic issues and possible solutions.
et al. Toward a theory of granular computing for human-centered information processing. IEEE Trans. Fuzzy Syst. (2008)
et al. Granular Computing: An Introduction. (2003)
et al. Special issue on granular computing and data mining. Int. J. Intell. Syst. (2004)
et al. Handbook of Granular Computing. (2008)
Granular mining and rough-fuzzy pattern recognition: a way to natural computation. IEEE Intell. Inform. Bull. (2012)