1 Introduction

The concept of testor (initially test) was created by S.V. Yablonskii as a tool for solving problems of control and diagnosis of faults in contact networks and combinatorial circuits. These early works [2, 22] gave rise to a research line that is primarily concerned with this kind of problem. In the mid-1960s, the methods of Testor Theory were extended to prediction problems in such domains as geology and medicine [9]. Since then, Testor Theory has developed in both directions: tools and applications.

The primary concept of testor (and typical testor) has had numerous generalizations and adaptations to different environments [17]. In 1980, R.S. Goldman published an article in Russian introducing a type of testor which he called fuzzy testor [14]. The main characteristic of this kind of testor is that comparisons between two values of the same attribute take real values in the interval [0,1], in such a way that 0 is interpreted as equality (or minimal difference) and 1 is interpreted as the maximum possible difference. An exhaustive review of the literature published since then allows us to assert that this fuzzy testor is not well known and that the study of its properties and possibilities has been very limited.

Undoubtedly, when one refers to fuzziness, there is unanimity in understanding that one refers to the fuzzy set theory and the derived logic introduced by Zadeh in 1965 [25]. However, in many problems, sources of fuzziness are varied and even more so their treatment. Classification problems are no exception.

In the rough set theory, the fuzzy approach has also been studied; so for example, to address the issues of feature selection and attribute reduction, there are some works on fuzzy reducts. Since the publication of the work of Dubois and Prade [10], there have been many studies regarding basic concepts of Rough Set Theory from the perspective of fuzziness. Fuzzy attributes [5, 27], fuzzy positive region [6, 7], fuzzy decision trees [11, 12], fuzzy rules [19, 20], fuzzy discernibility matrix [27], fuzzy decision reduct to certain degree [8] and fuzzy subset of attributes [26] are some examples that show the variety of studies that have been made related to fuzziness within the context of Rough Set Theory.

The notion of reduct plays an essential role in rough set analysis. Previously, the concepts of testor and reduct have been related [3, 15, 16]. In this paper, we revisit the old concept of Goldman’s fuzzy testor re-studying it from the Rough Set Theory point of view and particularly we analyze its relation with the concept of reduct from the conceptual approach developed by Y.Y. Yao in [23], as well as its application to supervised classification problems.

This document is organized as follows: In Sect. 2, we present some basic concepts regarding Goldman’s fuzzy testor and the conceptual definition of reduct. In Sect. 3, we show that a Goldman’s typical fuzzy testor can be defined as a reduct. In Sect. 4, we exemplify a practical use of Goldman’s typical fuzzy testors to build a rule based classifier and additionally, we illustrate their practical usefulness through a case study. Our concluding remarks are summarized in Sect. 5.

2 Basic Concepts

The basic representation of data in Rough Set Theory is an information system, which is a table with rows representing objects while columns specify attributes. Formally, an information system is defined as a 4-tuple \(IS = (U, A^*_{t}=A_{t}\cup \{d\},\) \(\{ V_{a}\;|\; a \in A^*_{t}\},\{ I_{a}\;|\; a \in A^*_{t}\})\), where U is a finite non-empty set of objects, \(A^*_t\) is a finite non-empty set of attributes, d denotes the decision attribute, \(V_a\) is a non-empty set of values of \(a\in A^*_t\), and \(I_a\) : \(U \rightarrow V_a\) is an information function that maps an object of U to exactly one value in \(V_a\).

Let us define for each attribute a in \(A_{t}\) a real valued dissimilarity function \(\varphi _a: V_ a\times V_a \rightarrow [0,1]\) in such a way that 0 is interpreted as equality (or minimal difference) and 1 is interpreted as the maximum possible difference.

Applying these comparison functions to all possible pairs of objects belonging to different classes in IS, a [0,1]-pairwise discernibility matrix can be built. We will denote this discernibility matrix by DM. We assume that IS is consistent, that is, there is no pair of indiscernible objects belonging to different classes; this means that DM contains no row consisting entirely of zeros.

Example 1 illustrates an information system and its corresponding [0,1]-pairwise discernibility matrix, applying the heterogeneous distance function used by Giraud-Carrier et al. [13] which is normally associated to the Heterogeneous Euclidean-Overlap Metric function HEOM [21]. This function defines the distance between two values x and y of a given attribute a as:

$$\begin{aligned} {\varphi }_{a}(x,y)=\left\{ \begin{array}{l@{\quad }l} 1 &{} if \; x \;or\; y\; is\; unknown, \; else \\ overlap_a(x,y) &{} if \; a\; is\; nominal, \; else \\ rn_{diff_a}(x,y) &{} otherwise \end{array} \right. \mathrm{\ \ } \end{aligned}$$
(1)

being

$$\begin{aligned} {overlap}_{a}(x,y)=\left\{ \begin{array}{l} {0 \; \; \; if \; \textit{x} = \textit{y}} \\ {1 \; \; \; otherwise} \end{array} \right. \mathrm{\ \ } \end{aligned}$$

and

$$\begin{aligned} {rn_{diff_a}(x,y)}=\frac{\mid x - y\mid }{max_a - min_a} \end{aligned}$$
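The comparison function in (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the use of `None` for an unknown value, the explicit `nominal` flag and the handling of a constant attribute (\(max_a = min_a\)) are assumptions made here, not part of the original formulation.

```python
def overlap(x, y) -> float:
    """Overlap comparison for nominal values: 0 if equal, 1 otherwise."""
    return 0.0 if x == y else 1.0


def rn_diff(x: float, y: float, min_a: float, max_a: float) -> float:
    """Range-normalized difference for numeric values."""
    if max_a == min_a:
        # Assumption: a constant attribute cannot discern any pair.
        return 0.0
    return abs(x - y) / (max_a - min_a)


def phi(x, y, nominal: bool, min_a: float = 0.0, max_a: float = 1.0) -> float:
    """Dissimilarity in [0, 1] between two values of one attribute.

    None stands for an unknown value (an assumption of this sketch).
    min_a and max_a are the observed range of a numeric attribute.
    """
    if x is None or y is None:
        return 1.0
    if nominal:
        return overlap(x, y)
    return rn_diff(x, y, min_a, max_a)
```

For instance, `phi(2.0, 5.0, nominal=False, min_a=1.0, max_a=7.0)` evaluates \(|2-5|/(7-1)\).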

Example 1

M shows an example of an information table, d represents the decision attribute. DM is the corresponding [0,1]-pairwise discernibility matrix. DM is obtained applying \({\varphi }_{a}(x,y)\) in (1), attribute by attribute, to each pair of objects in M belonging to different classes.

[Figure a: information table M and its [0,1]-pairwise discernibility matrix DM]

From the second row of DM, for example, we can state that there exists a pair of objects (belonging to different classes in M) which are discernible in grade 0.81 regarding \(a_2\). The same pair of objects is indiscernible regarding \(a_3\). These objects correspond to the first and third rows of M.

From the example, we can also affirm that all objects belonging to different classes in M are discernible in a certain grade regarding attributes \(a_2\) (at least 0.26) and \(a_4\) (at least 0.55). Attributes \(a_1\) and \(a_3\), however, do not fulfill this property.

We will introduce the notation \(\mu _{f_i}(a_{j})\) to refer to the value corresponding to row \(f_i\) and to the column associated to attribute \(a_{j}\) in DM. So, for example \(\mu _{f_4}(a_{2})=0.39\), \(\mu _{f_6}(a_{1})=0\), \(\mu _{f_1}(a_{3})=1\).
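The construction of DM described above can be sketched as follows. The function names, the representation of objects as lists of attribute values and the toy data are hypothetical and serve only as illustration; they are not the table M of Example 1.

```python
from itertools import combinations


def build_dm(rows, labels, phis):
    """Build the [0,1]-pairwise discernibility matrix.

    rows   : list of objects, each a list of attribute values
    labels : class label of each object (the decision attribute d)
    phis   : one dissimilarity function per attribute, mapping (x, y) to [0, 1]
    Only pairs of objects from different classes produce a row.
    """
    dm = []
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j]:
            dm.append([phi(rows[i][k], rows[j][k]) for k, phi in enumerate(phis)])
    return dm


def is_consistent(dm):
    """IS is consistent when no row of DM consists entirely of zeros."""
    return all(any(value > 0.0 for value in row) for row in dm)


# Hypothetical toy data: one numeric attribute already scaled to [0, 1]
# and one nominal attribute.
toy_rows = [[0.0, "a"], [1.0, "a"], [0.5, "b"]]
toy_labels = [0, 1, 1]
toy_phis = [lambda x, y: abs(x - y), lambda x, y: 0.0 if x == y else 1.0]
toy_dm = build_dm(toy_rows, toy_labels, toy_phis)
```

With this representation, \(\mu _{f_i}(a_{j})\) corresponds to `dm[i][j]` (with zero-based indices).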

Definition 1

(Goldman’s fuzzy testor). Let \(A_t= \{a_1, a_2, ..., a_n\}\) and let \( T=\{a_{r_1}|\mu _{r_1}, a_{r_2}|\mu _{r_2}, ..., a_{r_s}|\mu _{r_s}\}\) be a fuzzy subset of \(A_t\) such that \(\forall p \in \{1,2,...,s\}\) \(\quad \mu _{r_p} \ne 0\). T is a Goldman’s fuzzy testor with respect to IS if

\(\forall f_i \in DM\) (being \(f_i\) the i-th row in DM) \(\exists a_{r_p}|\mu _{r_p} \in T\) such that \(\mu _{r_p} \le \mu _{f_i}(a_{r_p})\).

We will denote the set of all Goldman’s fuzzy testors of an information system by \(\varPsi \).

According to the above definition, a Goldman’s fuzzy testor is a fuzzy subset of attributes such that the set of attributes belonging to it, with the corresponding membership degrees, are able to discern all pairs of objects belonging to different classes.
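As an illustration, Definition 1 can be checked directly against DM. In this sketch a fuzzy subset of attributes is represented as a dictionary mapping a column index of DM to a membership degree; this representation, the function name and the toy matrix are assumptions made for the example.

```python
def is_goldman_fuzzy_testor(t, dm):
    """Check Definition 1.

    t  : dict {column index: membership degree}, degrees in (0, 1]
    dm : [0,1]-pairwise discernibility matrix as a list of rows
    Every row must contain some attribute of t whose membership degree
    does not exceed the row's value for that attribute.
    """
    if any(not 0.0 < mu <= 1.0 for mu in t.values()):
        return False
    return all(any(mu <= row[a] for a, mu in t.items()) for row in dm)


# Hypothetical discernibility matrix with three attributes.
toy_dm = [
    [1.0, 0.3, 0.0],
    [0.0, 0.8, 0.6],
    [0.0, 0.3, 0.9],
]
```

Here `{1: 0.3}` passes the check, whereas `{1: 0.5}` fails on the first row because \(0.5 > 0.3\).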

Definition 2

(Goldman’s typical fuzzy testor). \( T=\{a_{r_1}|\mu _{r_1}, a_{r_2}|\mu _{r_2},..., a_{r_s}|\mu _{r_s}\}\), (\(\mu _{r_p} \ne 0; \; p \in \{1,2,...,s\}\)) is a Goldman’s typical fuzzy testor with respect to IS if:

  (i) T is a Goldman’s fuzzy testor with respect to IS.

  (ii) \(\forall p \in \{1,2,...,s\}, T \setminus \{a_{r_p}|\mu _{r_p}\}\) is not a Goldman’s fuzzy testor with respect to IS.

  (iii) \(\forall T'\) such that \(T \subset T'\) and \(supp(T) = supp(T')\) (it means that \(\forall p \in \{1,2,...,s\}\) \(\quad \mu _{r_p} \le \mu '_{r_p}\) and for at least one index the inequality is strict) \(T'\) is not a Goldman’s fuzzy testor with respect to IS.

We will denote the set of all Goldman’s typical fuzzy testors of an information system by \(\varPsi ^*\).

Condition (ii) means that if a fuzzy singleton \(\{a_{r_p}|\mu _{r_p}\}\) is eliminated from a Goldman’s typical fuzzy testor, the resulting subset is no longer a Goldman’s fuzzy testor. Condition (iii) means that if the membership degree of some attribute in T is increased, the resulting fuzzy subset is no longer a Goldman’s fuzzy testor.
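A possible way to test conditions (i) to (iii) on a given DM is sketched below. Condition (iii) quantifies over infinitely many fuzzy subsets; the sketch relies on the observation (an assumption of this example, not a statement from the original text) that raising a membership degree can only shrink the set of rows an attribute discerns, so it suffices to check the smallest possible raise of each single attribute. The dictionary representation and function names are likewise hypothetical.

```python
def is_testor(t, dm):
    """Definition 1: every row is covered by some attribute of t."""
    return all(any(mu <= row[a] for a, mu in t.items()) for row in dm)


def is_typical(t, dm):
    """Check conditions (i)-(iii) of Definition 2 for t = {column: degree}."""
    if not t or not is_testor(t, dm):          # condition (i)
        return False
    for p, mu_p in t.items():
        rest = {a: mu for a, mu in t.items() if a != p}
        if is_testor(rest, dm):                # condition (ii) violated
            return False
        if mu_p < 1.0:
            # Smallest raise of mu_p: attribute p then covers only the rows
            # whose value is strictly greater than mu_p.
            still_testor = all(
                row[p] > mu_p or any(mu <= row[a] for a, mu in rest.items())
                for row in dm
            )
            if still_testor:                   # condition (iii) violated
                return False
    return True


# Hypothetical discernibility matrix with three attributes.
toy_dm = [
    [1.0, 0.3, 0.0],
    [0.0, 0.8, 0.6],
    [0.0, 0.3, 0.9],
]
```

On this toy matrix, `{1: 0.3}` is typical, while `{1: 0.2}` is a testor whose membership degree can still be raised, and `{0: 1.0, 1: 0.3}` contains a removable singleton.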

Example 2

Regarding M in Example 1, \(\{a_4|0.55\}\), \(\{a_2|0.81, a_4|0.58\}\) and \(\{a_1|1.0\), \(a_2|0.81, a_4|0.94\}\) are examples of Goldman’s typical fuzzy testors.

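\(\{a_1|1.0, a_2|0.48, a_3|1.0, a_4|0.96\}\) and \(\{a_1|1.0, a_2|0.39, a_4|0.96\}\) are both Goldman’s fuzzy testors but not typical, because \(\{a_1|1.0, a_2|0.48, a_4|0.96\}\) is a Goldman’s typical fuzzy testor: it is obtained from the first by eliminating \(a_3|1.0\) and from the second by increasing the membership degree of \(a_2\).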

In the classic formulation, typical testors are minimal by inclusion; this is also quite common for different extensions of the primary concept of testor, see for example [17]. However, Goldman’s fuzzy testors do not keep the property of minimality by inclusion. Therefore, in order to define Goldman’s typical fuzzy testors in a similar way, we introduce the following partial order.

Definition 3

(partial order \(\preceq \)). Let us consider the following binary relation over the set \(\mathfrak {P}(A)\) of all fuzzy subsets of A. Let \(t_1\), \(t_2 \in \mathfrak {P}(A)\), then

\(t_1 \preceq t_2 \Leftrightarrow (t_1 \cap t_2) \cup ((supp(t_1) \setminus supp(t_2))\cap t_1) \cup ((supp(t_2) \setminus supp(t_1))\cap t_2) = t_2\)

It is not difficult to prove that \(\preceq \) is a partial order. To save space, we omit the proof.
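The relation of Definition 3 can be evaluated mechanically. The sketch below assumes the usual minimum-based intersection of fuzzy sets and represents a fuzzy subset as a dictionary from attribute names to nonzero membership degrees, so that the dictionary keys play the role of the support; these choices and the function names are assumptions of the example.

```python
def fuzzy_intersection(t1, t2):
    """Minimum-based intersection; attributes outside both supports drop out."""
    return {a: min(t1[a], t2[a]) for a in t1.keys() & t2.keys()}


def restrict(t, attrs):
    """Part of the fuzzy subset t lying on the crisp set of attributes attrs."""
    return {a: t[a] for a in attrs}


def precedes(t1, t2):
    """Return True when t1 precedes t2 in the sense of Definition 3."""
    left = fuzzy_intersection(t1, t2)
    # The three parts have pairwise disjoint supports, so their fuzzy union
    # is simply the merge of the three dictionaries.
    left.update(restrict(t1, t1.keys() - t2.keys()))
    left.update(restrict(t2, t2.keys() - t1.keys()))
    return left == t2


t = {"a1": 0.5, "a2": 0.4}
t_removed = {"a2": 0.4}             # one fuzzy singleton eliminated
t_raised = {"a1": 0.7, "a2": 0.4}   # one membership degree increased
```

Under these assumptions both `t_removed` and `t_raised` precede `t`, which matches the two ways in which a fuzzy testor can fail to be typical.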

Proposition 1

\(T \in \varPsi \) is a Goldman’s typical fuzzy testor with respect to IS if T is a minimal element for the relation \(\preceq \) defined over \(\varPsi \).

Proof

Let T be a minimal element in \(\varPsi \) for the relation \(\preceq \) and let us suppose that T is not a Goldman’s typical fuzzy testor. Then either (a) we can eliminate some fuzzy singleton from T or (b) we can increase the membership degree of some attribute belonging to supp(T) (or both) and the result is still a Goldman’s fuzzy testor. Let us suppose (a) is fulfilled. Let \(T=\{a_{r_1}|\mu _{r_1}, a_{r_2}|\mu _{r_2}, ..., a_{r_s}|\mu _{r_s}\} \in \varPsi \) and for simplicity suppose that \(T'=\{a_{r_2}|\mu _{r_2}, ..., a_{r_s}|\mu _{r_s}\}\) is also a Goldman’s fuzzy testor, \(T' \in \varPsi \). Then \((T' \cap T)=T'\), \((supp(T') \setminus supp(T))\cap T'=\emptyset \) and \(supp(T) \setminus supp(T')= \{a_{r_1}\}\), so \((supp(T) \setminus supp(T'))\cap T=\{a_{r_1}|\mu _{r_1}\}\). Hence \((T' \cap T) \cup ((supp(T') \setminus supp(T))\cap T') \cup ((supp(T) \setminus supp(T'))\cap T) = T\), which means that \(T' \preceq T\) with \(T' \ne T\), which contradicts that T is a minimal element in \(\varPsi \).

Now, let us suppose (b) is fulfilled. Let \(T=\{a_{r_1}|\mu _{r_1}, a_{r_2}|\mu _{r_2}, ..., a_{r_s}|\mu _{r_s}\} \in \varPsi \) and for simplicity suppose that \(T'=\{a_{r_1}|\nu _{r_1},a_{r_2}|\mu _{r_2}, ..., a_{r_s}|\mu _{r_s}\}\) with \(\nu _{r_1} > \mu _{r_1}\), is also a Goldman’s fuzzy testor, \(T' \in \varPsi \). In this case \((T' \cap T)=T\) and \(supp(T') = supp(T)\), so both support differences are empty; hence \((T' \cap T) \cup ((supp(T') \setminus supp(T))\cap T') \cup ((supp(T) \setminus supp(T'))\cap T) = T\), that is, \(T' \preceq T\) with \(T' \ne T\), which once again contradicts that T is a minimal element in \(\varPsi \).

3 Goldman’s Fuzzy Reducts

In this section, we follow Y.Y. Yao’s approach to the conceptual formulation of Rough Set Theory, namely the conceptual formulation of the concept of reduct [23]. The notion of reduct plays an essential role in rough set analysis. In order to formulate an in-depth conceptual understanding of reducts, Y.Y. Yao searched for an explanation and interpretation of the reduct concept in a wider context. Given a set of attributes, the question is whether there exists a subset that serves the same purpose as the entire set. Such a subset may be considered as a reduct of the original set of attributes. In [24], a conceptual definition of a reduct of a set of attributes is developed based on this intuitive understanding.

Suppose A is a finite set and \(\mathfrak {p}(A)\) is the power set of A. Let \(\mathbb {P}\) denote a unary predicate on subsets of A. For \(S \in \mathfrak {p}(A)\), \(\mathbb {P}(S)\) stands for the statement that subset S has property \(\mathbb {P}.\) The values of \(\mathbb {P}\) are computed by an evaluation \(\mathfrak {e}\) with reference to certain available data, for example, an information system. For a subset \(S \in \mathfrak {p}(A)\), \(\mathbb {P}(S)\) is true if S has property \(\mathbb {P}\), otherwise, it is false. In this way, a conceptual definition of reduct is given based on an evaluation \(\mathfrak {e}\) as follows.

Definition 4

(Subset-based definition). Given an evaluation \(\mathfrak {e}\) of \(\mathbb {P}\), if a subset \(R \subseteq A\) fulfills the following conditions:

  (a) existence: \(\mathbb {P}_\mathfrak {e}(A)\);

  (b) sufficiency: \(\mathbb {P}_\mathfrak {e}(R)\);

  (c) minimization: \(\forall B\subset R (\lnot {\mathbb {P}_\mathfrak {e}(B))}\);

we call R a reduct of A.

These three conditions reflect the fundamental characteristics of a reduct. The existence condition (a) ensures that a reduct of A exists; in the great majority of studies it is explicitly assumed that the whole set A has the property \(\mathbb {P}\), so that A itself is a candidate to be a reduct. The sufficiency condition (b) expresses that a reduct R of A is sufficient for preserving the property \(\mathbb {P}\) of A. The minimization condition (c) expresses that a reduct is a minimal subset of A having property \(\mathbb {P}\), in the sense that none of the proper subsets of R has the property. Since all the subsets of R must be checked to verify Definition 4, Y.Y. Yao called it a subset-based definition [23].

For our convenience, we consider \(\mathfrak {P}(A)\) as the set of all subsets of A, including fuzzy subsets, instead of the classical power set \(\mathfrak {p}(A)\). Besides, we consider the partial order \(\preceq \) previously defined instead of the classic inclusion.

Let \(A= \{a_1, a_2, ..., a_n\}\) be as before the set of condition attributes in IS, and \(S=\{a_{r_1}|\mu _{r_1}\), \(a_{r_2}|\mu _{r_2}\), ..., \(a_{r_s}|\mu _{r_s}\}\) a fuzzy subset of A, i.e. \(\{a_{r_1}, a_{r_2}, ..., a_{r_s}\} \subseteq A\) and \(0< \mu _{r_p} \le 1\), \(p \in \{1,2,...,s\}\). Let \(\mu ^o_i = min \{\varphi _i(u,v) \;|\; \varphi _i(u,v) \ne 0\}\), where the minimum is taken over all pairs of objects u and v belonging to different classes in IS, \(1 \le i \le n\), and let \(A^o=\{a_1|\mu ^o_1\), \(a_2|\mu ^o_2\), ..., \(a_n|\mu ^o_n\}\). \(A^o\) is a fuzzy subset of A built by taking the minimum among the nonzero values of each column of the [0,1]-pairwise discernibility matrix DM associated to IS.

Example 3

For the information table in Example 1, \(A^o=\{a_1|1.0\), \(a_2|0.26\), \(a_3|1.0\), \(a_4|0.55\}\).
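Computing \(A^o\) amounts to taking a column-wise minimum over the nonzero entries of DM, which can be sketched as follows. The function name, the dictionary output and the toy matrix are hypothetical; omitting a column that contains only zeros is also an assumption of this sketch, since the text does not discuss that case.

```python
def minimal_nonzero_memberships(dm):
    """Return A^o as {column index: smallest nonzero value in that column}."""
    a_o = {}
    for j in range(len(dm[0])):
        nonzero = [row[j] for row in dm if row[j] > 0.0]
        if nonzero:
            # Assumption: a column made only of zeros is left out of A^o.
            a_o[j] = min(nonzero)
    return a_o


# Hypothetical discernibility matrix with three attributes.
toy_dm = [
    [1.0, 0.3, 0.0],
    [0.0, 0.8, 0.6],
    [0.0, 0.3, 0.9],
]
toy_a_o = minimal_nonzero_memberships(toy_dm)
```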

Let us consider the following predicate \(\mathbb {P}\):

\(\mathbb {P}(S) \equiv \forall u, v \in IS \; [I_d(u) \ne I_d(v) \rightarrow \exists a_{r_p}|\mu _{r_p} \in S\) such that \(\mu _{r_p} \le \varphi _{r_p}(u,v)]\).

Notice that \(A^o\) has the property \(\mathbb {P}\). Indeed, since we have assumed that IS is consistent, for every pair \(u, v \in IS\) with \(I_d(u) \ne I_d(v)\) there is at least one attribute \(a_p\) with \(\varphi _{p}(u,v) \ne 0\), and by construction \(\mu ^o_{p} \le \varphi _{p}(u,v)\) for that attribute. Then, we have \(\mathbb {P}_\mathfrak {e}(A^o)\).

On the other hand, let \( T=\{a_{r_1}|\mu _{r_1}, a_{r_2}|\mu _{r_2}, ..., a_{r_s}|\mu _{r_s}\}\) be a Goldman’s fuzzy testor, then from Definition 1 it follows that T also has the property \(\mathbb {P}\), i.e. \(\mathbb {P}_\mathfrak {e}(T)\). In fact, T has the property \(\mathbb {P}\) iff T is a Goldman’s fuzzy testor. Finally, taking into account that minimal elements by \(\preceq \) in \(\varPsi \) are Goldman’s typical fuzzy testors, it follows that if T is such a minimal element, then \(\forall B \prec T \; [\lnot {\mathbb {P}_\mathfrak {e}(B)]}\), where \(B \prec T\) means \(B \preceq T\) and \(B \ne T\).

Now, we can introduce the following definition.

Definition 5

Let \(A^o\), \(\mathbb {P}\) and \(\preceq \) be as above. Let T be a Goldman’s typical fuzzy testor. Considering that:

  (a) existence: \(\mathbb {P}_\mathfrak {e}(A^o)\);

  (b) sufficiency: \(\mathbb {P}_\mathfrak {e}(T)\);

  (c) minimization: \(\forall B \prec T [\lnot {\mathbb {P}_\mathfrak {e}(B)]}\), where \(B \prec T\) means \(B \preceq T\) and \(B \ne T\);

T satisfies Definition 4, so we can say that T is a Goldman’s fuzzy reduct.

4 Goldman’s Fuzzy Reducts for Supervised Classification

Goldman’s fuzzy reducts would be just a theoretical curiosity if they lacked practical use. In this section, we present an example of their application to a problem of supervised classification. Let IS be an information system, as before. Let \({u_1, u_2, ..., u_m}\) be the objects in U and \({a_1, a_2, ..., a_n}\) the attributes in \(A_t\). In a supervised classification problem, U represents the training sample. Let d be the decision attribute, and \({c_1, c_2, ..., c_s}\) be the values that d takes, i.e., \({c_1, c_2, ..., c_s}\) are the class labels, and \(I_d(u)\) denotes the class u belongs to. Let v be an object to be classified and \(\varPsi ^*\) be the set of all Goldman’s fuzzy reducts of IS. We define a rule based classifier based on the set of Goldman’s fuzzy reducts as follows:

Proposed classifier

v: object to be classified

\(supp_1 = supp_2 = ... = supp_s =0\) (initializing the support counter for each class)

   for each \(T=\{a_{r_1}|\mu _{r_1}, a_{r_2}|\mu _{r_2}, ..., a_{r_s}|\mu _{r_s}\} \in \varPsi ^*\)

      for each \(u_i \in U\)

\(\qquad \quad \) if \([{\varphi }_{a_{r_1}}(v,u_i) \le \mu _{r_1}] \vee [{\varphi }_{a_{r_2}}(v,u_i) \le \mu _{r_2}] \vee ... \vee [{\varphi }_{a_{r_s}}(v,u_i) \le \mu _{r_s}]\)

\(\qquad \quad \) and \(I_d(u_i) = c_k\) then \(supp_k = supp_k +1\)

if \(supp_k > supp_w\) for all \(w \ne k\) then v is assigned to class \(c_k\).

This algorithm behaves like a rule-based classifier, where for each Goldman’s fuzzy reduct and each object in the training sample a rule is built and evaluated. If the rule built for an object of class \(c_k\) is fulfilled, the support for this class is increased. In the general case, one should take into account issues such as an imbalanced training sample. To show the performance of the algorithm, we avoid this issue by considering a balanced training sample.
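The voting scheme of the proposed classifier can be sketched in Python as follows. The function signature, the dictionary representation of a fuzzy reduct and the toy data are assumptions of this example; in particular, the set of Goldman’s fuzzy reducts is taken as given, since no algorithm for computing it is described here, and returning `None` when no class has strictly greater support is one possible reading of the last step of the pseudocode.

```python
from collections import Counter


def classify(v, training, labels, reducts, phis):
    """Assign v to the class with strictly greatest support, else None.

    v        : object to classify, a list of attribute values
    training : list of training objects (the set U)
    labels   : class label of each training object
    reducts  : iterable of fuzzy reducts, each {attribute index: degree}
    phis     : one dissimilarity function per attribute
    """
    support = Counter()
    for t in reducts:
        for u, label in zip(training, labels):
            # The rule fires if v is close enough to u on at least one
            # attribute of the reduct.
            if any(phis[a](v[a], u[a]) <= mu for a, mu in t.items()):
                support[label] += 1
    ranked = support.most_common(2)
    if not ranked:
        return None  # no rule fired
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no class has strictly greater support
    return ranked[0][0]


# Hypothetical toy problem with a single numeric attribute in [0, 1].
toy_training = [[0.0], [0.1], [1.0]]
toy_labels = ["A", "A", "B"]
toy_reducts = [{0: 0.2}]  # assumed given, not computed here
toy_phis = [lambda x, y: abs(x - y)]
```

With these toy values, an object at 0.05 collects two votes for class A, an object at 0.9 collects one vote for class B, and an object at 0.5 fires no rule.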

To evaluate the practical usefulness of this classification algorithm based on Goldman’s fuzzy reducts, we study its behavior over the Iris database [18]. As is widely known, the Iris database consists of 3 classes with 50 objects each, described in terms of 4 real-valued attributes. We built five partitions by randomly splitting the data, taking 70 % of the objects for training and the remaining 30 % for testing. Table 1(a) contains the average confusion matrix; for our proposed classifier, the mean percentage of correct classification was 94.67 %. It is important to note that, unlike other rule-based classifiers, our classifier can work directly with the original data.

For comparing our results, in Table 1(b), we present the average classification results over the same five partitions using the RSES software [1], which is a well known system within the community of Rough Set Theory. For each case, classic reducts and rules based on them were calculated and evaluated.

Another experiment was done discretizing the continuous data to crisp data. For this, we used the discretization proposed by F. Coenen [4]. Then we computed the new set of reducts and rules using RSES. Results of this third experiment are shown in Table 1(c).

As can be seen, both classification results based on classic reducts, with and without discretization, were worse than the classification results obtained by the proposed algorithm based on Goldman’s fuzzy reducts.

Table 1. Confusion matrices for Iris data applying three methods

5 Conclusions

In this paper, we introduced a new type of fuzzy reduct inspired by Goldman’s typical fuzzy testors. These reducts are fuzzy in the sense that they are fuzzy subsets of the set of attributes in an information system. On the theoretical side, we showed that this new type of reduct can be interpreted as a reduct in the sense of the subset-based conceptual definition given by Y.Y. Yao [23].

Based on our proposed conceptualization of Goldman’s fuzzy reducts, we introduce a new classifier based on rules. An advantage of the proposed method is that we can directly work with the original data without preprocessing continuous data by fuzzification or discretization.

The results achieved in our case study suggest that, as future work, it makes sense to deepen the study of these new reducts both theoretically and experimentally. For example, it would be interesting to study the case when data are described in terms of different types of attributes (nominal, Boolean, real, etc.). Likewise, it would be interesting to study the behavior of the classifier if only a proper subset of the Goldman’s fuzzy reducts is used instead of all of them, and how to select this subset. Finding efficient algorithms for calculating Goldman’s fuzzy reducts is also an open research line.

We conjecture that, using Goldman’s fuzzy reducts, we could obtain robust classifiers for datasets with real-valued attributes, but this point remains to be confirmed by further research.