1 Introduction

A great deal of effort has been spent on representing the uncertainty in type-1 fuzzy sets [1,2,3]. This uncertainty, called the fuzziness of a type-1 fuzzy set, is essentially an attempt at measuring the lack of distinction between a set and its negation, as suggested by Yager [2, 3]. Since any crisp set is deemed to have zero fuzziness, this suggestion amounts to finding the difference between the uncertainty measure and the measure of specificity [4] of a fuzzy subset, which is related to the degree to which the set contains one and only one element.

In this work, we make an attempt to represent the uncertainty in a type-1 fuzzy set using Hanman–Anirban entropy function [5]. Before embarking on this let us see the need for expanding the scope of fuzzy sets in the realm of uncertainty representation.

1.1 The motivation for the uncertainty representation

The primary objective of this paper is to represent the uncertainty associated with the information source values (attribute values) in a fuzzy set, called the possibilistic uncertainty. In the literature, only the probabilistic uncertainty in the probabilities of the information source values is addressed. Fuzzy set theory has no provision to represent this uncertainty as it treats each information source value and its membership function value separately; this pair is an element of a fuzzy set. What we need is a connecting link between the two to pave the way for uncertainty representation. The representation of uncertainty in spatially varying and time-varying information source values is another problem. To solve these problems, the uncertainty in the information source values forming a type-1 fuzzy set is sought to be quantified by the entropy function, leading to the information set theory that expands the scope of a fuzzy set by assigning the role of an agent to the membership function.

Another motivation stems from the desire to analyze, modify, and evaluate the information set based features formulated in this paper on a real-life application like face recognition by developing classifiers based on information processing.

1.2 A brief literature on face recognition

We have chosen face recognition as an important application of the proposed information set theory. An intuitive way to recognize a face is to extract its important features and compare them with similar ones on other faces. Thus, a majority of the contributions to the biometric-based recognition of the human face have focused on detecting prominent parts such as the eyes, nose, mouth, and head contour.

The methods in vogue in face recognition are broadly classified into: (i) holistic matching methods [6,7,8], in which the whole face acts as an input to the recognition system, (ii) feature-based (local) matching methods [9,10,11,12,13,14], which deal with local features such as the eyes, mouth, and nose and feed their statistics into the recognition system, and (iii) hybrid methods [15,16,17,18,19,20], where the recognition system makes use of both the local features and the whole facial region.

During the past two decades, appearance-based face recognition techniques such as principal component analysis (PCA) [8] and linear discriminant analysis (LDA) [21] have dominated the scene. These two algorithms seek a compact Euclidean space for efficient face recognition. A number of manifold learning algorithms attempt to unearth the geometric properties of high-dimensional feature spaces, including locality-preserving projections (LPP) [22], discriminant LPP (DLPP) [23], orthogonal DLPP (ODLPP) [24] and uncorrelated DLPP (UDLPP) [25]. Dai and Yuen [26] have introduced a regularized discriminant analysis (RDA) to address the problem of small sample size (sss) and to enhance the recognition performance of LDA. A parametric regularized locality-preserving projections (PRLPP) method is presented in [27] for face recognition. In this case, the LPP space is regulated parametrically to tap the discriminant information from the whole feature space instead of the projection subspace of PCA as in [26]. To address the problem of small sample size (sss), direct LDA [28] and maximal margin criteria (MMC) [29] are advocated.

2D discriminant analysis has been increasingly used in PCA and LDA for face recognition, giving rise to 2-D PCA [30, 31] and 2-D LDA [32, 33], with (2-D)\(^{2}\) PCA [34] and (2-D)\(^{2}\) LDA [35,36,37] as their offshoots. Wang et al. [38] put forward a bidirectional PCA plus LDA method \((\hbox {BDPCA}+\hbox {LDA})\) in which LDA is performed in the BDPCA space. A non-uniform selection of Gabor features from faces with variations in pose and illumination is made in [39] to capture their local statistics and to classify faces using PCA and LDA.

While PCA, LDA and their variants, neural network-based approaches and their variants, etc., are widely employed for face recognition, these are all holistic approaches and we will now survey some patch-based approaches.

The patch-based methods, which retain important local information, are more suitable for the recognition and analysis of facial expressions. A few such methods are discussed here.

Hu et al. [40] address the problem of face recognition with a small sample size (SSS). They have also implemented a patch-based CRC (collaborative representation-based classification), known as the PCRC method, that classifies the query sample by combining the recognition outputs of all the overlapped patches, each of which is collaboratively represented by the corresponding patches of the training samples.

The organization of the paper is as follows: Sect. 2 presents the concept of information set and its properties. The formulation of the Hanman filter (HF) and the Hanman transform (HT) is given under the higher form of information sets in Sect. 3. The three new classifiers named inner product classifier (IPC), normed error classifier (NEC), and Hanman classifier (HC) are described as part of information processing in Sect. 4. The face databases and the experimental results are detailed in Sect. 5. The conclusions are given in Sect. 6.

2 The concept of information set and its properties

The concept of an information set was introduced by Hanmandlu in a guest editorial [5] to enlarge the scope of a fuzzy set using the Hanman–Anirban entropy function [41]. The information set arises from representing the uncertainty in a fuzzy set. Features based on information sets have been used for ear-based authentication in [42] and for infrared face recognition in [43]. The concept is elaborated here along with the properties of information sets for wider foreseeable applications.

Consider a fuzzy set whose elements are pairs of gray levels \(I=\left\{ {I_{{ ij}} } \right\} \) in a window and the corresponding membership function values \(\left\{ {\mu _{{ ij}} } \right\} \) that represent the degree of association of the gray levels with the set. Let g versus h(g) be the histogram plot, where g stands for the distinct gray levels and h(g) is the probability of occurrence of the gray levels in the same window. The probabilistic uncertainty represented by the well-known entropy functions such as the Shannon entropy function uses only the probabilities. The possibilistic uncertainty as represented by fuzzy entropy functions gives only the uncertainty in the membership function. In both these uncertainty representations, the gray levels are disregarded. Our concern here is to represent both probabilistic and possibilistic uncertainties by the same entropy function.

2.1 Derivation of information set

The multimedia components (i.e., an image, speech, text, or video) are the information sources. After granulation, the granules are considered as the information sources. Granulation amounts to partitioning into windows in the case of an image or text, whereas it amounts to sampling into frames in the case of speech and video. In this paper, we have employed granulation as a means of partitioning an image into windows of different sizes. In this context, the work on information granules by Pedrycz and his co-researchers [44,45,46,47,48] merits a mention. In a broader sense, information granules refer to the information sets that arise out of partitioning the information source values by a fuzzy equivalence relation. The partitioning of information sources here is not based on the fuzzy equivalence relation, which is a different direction. Thus our granulation differs from that of Pedrycz and his co-researchers in [44,45,46,47,48], where it is applied on interval sets, type-2 sets, rough sets, etc., to generate different information granules.

The property values, attributes, or cues comprising the information sources contained in windows or frames form the fuzzy sets. The distribution of these information sources in the fuzzy sets requires an appropriate membership function. Let us consider the commonly used exponential and Gaussian-type membership functions, which are given by

$$\begin{aligned} \mu _{{ ij}}^\mathrm{e} =\mathrm{e}^{-\left\{ {\frac{\left| {I_{{ ij}} -I\left( \mathrm{ref} \right) } \right| }{f_h^2 }} \right\} };\quad \mu _{{ ij}}^\mathrm{g} =\mathrm{e}^{-\left[ {\frac{I_{{ ij}} -I\left( \mathrm{ref} \right) }{\sqrt{2}f_h }} \right] ^{2}} \end{aligned}$$
(1)

The fuzzifier \(f_h^2 \) in (1) was devised by Hanmandlu and Jha [46]; it gives the spread of the attribute values with respect to the chosen reference (symbolized as ref). It is defined as

$$\begin{aligned} f_h^2 =\frac{{\sum }_{i=1}^W {\sum }_{j=1}^W \left( {I\left( \mathrm{ref} \right) -I_{{ ij}} } \right) ^{4}}{{\sum }_{i=1}^W {\sum }_{j=1}^W \left( {I\left( \mathrm{ref} \right) -I_{{ ij}} } \right) ^{2}} \end{aligned}$$
(2)

One can take \(I (\text {ref}) =I_{\mathrm{avg}}\) or \(I_{\mathrm{max}}\) or \(I_{\mathrm{med}}\) from the values in a window. It may be noted that the above fuzzifier gives more spread than is possible with variance as used in the Gaussian function.
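As a quick illustration of (1) and (2), the following minimal NumPy sketch computes the fuzzifier and the two membership functions for a single \(W\times W\) window; the window contents, the choice of \(I(\mathrm{ref})=I_{\mathrm{avg}}\), and the assumption of a non-constant window are illustrative only.

```python
import numpy as np

def fuzzifier(window, ref):
    """Hanmandlu-Jha fuzzifier f_h^2 of Eq. (2); assumes a non-constant window."""
    d2 = (ref - window) ** 2
    return (d2 ** 2).sum() / d2.sum()

def memberships(window, ref):
    """Exponential and Gaussian membership functions of Eq. (1)."""
    fh2 = fuzzifier(window, ref)
    mu_e = np.exp(-np.abs(window - ref) / fh2)           # exponential form
    mu_g = np.exp(-((window - ref) ** 2) / (2.0 * fh2))  # Gaussian form
    return mu_e, mu_g

window = np.array([[120., 130., 125.],
                   [128., 140., 135.],
                   [122., 138., 131.]])
mu_e, mu_g = memberships(window, window.mean())          # I(ref) = I_avg
```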

Our objective here is to convert fuzzy sets into information sets. The Hanman–Anirban entropy function [41] can be used to do the conversion. Consider the non-normalized form of this 1D entropy, given by

$$\begin{aligned} H=\sum p\mathrm{e}^{-\left( {ap^{3}+bp^{2}+cp+d} \right) } \end{aligned}$$
(3)

where a, b, c, and d are real-valued parameters, p is the probability, and \(ap^{3}+bp^{2}+cp+d\) is assumed positive.

The Hanman–Anirban entropy function was originally defined in terms of probabilities to provide a measure of probabilistic uncertainty. We now adapt it to represent the possibilistic uncertainty by replacing the probabilities with the information source values, which can be attribute/property values or gray levels in the case of an image considered as an information source, as mentioned above. In the context of a face image, which is our chosen application of the proposed information set theory, \(p=I_{{ ij}}\), so we need the 2D form of (3), i.e.,

$$\begin{aligned} H=\sum \sum I_{{ ij}} \mathrm{e}^{-\left( {aI_{{ ij}}^3 +bI_{{ ij}}^2 +cI_{{ ij}} +d} \right) } \end{aligned}$$
(4)

Taking \(a=0\), \(b=0\), \(c=\frac{1}{2f_h^2}\); \(d=-\frac{I\left( \mathrm{ref} \right) }{2f_h^2}\) in (4) leads to:

$$\begin{aligned} H_\mathrm{e} =\sum \sum I_{{ ij}} \mu _{{ ij}}^\mathrm{e} \end{aligned}$$
(5)

Substituting the above parameters in (A.2) gives:

$$\begin{aligned} H_{\mathrm{Ne}} (I_{{ ij}} )=\frac{[H(I_{{ ij}} )-C_\mathrm{e} ]}{\lambda _\mathrm{e}} \end{aligned}$$
(6)

where \(C_\mathrm{e}\) and \(\lambda _\mathrm{e}\) are constants and \(H(I_{{ ij}})=H_\mathrm{e}\).

With a minor adaptation of (4) using the following parameters

$$\begin{aligned} a=0,\quad b=\frac{1}{2f_h^2 },\quad c=-\frac{2I(\mathrm{ref})}{2f_h^2 },\quad d=\frac{I^{2}(\mathrm{ref})}{2f_h^2} \end{aligned}$$

it takes the form:

$$\begin{aligned} H_\mathrm{g} =\sum \sum I_{{ ij}} \mu _{{ ij}}^\mathrm{g} \end{aligned}$$
(7)

Substituting the above parameters in (A.2) gives:

$$\begin{aligned} H_{\mathrm{Ng}} (I_{{ ij}} )=\frac{[H(I_{{ ij}} )-C_\mathrm{g}]}{\lambda _\mathrm{g}} \end{aligned}$$
(8)

where \(C_{\mathrm{g}}\) and \(\lambda _\mathrm{g}\) are constants and \(H(I_{{ ij}} )=H_\mathrm{g}\).

For any membership function, there are three representations: (i) the normalized information \(H_\mathrm{N}=(H-C)/\lambda \), which says that the normalized information \(H_{\mathrm{N}}\) results from the information H after it is corrected by C and scaled by \(\lambda \), (ii) the corrected information \(H_\mathrm{C}=H-C\), and (iii) the basic information \(H=\sum \sum I_{{ ij}} \mu _{{ ij}} \). In case (iii), we do not convert the information sources into probabilities by taking \(I_{{ ij}} /\sum \sum I_{{ ij}} \), because this makes the information value too small to have any discriminating power.

In the real-life scenario, the received information is invariably pruned either by correcting or by normalizing. The information source values received by our senses are perceived as differing information values because the perception is different depending on how much importance we attach to the source. Like fuzzy variables, information values are also natural variables.

But for simplicity and for imparting the discriminating power, we choose the basic information H, which is the product of the information source values and their membership function values. This product may mislead readers into thinking that information sets are in no way different from fuzzy sets. However, combining the information source values and the membership values provides a new paradigm for the representation of uncertainty of either type, probabilistic or possibilistic.

Definition

Information set: Any fuzzy set defined over a universe of discourse can be converted into an information set. Its elements, which are the products of the information sources and their membership grades, are called information values. The information set, comprising the information values, is expressed as:

$$\begin{aligned} \mathcal{H}\left( I \right) =\left\{ {I_{{ ij}} \mu _{{ ij}} } \right\} =\{H_{{ ij}} \left( I \right) \};\quad I\in \left[ {0,1} \right] \end{aligned}$$
(9)

As we do not know the suitable membership function, we need to try out well-defined memberships like the exponential and Gaussian functions. If they do not fit, any arbitrary membership function may be sought without affecting the definition of the information value. One such function is:

$$\begin{aligned} \mu _{{ ij}} =\frac{\left| {I_{{ ij}} -I\left( \mathrm{ref} \right) } \right| }{f_h^2 } \end{aligned}$$
(10)
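A minimal sketch of (9), assuming 8-bit gray levels scaled to [0, 1] and reusing the `memberships` helper sketched after (2); the elements of the resulting array are the information values \(I_{ij}\mu _{ij}\).

```python
import numpy as np

window = np.random.randint(0, 256, (3, 3)) / 255.0   # assumed normalization to [0, 1]
mu_e, _ = memberships(window, window.mean())         # agent: exponential membership of Eq. (1)
H = window * mu_e                                    # information set {I_ij mu_ij}, Eq. (9)
```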

Seven properties of information sets, referred to as Property-i, where \(i=1,2,\ldots ,7\), are now elaborated.

2.2 Properties of information set

2.2.1 The information set can be converted into different forms

It has been observed that the basic information set may not be effective. Hence we convert it into different forms for dealing with different problems by assuming the information value as a unit of information. We can apply any function on the information values.

For example, the information value \(\left\{ {I_{{ ij}} \mu _{{ ij}} } \right\} \) on the application of a sigmoid function S leads to

$$\begin{aligned} S\left( {I_{{ ij}} \mu _{{ ij}} } \right) =S_{{ ij}} =\frac{1}{1+\mathrm{e}^{-I_{{ ij}} \mu _{{ ij}} }} \end{aligned}$$
(11)

The thus modified information set \(\{S_{{ ij}}\}\) provides better features than those generated from the basic information set \(\left\{ {I_{{ ij}} \mu _{{ ij}} } \right\} \). Similarly, we can generate \(\log \left\{ {I_{{ ij}} \mu _{{ ij}} } \right\} \) features. It is also possible to derive information sets from texture images, which are in turn obtained from an image by applying either local binary pattern (LBP) or local directional pattern (LDP) operators. The resulting information sets that carry texture information are denoted by \(\left\{ {\mathrm{LBP}(I_{{ ij}} )\mu _{{ ij}} } \right\} \) or \(\left\{ {\mathrm{LDP}(I_{{ ij}} )\mu _{{ ij}} } \right\} \). We can use the sigmoid function to operate on these information sets as well. If we desire dimensionality reduction, then PCA or 2DPCA can be applied on these information sets.
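The following short sketch illustrates Property-1: the basic information set is recomputed for a random window and then modified by a sigmoid and a logarithm; the random window and the exponential membership are illustrative assumptions.

```python
import numpy as np

window = np.random.randint(0, 256, (3, 3)) / 255.0                 # assumed [0, 1] gray levels
ref = window.mean()
d2 = (ref - window) ** 2
mu = np.exp(-np.abs(window - ref) / ((d2 ** 2).sum() / d2.sum()))  # exponential membership, Eq. (1)
H = window * mu                                                    # basic information set {I_ij mu_ij}

S = 1.0 / (1.0 + np.exp(-H))    # sigmoid information set {S_ij}, Eq. (11)
L = np.log(1.0 + H)             # log-type information features
```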

2.2.2 Probability and possibility can be addressed very easily through information sets

For example, the gray levels g(k) are represented by membership function values \(\mu \left( k \right) \) and the frequency of their occurrence by the probability h(k). The histogram is a plot of g(k) versus h(k).

From the kth gray level, we can get two types of information values: the possibilistic uncertainty given by \(g\left( k \right) \mu \left( k \right) \) and the possibilistic-probabilistic uncertainty given by \(h\left( k \right) g\left( k \right) \mu \left( k \right) \).
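A small sketch of Property-2, assuming a random window and a Gaussian-type membership over the distinct gray levels; it produces the possibilistic values \(g(k)\mu (k)\) and the possibilistic-probabilistic values \(h(k)g(k)\mu (k)\).

```python
import numpy as np

window = np.random.randint(0, 256, (9, 9))              # assumed gray-level window
g, counts = np.unique(window, return_counts=True)       # distinct gray levels g(k)
h = counts / counts.sum()                               # probabilities h(k)
d2 = (g.mean() - g.astype(float)) ** 2
mu = np.exp(-d2 / (2.0 * (d2 ** 2).sum() / d2.sum()))   # Gaussian membership mu(k), Eqs. (1)-(2)

possibilistic = g * mu                                  # g(k) * mu(k)
possibilistic_probabilistic = h * g * mu                # h(k) * g(k) * mu(k)
```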

2.2.3 The desired components can be captured from the information by a weighting function

Let us consider the weighted entropy,

$$\begin{aligned} H=f\left( p\right) p\mathrm{e}^{-\left( {ap^{3}+bp^{2}+cp+d} \right) } \end{aligned}$$
(12)

The above can be written in the form

$$\begin{aligned} H=I_{{ ij}} \mu _{{ ij}} f\left( {I_{{ ij}} } \right) \end{aligned}$$
(13)

by replacing p with \(I_{{ ij}}\) and by the appropriate choice of parameters. The weighting function is chosen to get the desired information. As we will see later, a particular weighting function f acting on the information converts it into a filter.

2.2.4 The spatial and time variation of 1-D (signals) and 2-D (images) can be characterized effectively by the Information sets

The spatial variation of a variable represented by a histogram and the time variation represented through a time function are discussed later in connection with the formulation of the Hanman transform.

2.2.5 The information sets make the fuzzy modeling easier in the absence of the output information

We will now explore the role of information sets in the fuzzy modeling. Consider the generalized fuzzy model (GFM) proposed by Ahmad et al. [47]. The fuzzy rule underlying the model is of the form:

$$\begin{aligned}&\hbox {GFM Rule:}\,\mathbf{If}\,x_{1}\,\mathrm{is}\, A(x_{1})\quad \hbox {and}\quad x_{2}\,\hbox {is}\,A(x_{2})\quad \hbox {and}\quad \ldots \ldots x_{n}\,\hbox {is}\,A(x_{n})\nonumber \\&\mathbf{Then}\,y = (B, f(x)) \end{aligned}$$
(14)

where \(x_{1},x_{2},{\ldots },x_{n}\) are the fuzzy variables and \(A(x_{1}), {\ldots },A(x_{n})\) are fuzzy sets. B is a fuzzy set of y and f(x) is its centroid value. The generalized fuzzy model (GFM) becomes the Takagi–Sugeno model if \(B=\phi \) and the Mamdani–Larsen model if \(f(x)=0\). The GFM rule can be converted into an information rule as follows:

$$\begin{aligned}&\hbox {Information Rule:}\,\mathbf{If}\,H_{1}(x_{1})\,\hbox {is}\, \{A(x_{1})\mu (x_{1})\}\quad \hbox {and}\quad H_{2}(x_{2})\,\hbox {is}\nonumber \\&\{A(x_{2})\mu (x_{2})\}\ldots ,\quad \hbox {and}\quad H_{n}(x_{n})\,\hbox {is}\, \{A(x_{n})\mu (x_{n})\},\nonumber \\&\mathbf{Then}\,y = (B({\bar{A}_i }, \bar{\mu }_i ),\,f(x)) \end{aligned}$$
(15)

Assuming that the information sets are independent and of proportions \(p_{\mathrm{i}}\) as in Gaussian mixture model (GMM), we obtain

$$\begin{aligned} \bar{A}_i =\frac{{\sum }_{x_i } p_i H_i \left( {x_i } \right) }{{\sum }_{x_i } p_i \mu \left( {x_i } \right) };\quad \bar{\mu }_i =\frac{{\sum }_{x_i } p_i H_i \left( {x_i } \right) }{{\sum }_{x_i } A\left( {x_i } \right) }\quad \hbox {and}\quad f\left( x \right) =\frac{{\sum }_{i=1}^n \bar{A}_i \bar{\mu }_i }{{\sum }_{i=1}^n \bar{\mu }_i } \end{aligned}$$
(16)

If all information sets have the same cardinality, then \(p_i=1\). Note that we are able to derive all the arguments of \(y=(B({\bar{A}_i },\,\bar{\mu }_i ),\,f(x))\) easily without any knowledge of y, thus bringing us into the realm of unsupervised learning. If y is known, then the estimated output f(x) tells us the modeling error \((y-f(x))\). Based on (16), we state a lemma that demonstrates the usefulness of information sets in the context of fuzzy modeling.

Lemma

The representation of the fuzzy sets in the GFM by the information sets converts the antecedent part of the rule into the output, thus facilitating unsupervised learning. As a consequence of this lemma, unsupervised neural networks can be modified to facilitate easy learning.
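A hedged sketch of (16) with \(p_i=1\) (equal cardinality): the information values \(H_i=A_i\mu _i\) of each antecedent fuzzy set yield \(\bar{A}_i\), \(\bar{\mu }_i\), and the estimated output f(x) without any knowledge of y; the fuzzy sets and memberships below are made-up illustrations.

```python
import numpy as np

def gfm_output(A_sets, mu_sets):
    """Centroids A_bar_i, mu_bar_i and the output f(x) of Eq. (16) with p_i = 1."""
    A_bar, mu_bar = [], []
    for A, mu in zip(A_sets, mu_sets):
        H = A * mu                                # information values H_i(x_i)
        A_bar.append(H.sum() / mu.sum())
        mu_bar.append(H.sum() / A.sum())
    A_bar, mu_bar = np.array(A_bar), np.array(mu_bar)
    return (A_bar * mu_bar).sum() / mu_bar.sum()  # f(x)

A_sets  = [np.array([0.2, 0.5, 0.8]), np.array([0.3, 0.6, 0.9])]   # illustrative fuzzy sets
mu_sets = [np.array([0.9, 0.7, 0.4]), np.array([0.5, 0.8, 0.6])]   # their membership grades
f_x = gfm_output(A_sets, mu_sets)
```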

Interactive Information: If the information sets have overlapping information, f(x) requires a Choquet integral type of computation [48]. The interactive information set features based on S-norms are already presented in [43], but these are different from what we propose here. We will make use of the adaptive Hanman–Anirban entropy function to see whether it can be converted into the Choquet integral form as follows:

Substituting \(p_{i}=\bar{A}_i \), \(a=b=0\), \(d=-\bar{{A}}_{i-1} \), and \(c=1\) in (3), we get

$$\begin{aligned}&\bar{y}=f(x)=\sum _{i=1}^n \bar{A}_{i}\mathrm{e}^{-\left( \bar{A}_{i}-\bar{A}_{i-1}\right) } =\sum _{i=1}^n {\bar{{A}}_i } \mathrm{e}^{-\Delta \bar{{A}}_i }\nonumber \\&\hbox {s.t.}\,\bar{{A}}_n =\bar{{A}}_0 \end{aligned}$$
(17)

In the Choquet integral [48], the fuzzy measures are estimated from the input sets, starting with a one-element set, then a two-element set, and finally ending with the complete set, as follows.

$$\begin{aligned} x_1 =\left\{ {\bar{A} _1 } \right\} ,\quad x_2 =\left\{ {\bar{A} _2 ,\bar{A} _1 } \right\} ,\ldots ,\quad x_{d-1} =\left\{ {\bar{A} _{d-1} ,\ldots ,\bar{A} _2 ,\bar{A} _1 } \right\} ,\quad x_d =\left\{ {\bar{A} _d ,\ldots ,\bar{A} _2 ,\bar{A} _1 } \right\} . \end{aligned}$$

As can be seen from (17), it cannot be put in the Choquet integral form, as the exponential gain function is a function of the difference between two adjacent values, whereas a fuzzy measure is a function of all the previous values. In order to convert (17) into the Choquet integral form, we need to modify the Hanman–Anirban entropy function as follows:

$$\begin{aligned} \bar{y}= & {} f(x)=\sum _{i=1}^d g\left( {x_i } \right) \mathrm{e}^{\left( {\bar{{A}}_i -\bar{{A}}_{i-1} } \right) }\nonumber \\= & {} \sum _{i=1}^d g\left( {x_i } \right) \mathrm{e}^{\left( {\Delta \bar{{A}}_i } \right) } \end{aligned}$$
(18)

In this form, \(g(x_{\mathrm{i}})\) being a fuzzy measure requires learning of the interaction parameters.

2.2.6 The information sets can be extended to information rough and rough information sets

In real life, only the information values are available. For example, during the admission of a candidate to a program, each expert x of the interview committee gives only the relative marks \(H\left( x \right) =I\left( x \right) \mu \left( x \right) \), where I(x) refers to the candidate's performance and \(\mu \left( x \right) \) is his relative grade as perceived by the expert x with respect to the previous performance of the candidates interviewed so far.

Consider different membership functions (agents) representing the same fuzzy set. Some information values are on the higher side, \(\left\{ {H\left( x \right) =I\left( x \right) \mu \left( x \right) ;\mu \left( x \right) \ge \alpha } \right\} \), and correspond to the lower Information Rough set, while the others are on the lower side and correspond to the upper Information Rough set. When there is a divergence in the evaluations or attributes, roughness arises.

If \(\left\{ {H\left( x \right) \ge M} \right\} \), where M is a threshold, it is the lower Rough Information set; otherwise, it is the upper rough information set. There are also several other aspects of the rough set theory that can be easily embedded into the information set theory, but these are not addressed here.

2.2.7 Information sets allow the application of agent theory

When different membership functions (agents) judge the information source values differently, we can aggregate the membership function values through t-norms or s-norms to improve the representation. If we have one sample of a user fitted with two membership functions, then these functions can be aggregated to provide a better representation. An agent is a higher form of membership function. We define an agent as one that generates the information when its parameter is varied.

Definition of an Agent: Consider the exponential gain function \(\mathrm{e}^{-\left( {cp_i +d} \right) }\) obtained from (3) with \(a=b=0\), which, when differentiated with respect to c, gives us

$$\begin{aligned} -p_{i}\,\mathrm{e}^{-\left( {cp_i +d} \right) } \end{aligned}$$
(19)

The absolute value of this derivative is the information value associated with \(p_{i}\).

In the context of an agent, it is imperative to define two types of information.

Auto Information set HA(x): If the membership \(\mu \left( x \right) \) is obtained from the statistics of the information source I(x), then we can derive the auto information value, i.e., \(HA\left( x \right) =I\left( x \right) \mu \left( x \right) \).

Hetero Information set HC(x): If the membership \(\mu \left( y \right) \) is obtained from the statistics of another information source I(y), then we can derive the hetero information value, i.e., \(HC\left( x \right) =I\left( x \right) \mu \left( y \right) \).

Some important notations used in this paper: I stands for the information source (say, an image) and \(I_{{ ij}}\) is an information source value. \(\mu _{{ ij}}\) stands for the membership function (also an agent). H stands for the information and \(\mathcal{H}\) stands for the information set \(\{H_{{ ij}}\}\); note that \(H_{{ ij}}\) is an information value. \(H_{\mathrm{t}}\) is the Hanman transform; the subscript “t” denotes that it is a transform. The other subscripts on H, such as e, g, \(N_{\mathrm{e}}\), and \(N_{\mathrm{g}}\), denote the exponential, Gaussian, normalized exponential, and normalized Gaussian forms, respectively.

\(H_{\mathrm{N}}\) is the normalized entropy and H(p) is the entropy as a function of the probability p. \(\mathcal{H}\left( s\right) \) is an information set as a function of s and \(\mathcal{H}\left( {s,F} \right) \) is the Hanman filter. \(F_{{ ij}}(s,u)\) is the filter function and \(\mu _{{ ij}} \left( s \right) \) is the membership function as a function of s. The superscripts e and g on \(\mu _{{ ij}} \) indicate the type of membership function (exponential or Gaussian). \(f_{{ ij}}\) is the feature vector. In the context of classifier design, \(e_{{ ij}}\) is taken as the error vector and \(E_{{ ij}}\) as the normed error vector. The less important notations are omitted to save space.

3 Higher form of information sets

We will now make use of information sets in the formulation of the Hanman filter and the Hanman transform, which are higher forms of information sets.

3.1 Hanman filter

The development of a filter is motivated by the desire to change the information set as per our requirement. For example, an image convolved with a Gabor filter displays the highlighted frequency components, and an underexposed image can be given a pleasing look by applying an enhancement operator. The original information that is not very useful needs to be modified by a filter or an operator. There are two ways to change the information: (i) by changing the membership function/agent that multiplies the information source, and (ii) by devising a filter function or an operator.

We will now discuss the change of information, or the generation of different information sets, by changing the parameter of a membership function. Consider a set of information values originating from the information sources (gray levels) in a window and assume that they are fitted with a Gaussian-type membership function that is a function of the fuzzifier. When the fuzzifier is varied by a scale factor, it gives rise to different membership functions. The generation of the information sets \(\mathcal{H}\left( s \right) \), accomplished by varying the scale factor s in the membership function \(\mu \left( s \right) \), is expressed as

$$\begin{aligned} \mathcal{H}\left( s \right)= & {} \left\{ {\mu _{{ ij}} \left( s \right) I_{{ ij}} } \right\} \nonumber \\ \mu _{{ ij}} \left( s \right)= & {} \mathrm{e}^{-\left[ {\frac{\left( {I_{{ ij}} -I_{\mathrm{avg}}} \right) ^{2}}{sf_h^2 }} \right] }\quad \hbox {for}\quad s \in \left\{ {0.4,0.6,0.8,1} \right\} \end{aligned}$$
(20)

where \(I_{{ ij}}\) is the gray level in a window. The membership function need not be Gaussian and \(\mathcal{H}\left( s \right) \) can also be modified by applying any function such as sigmoid function.

We will now see the second way of changing the information sets, in which the original information set is modified by the choice of a filter function or an operator. As an example, we consider a suitable cosine function to achieve our objective.

Following Property-1 and Property-3, the desired frequency components can be filtered out (captured) from the information sets by the cosine function \(\cos \left( {2\pi F_{{ ij}} \left( {s,u} \right) } \right) \), called the Hanman filter. Invoking this filter modifies the information sets in (20) to:

$$\begin{aligned} \mathcal{H}\left( {s,F} \right) =\mu _{{ ij}} \left( s \right) I_{{ ij}}\,\hbox {cos}\left( {2\pi F_{{ ij}} \left( {s,u} \right) } \right) \end{aligned}$$
(21)

Note that unlike the Gabor filter, Hanman filter is not a function of orientation but is a function of scale s, frequency u and translation of the information source \(I_{\mathrm{ij}}\) by the amount \(I_{\mathrm{avg}}\). Thus, it has the capability of a wavelet function. The filter function \(F_{\mathrm{ij}}(s, u)\) acts on the original information to separate out the frequency components. It is chosen as:

$$\begin{aligned} F_{{ ij}} \left( {s,u} \right) =F_u \left[ {\frac{\left| {I_{{ ij}} -I_{\mathrm{avg}}} \right| }{2^{\mathrm{s}}}} \right] ;\quad s=0.4,0.6,0.8,1.0 \end{aligned}$$
(22)

where we have taken \(F_u =\frac{F_{\mathrm{max}} }{2^{\left( {u/2} \right) }}\) with \(u=1,2,3\) and \(F_{\max }=0.25\). While the symbol of the scale parameter in (22) could have been changed for generality, this has been avoided for simplicity. Another function that one could opt for instead of the cosine is \(\mathrm{e}^{-i2\pi F_{{ ij}} \left( {s,u} \right) }\), but this produces both real and imaginary components. To simplify (22) further, one can do away with \(F_{u}\) by incorporating its effect as follows:

$$\begin{aligned} F_{{ ij}} \left( {s,u} \right) =\frac{\left| {I_{{ ij}} -I_{\mathrm{avg}} } \right| }{2^{su}} \end{aligned}$$
(23)

Neglecting s and taking appropriate value for u in the range 3–5 converts \(F_{\mathrm{ij}}(s,\,u)\) into

$$\begin{aligned} F_{{ ij}} \left( u \right) =\left[ {\frac{I_{{ ij}} -I_{\mathrm{avg}} }{2^{u}}} \right] \end{aligned}$$
(24)

It may be noted that we have not accounted for the orientation in the filter (21) for the simple reason that the face images are bereft of large pose variations. As regards the frequency content, it is hard to determine the most suitable frequency components for a particular problem, but the frequency components of our choice can be retrieved. For instance, the uth frequency component is determined from (21) using \(s=1\) and \(u=2\) as:

$$\begin{aligned} \mathcal{H}\left( {1,F} \right) =\sum _{i=1}^W \sum _{j=1}^W I_{{ ij}} \mu _{{ ij}} \left( 1 \right) \,\hbox {cos}\left( {2\pi F_{{ ij}} \left( {1,2} \right) } \right) \end{aligned}$$
(25)

Algorithm: The steps for the Hanman filter features from (20)–(21) are as follows: (1) Generate 12 information sets from a window of size \(W\times W\) for \(W=3,5,7,9\) in an image by taking 3 values of u and four values of s, (2) Compute the composite information set by aggregating all 12 sets, (3) Compute the average value from each window as the feature, (4) Repeat Steps 1–3 until all windows in a face image are covered, thereby producing a feature vector, and (5) Generate different features corresponding to different values of W.
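A minimal sketch of steps 1–3 of the above algorithm for one \(W\times W\) window, following (20)–(22); the aggregation of the 12 filtered information sets by summation and the random test window are assumptions, and steps 4–5 would simply slide the window over the image and vary W.

```python
import numpy as np

def hanman_filter_feature(window, scales=(0.4, 0.6, 0.8, 1.0), freqs=(1, 2, 3), f_max=0.25):
    """One Hanman filter feature per window: aggregate the 12 sets of Eq. (21) and average."""
    i_avg = window.mean()
    d2 = (window - i_avg) ** 2
    fh2 = (d2 ** 2).sum() / d2.sum()                     # fuzzifier, Eq. (2)
    acc = np.zeros_like(window, dtype=float)
    for s in scales:
        mu = np.exp(-d2 / (s * fh2))                     # mu_ij(s), Eq. (20)
        for u in freqs:
            f_u = f_max / 2.0 ** (u / 2.0)
            F = f_u * np.abs(window - i_avg) / 2.0 ** s  # F_ij(s, u), Eq. (22)
            acc += mu * window * np.cos(2 * np.pi * F)   # H(s, F), Eq. (21)
    return acc.mean()                                    # window-level feature (step 3)

feature = hanman_filter_feature(np.random.randint(0, 256, (5, 5)).astype(float))
```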

The effectiveness of the Hanman filter over the Gabor filter is demonstrated on face recognition by Sayeed and Hanmandlu [49]. The generality of the information values, arising from the flexibility that they can be changed in several ways, bestows immense power on the Hanman filter, whereas the Gabor filter is tied to the Gaussian membership function only.

3.2 Hanman transforms

The second and fourth properties of information sets are used to derive a transform to assess a higher form of uncertainty in the information source values in a window of an image based on the initial uncertainty representation. This can be accomplished by a possibilistic version of the adaptive Hanman–Anirban entropy function having variable parameters. The transforms have realistic applications. For example, we gather information about an unknown person of some interest to us; this is the first level of information (set), and we then evaluate him again to get the second level of information combined with the first one.

We will now see the formulation of the transforms using the Hanman–Anirban entropy function. For this, the parameters of the Hanman–Anirban entropy function (4) are selected as \(a=b=d=0\) and \(c=\mu _{{ ij}} /I_{\mathrm{max}} \), leading to the Hanman transform:

$$\begin{aligned} H_t \left( I \right) =\sum _{i=1}^W \sum _{j=1}^W I_{{ ij}} \mathrm{e}^{-\left( {\mu _{{ ij}} I_{{ ij}} /I_{\mathrm{max}} } \right) } \end{aligned}$$
(26)

Here one of the parameters, c, is taken to be a function of \(\mu _{{ ij}} \). As can be seen from (26), the information source is weighted by a function of the information value. This can be observed in social contexts, for example, where a person (information source) is judged by the opinions of others (information value).

If we have some prior information \(H_0\) in (26), then the exponential gain depends on the relative information as follows:

$$\begin{aligned} H_t \left( I \right) =\sum _{i=1}^W \sum _{j=1}^W I_{{ ij}} \mathrm{e}^{-\left( {\mu _{{ ij}} I_{{ ij}} -H_0 } \right) /I_{\mathrm{max}} } \end{aligned}$$
(27)

If the subimage is represented as a histogram of g(k) versus h(k), where g(k) is the kth gray level and h(k) is the frequency of occurrence of kth gray level, then the transform using Property-2 becomes

$$\begin{aligned} H_t \left( g \right) =\sum _k h\left( k \right) g\left( k \right) \mathrm{e}^{-\mu _k g\left( k \right) } \end{aligned}$$
(28)

where \(\mu _k\) is the membership function value of kth gray level. An integral form of the transform not satisfying the properties of the entropy function is expressed as:

$$\begin{aligned} H_t \left( h \right) =\int _g h\left( g \right) \mathrm{e}^{-\mu \left( g \right) h\left( g \right) }\hbox {d}g \end{aligned}$$
(29)

A simple representation of the histogram is to take h(k) as the membership function and g(k) as the information source values:

$$\begin{aligned} H_t \left( g \right) =\sum _k g\left( k \right) \mathrm{e}^{-h\left( k \right) g\left( k \right) } \end{aligned}$$
(30)

In some applications, a probability density function (PDF), e.g., the life expectancy h(a) of a person having the age a(y), serves as the membership function, and in such cases (30) can be written as

$$\begin{aligned} H_t \left( a \right) =\sum _y a\left( y \right) \mathrm{e}^{-h\left( a \right) a\left( y \right) } \end{aligned}$$
(31)

The extension of (30) to a time-varying signal of some fixed duration is straightforward:

$$\begin{aligned} H_t \left( t \right) =\sum _t g\left( t \right) \mathrm{e}^{-\mu _t g\left( t \right) } \end{aligned}$$
(32)

However, this work is limited to spatial variations.

Algorithm: The Hanman transform features are extracted from (26) in the following steps: (1) Compute the membership value associated with each gray level in a window of size \(W\times W\), (2) Compute the information as the product of the gray level and its membership function value, divided by the maximum gray level in the window, (3) Take the exponential of the negative of this normalized information and multiply it by the gray level, (4) Repeat steps 1–3 on all gray levels in a window and sum the values to obtain a feature, (5) Repeat steps 1–4 on all windows in a face image to get all features, and (6) Repeat steps 1–5 for \(W=3,5,7,9\) for the performance evaluation.
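A sketch of steps 1–4 of the above algorithm for one window, following (26) with a Gaussian membership; sliding it over all windows and window sizes (steps 5–6) gives the full feature vector. The random test window is an assumption.

```python
import numpy as np

def hanman_transform_feature(window):
    """One Hanman transform feature per window, Eq. (26), with a Gaussian membership."""
    i_avg, i_max = window.mean(), window.max()
    d2 = (window - i_avg) ** 2
    fh2 = (d2 ** 2).sum() / d2.sum()                      # fuzzifier, Eq. (2)
    mu = np.exp(-d2 / (2.0 * fh2))                        # Gaussian membership, Eq. (1)
    return (window * np.exp(-mu * window / i_max)).sum()  # H_t(I), Eq. (26)

feature = hanman_transform_feature(np.random.randint(0, 256, (7, 7)).astype(float))
```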

The Hanman transform was used to transform the structure function for the representation of multispectral palmprints in [50] by Grover and Hanmandlu. Aggarwal and Hanmandlu [51] provide a comprehensive treatment of possibilistic uncertainty using higher order Shannon transforms along with several uncertainty measures. These transforms are offshoots of Hanman transforms that were proposed much before the Shannon transforms.

3.3 The new entropy function

With a view to representing the uncertainty in the information source values under the unconstrained conditions that exist in surveillance applications, a new entropy function called the Mamta–Hanman entropy is proposed in [52]. Unified features are derived from it to represent the three modalities, IR face, iris, and ear, for the development of a multimodal biometric system by fusing them using score-level fusion in [53].

The Mamta–Hanman entropy function is of the form

$$\begin{aligned} H=\sum \limits _{i=1}^n \sum _{j=1}^n I_{{ ij}} ^{\gamma }\mathrm{e}^{-\left( {cI_{{ ij}} ^{\alpha }+d} \right) ^{\beta }} \end{aligned}$$
(33)

In view of this, the basic information set now becomes

$$\begin{aligned} \mathcal{H}\left( I \right) =\left\{ {I_{{ ij}}^\gamma \mu _{{ ij}} } \right\} \end{aligned}$$
(34)

The membership function can be chosen appropriately. The Hanman transform in (26) can be written as

$$\begin{aligned} H_t \left( I \right) =\sum _{i=1}^W \sum _{j=1}^W I_{{ ij}}^\gamma \mathrm{e}^{-\left( {\mu _{{ ij}} I_{{ ij}}^\alpha /I_{{\max }} } \right) } \end{aligned}$$
(35)

As can be seen, (35) offers a lot of flexibility in the choice of parameters, but at the cost of increased learning.
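A hedged sketch of (35): the exponents \(\alpha \) and \(\gamma \) are free parameters that would have to be fixed empirically or learned, and the membership used below is only a convenient placeholder.

```python
import numpy as np

def mamta_hanman_transform(window, mu, alpha=1.0, gamma=1.0):
    """Generalized transform of Eq. (35); mu holds the membership values of the window."""
    return (window ** gamma * np.exp(-mu * window ** alpha / window.max())).sum()

win = np.random.randint(1, 256, (5, 5)).astype(float)
mu = np.exp(-np.abs(win - win.mean()) / win.var())   # placeholder membership function
feature = mamta_hanman_transform(win, mu, alpha=1.0, gamma=2.0)
```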

4 Information processing

Here we make use of Property-5 of the information sets to facilitate information processing. We define two rules: information modeling (IM) for the extraction of features based on information sets and information processing (IP) for the matching of the training features with the test features. Let \(I_{\mathrm{tr1}}(l),{\ldots },I_{\mathrm{trn}}(l)\) be the n sub-images (windows) of the lth training sample, \(f_{\mathrm{tr}}(l,j)\) be the corresponding training feature vector, and \(f_{\mathrm{te}}(j)\) be the test feature vector. The IM-Rule is of the form:

$$\begin{aligned}&\hbox {IM-Rule:}\,\mathbf{If}\,f_{\mathrm{tr}}(i,j),\; i=1,2,\ldots ,M,\quad \hbox {and}\quad f_{\mathrm{te}}(j)\, \hbox {are the training and test}\nonumber \\&\hbox {feature vectors}\,\mathbf{Devise}\,\hbox {a criterion function} \end{aligned}$$
(36)

The devise construct is intended to provide a choice for the user to devise any objective function. The Information set theory underlying the IPC and the Hanman integral has been encapsulated in the above rule.

Development of classifiers

An attempt is made to formulate both inner product classifier (IPC) and Hanman classifier (HC) from Hanman–Anirban conditional entropy function, given the feature vectors of the training set and the test set.

4.1 Inner product classifier

This classifier is built on the basis of the training features and the absolute errors between the training and test sample features. We consider the average of two training feature vectors and the aggregation of their errors using t-norms for the development of the classifier. The purpose of the aggregation is to account for the interactions between the errors. The inner product between the average of the training features and the fused errors must be the least for the test features to match the training features. This is the concept behind the proposed classifier.

The aggregated error vectors act as the support vectors which when projected onto the average of the feature vectors, become the inner products that are akin to the margins in support vector machine (SVM). The difference between the highest and the lowest inner products gives the range of margins. The training feature vectors associated with the lowest margin give the identity to the test feature vector. As the absolute errors are considered, the margin is toward the positive side of the projection plane, i.e., the hyperplane. The other forms of errors like square of the errors can also be investigated in the future.

The t-norms generalize the logical conjunction of two fuzzy variables (here, feature values) x and y in the interval [0,1]. If the training set contains a number of sample faces, then the t-norm is taken between the feature vectors of two training samples to increase the difference, i.e., the margin, between them, thus facilitating easy classification. The choice of a t-norm suitable for a feature set is made by trial. Of the many families of t-norms, the Yager t-norm is found to be most suitable for face recognition as it gives the maximum margin. Its parametric form is given by [54]:

$$\begin{aligned} t_Y =\hbox {max}\left[ {1-\left[ {\left( {1-x} \right) ^{p}+\left( {1-y} \right) ^{p}} \right] ^{1/p},0} \right] ;\quad p>0 \end{aligned}$$
(37)
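A one-line sketch of the Yager t-norm in (37), used later to fuse the absolute errors of two training samples; the default p = 22 reflects the value reported for the face data.

```python
import numpy as np

def yager_tnorm(x, y, p=22.0):
    """Yager t-norm of Eq. (37), applied element-wise to arrays in [0, 1]."""
    return np.maximum(1.0 - ((1.0 - x) ** p + (1.0 - y) ** p) ** (1.0 / p), 0.0)
```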

In our case \(p=22\). We will now present the steps involved in the IPC algorithm.

Algorithm for IPC

It may be noted that the normalization is done on the entire feature data. Each feature vector is normalized using only the minimum and maximum feature values. Even if we use the maximum and minimum of the training feature vectors, the results will not be affected. The discriminating power of the classifier comes from the use of appropriate t-norms.

Normalize all the features of all users (\(\forall i)\) column-wise (\(j=1,2,\ldots ,N\)) using

$$\begin{aligned} \bar{f} \left( {i,j} \right) =\frac{f\left( {i,j} \right) -\min \left( {f\left( {i,j} \right) } \right) }{\max \left( {f\left( {i,j} \right) } \right) -\hbox {min}\left( {f\left( {i,j} \right) } \right) } \end{aligned}$$
(38)

where \(f\left( {i,j} \right) \) is the jth feature of ith sample.

1.

    Divide the normalized feature set \(\left\{ {\bar{f} \left( {i,j} \right) } \right\} \) into the training set \(\{f_{\mathrm{tr}} \left( {i,j} \right) \}\) and test set \(\left\{ {f_{\mathrm{te}} \left( j \right) } \right\} \).

    Here \(i =1,2,{\ldots },M;\,j= 1,2,{\ldots },N\); M being the total number of samples for each user in the training set and N being the total number of features from a sample; \(f_{\mathrm{tr}}\) and \(f_{\mathrm{te}}\) are the feature vectors of the training and test samples respectively.

2.

    Calculate the absolute errors \(e_{{ ij}} \) between the features of the ith and kth training samples of a user and any test sample as:

    $$\begin{aligned} e_{{ ij}}= & {} \left| {f_{\mathrm{tr}} \left( {i,j} \right) -f_{\mathrm{te}} \left( j \right) } \right| \nonumber \\ e_{kj}= & {} \left| {f_{\mathrm{tr}} \left( {k,j} \right) -f_{\mathrm{te}} \left( j \right) } \right| \end{aligned}$$
    (39)
3.

    Fuse the absolute errors of ith and kth training samples by the Yager t-norm denoted by

    $$\begin{aligned} E_{ik} (j)=t_Y \{e_{{ ij}},e_{kj}\},i\ne k \end{aligned}$$
    (40)

    We consider all possible combinations of the training sample errors in (40), entailing a marginal extra computation but with the prospect of obtaining the least value of \(E_{ik} (j)\).

4.

    Find the average feature value of the ith and kth training samples

    $$\begin{aligned} f_{ik} \left( j \right) =1/2\left\{ {f_{\mathrm{tr}} \left( {i,j} \right) +f_{\mathrm{tr}}\left( {k,j} \right) } \right\} \end{aligned}$$
    (41)

The normed error vectors in (40) behave as the support vectors and the average feature vectors in (41) as the weights of the SVM. So it is necessary and sufficient that the inner product of \(E_{ik}(j)\) and \(f_{ik}(j)\) be the least for the training sample to be close to the test sample.

$$\begin{aligned} h_{ik} \left( l \right) =\sum _{j=1}^{N} f_{ik} \left( j\right) E_{ik} \left( j \right) =\left\langle {f_{ik} ,E_{ik}} \right\rangle ;\quad i\ne k \end{aligned}$$
(42)

As \(i,k=1,2,\ldots ,M\), the number of inner products generated from (42) is \({\sum }_{i=2}^M \left( {M-i+1} \right) \). The minimum of \(h_{ik} \left( l \right) \) is the measure of dissimilarity corresponding to the lth user. While matching, whichever user corresponds to the infimum of \(h_{ik} \left( l \right) \) over all l gives the identity of the test user. Note that \(f_{ik} \left( j \right) \) is the jth information source (feature) and the fusion of the two errors gives the confidence about the information. As per the experiments, another variant of (42), \(h_{ik} \left( l \right) ={\sum }_{j=1}^{N} f_{ik} \left( j \right) {\sum }_{j=1}^{N} E_{ik} \left( j \right) \), cannot be overlooked and must be given a trial. An interesting result emerges if we introduce the membership functions of the terms in this relation:

$$\begin{aligned} H_{ik} \left( l \right) =\sum _{j=1}^{N} f_{ ik} \left( j\right) \mu _{\mathrm{fik}} \left( j \right) \sum _{ j=1}^{N} E_{ik} \left( j \right) \mu _{\mathrm{Eik}} \left( j \right) =H_{\mathrm{fik}} H_{\mathrm{Eik}} , \end{aligned}$$

which is the product of the information of the training features and that of the errors.
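A hedged end-to-end sketch of the IPC of (39)–(42): for each pair of training samples of a user, the absolute errors are fused by the Yager t-norm and projected onto the averaged training features, and the user with the smallest inner product claims the test sample. It assumes the features are already min-max normalized as in (38) and reuses the `yager_tnorm` helper sketched after (37); the data shapes are assumptions.

```python
import numpy as np
from itertools import combinations

def ipc_score(f_tr, f_te):
    """Dissimilarity of one user: f_tr is (M, N) training features, f_te is (N,)."""
    scores = []
    for i, k in combinations(range(f_tr.shape[0]), 2):
        e_i = np.abs(f_tr[i] - f_te)          # Eq. (39)
        e_k = np.abs(f_tr[k] - f_te)
        E = yager_tnorm(e_i, e_k)             # fused errors, Eq. (40)
        f_ik = 0.5 * (f_tr[i] + f_tr[k])      # averaged features, Eq. (41)
        scores.append(np.dot(f_ik, E))        # inner product, Eq. (42)
    return min(scores)

def ipc_classify(train_sets, f_te):
    """train_sets: one (M, N) array per user; returns the index of the claimed user."""
    return int(np.argmin([ipc_score(f_tr, f_te) for f_tr in train_sets]))
```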

4.2 Hanman classifier and normed error classifier (NEC)

The conditional Hanman–Anirban entropy of a partition \(A_{\mathrm{i}}\), given that \(B_{\mathrm{j}}\) has occurred, is expressed as:

$$\begin{aligned} H[A_i |B_j ]=\sum _{i=1}^n {p_{i|j} \mathrm{e}^{-[ap_{i|j} ^{3}+bp_{i|j} ^{2}+cp_{i|j} +d]}} \end{aligned}$$
(43)

where \(p_{i|j} =p_{{ ij}} /q_j =\Pr [A_i |B_j ]=\Pr [A_i B_j ]/\Pr [B_j ]\)

The Bayesian conditional entropy is not applicable here as we do not have the joint probability density function of \(A_{i}\) and \(B_{j}\). We now propose the possibilistic versions of the above conditional entropy function. Assuming that \(A=\{A_i =f_{\mathrm{tr}} (i,j)\}\) and \(B=\{B_i =f_{\mathrm{ts}} (j)\}\) which refer to the training and the test information sets, respectively, the conditional possibility cposs is defined as

$$\begin{aligned} \hbox {cposs}\left( {A/B} \right) =\left\{ {f_{\mathrm{tr}} \left( {i,j} \right) -f_{\mathrm{ts}} \left( j \right) } \right\} =\left\{ {e_{{ ij}} } \right\} \end{aligned}$$
(44)

If we take \(A=\{f_{\mathrm{tr}} (i,j)\mu _{\mathrm{tr}} (i,j)\}\) and \(B=\{f_{\mathrm{ts}} (j)\mu _{\mathrm{ts}} (j)\}\), then (44) becomes

$$\begin{aligned} \hbox {cposs}\left( {A/B} \right)= & {} \left[ {f_{\mathrm{tr}} \left( {i,j} \right) \mu _{\mathrm{tr}} \left( {i,j} \right) -f_{\mathrm{ts}} \left( j \right) \mu _{\mathrm{ts}} \left( j \right) } \right] \nonumber \\= & {} \left\{ {e_{{ ij}} } \right\} \end{aligned}$$
(45)

The above definition of the possibility justifies the fact that if we already have some information A and new information B is received, then it is easy to observe its difference from A. Taking \(a=b=0\) in (43) gives rise to the Hanman distance, given by

$$\begin{aligned} H\left( {A|B} \right)= & {} \hbox {cposs}\left( {A|B} \right) \mathrm{e}^{-\left[ {c\,\hbox {cposs}\left( {A|B} \right) +d} \right] }\nonumber \\= & {} \sum _{i=1}^n e_{{ ij}} \mathrm{e}^{-\left[ {c\,e_{{ ij}} +d} \right] } \end{aligned}$$
(46)

It can be proved that this is more general than the Euclidean and Mahalanobis distances. Taking \(d=-1\) and \(c= -1/\hbox {cov}\,(e_{{ ij}})\) with the approximation \(\hbox {exp}({-}x) \approx 1-x\) in (46) results in the Mahalanobis distance, and with \(d=-1\) and \(c=-1\), the Euclidean distance.

We will extend the conditional possibility to the case of two training feature vectors, \(\{A_i =f_{\mathrm{tr}} (i,j)\}\), \(\{C_i =f_{\mathrm{tr}} (k,j)\}\) and one test feature vector \(\{B=f_{\mathrm{te}} (j)\}\). The conditional possibility can now be written as:

$$\begin{aligned} \{\hbox {cposs}(A\cap C/B)\}=(A-B)\cap (C-B) \end{aligned}$$
(47)

Substituting for A, C and B we get

$$\begin{aligned}&\hbox {cposs}\left\{ {f_{\mathrm{tr}} (i,j)\cap f_{\mathrm{tr}} (k,j)/f_{\mathrm{te}} (j)} \right\} \nonumber \\&\quad =\left\{ f_{\mathrm{tr}} (i,j)-f_{\mathrm{te}} (j)\right\} \cap \left\{ f_{\mathrm{tr}} (k,j)-f_{\mathrm{te}} (j)\right\} \nonumber \\&\quad =\left\{ e_{{ ij}} \cap e_{kj} \right\} =\left\{ t(e_{{ ij}} ,e_{kj} )\right\} =E_{ik} (j) \end{aligned}$$
(48)

where \(E_{ik} (j)\) is the normed error vector. As our aim is to build a classifier similar to the IPC, the error transform named the Hanman classifier is obtained by replacing \(e_{{ ij}}\) with \(E_{ik}(j)\) from (48) in (46):

$$\begin{aligned} H(E_{ik} )=\sum _{j=1}^n {E_{ik} (j)\mathrm{e}^{-[c.E_{ik}(j)+d]}} \end{aligned}$$
(49)

To avoid learning, we have taken \(c=1\) and \(d=0\) for the implementation on the databases. Let \(\varphi _{i} \left( x \right) =x\mathrm{e}^{x}\,\forall i=1,2,\ldots n\) such that \(\varphi _i^{{\prime }{\prime }} \left( x \right) =\left( {x+2} \right) \mathrm{e}^{x}>0\,\forall \,x\in R\). Thus \(\varphi _{i}\) is convex and twice differentiable and hence acts as a splitting function, and so does \(H(E_{ik})\). If the exponential gain is ignored in (49), we get the normed error classifier (NEC) given by

$$\begin{aligned} h_{ik} \left( l \right) =\sum _{j=1}^n E_{{\mathrm{ik}}} \left( j \right) \quad \hbox {with}\quad E_{ik} (j)=t_Y (e_{{ ij}},e_{kj} ) \end{aligned}$$
(50)

The entropy function in (49) permits another form for (50) such as \(H(E_{ik} )={\sum }_{j=1}^n {E_{ik} (j)\mathrm{e}^{-[E_{ik}^2 (j)/2]}}\) which can be seen as the product of the error function and the Gaussian membership function with zero mean and unit variance.
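A sketch of the Hanman classifier of (49) with c = 1 and d = 0, together with the normed error classifier of (50); both reuse the Yager-fused error vector \(E_{ik}(j)\) of (48) and the `yager_tnorm` helper sketched after (37). Selecting the user with the smallest score, mirroring the IPC decision rule, is an assumption.

```python
import numpy as np
from itertools import combinations

def hc_nec_scores(f_tr, f_te):
    """Per-user Hanman classifier and NEC scores; f_tr is (M, N), f_te is (N,)."""
    hc, nec = [], []
    for i, k in combinations(range(f_tr.shape[0]), 2):
        E = yager_tnorm(np.abs(f_tr[i] - f_te), np.abs(f_tr[k] - f_te))  # Eq. (48)
        hc.append((E * np.exp(-E)).sum())   # Hanman classifier, Eq. (49), c = 1, d = 0
        nec.append(E.sum())                 # normed error classifier, Eq. (50)
    return min(hc), min(nec)
```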

5 Results of face recognition

5.1 Face databases

The information set based features are tested on three face databases using SVM, IPC, and HC. The first one, the ORL (AT&T) database [55], has 40 users with 10 samples per user, of which 7 are used for training and 3 for testing. The second is the Indian face database [56] that contains 53 persons, each having 11 images. The orientations of the face (both male and female) include: looking front, looking left, looking right, looking up, looking up toward left, looking up toward right, and looking down. The emotions include: neutral, smile, laughter, and sad/disgust. The third is the Sheffield face database (UMIST database) [57] containing 20 users, each having 23 sample images with a range of pose variations from profile to frontal. In addition to these three databases, we have used two more databases, viz., Faces-95 [58] having 72 users with 1440 images and FEI [59] having 100 users with 1400 images.

In the case of the HF features, the Euclidean distance, Bayesian LDC, SVM, IPC, NEC, and HC give (see Table 1) maximum recognition rates of 90% (\(7\times 7\)), 95.83% (\(9\times 9\)), 96.67% (\(9\times 9\), polynomial of degree 1), 96.67% (\(3\times 3\)), 97.5% (\(7\times 7\)), and 98.83% (\(5\times 5\) and \(7\times 7\)), respectively, for the training to test ratio of 7:3. Note that the recognition rate on the HF features is 98.83% with HC on the \(7\times 7\) window and 97.5% with NEC, but it drops to 96.67% with SVM (PR-Tools) for the polynomial of degree 3 on the \(9\times 9\) window and also with IPC on the \(3\times 3\) window.

The recognition rates obtained on the HT features using two versions of SVM (PR-Tools and LIB) [60,61,62] differ widely. PR-Tools gives the best recognition rate of 93.33% (\(7\times 7\) for the polynomial of degree 3 and \(9\times 9\) for degree 2) and LIB gives 98.33% (on both \(5\times 5\) and \(7\times 7\)) for the polynomial of degree 1 in Table 2. The same result is also obtained with IPC on the HT features on the window size of \(7\times 7\), but a slightly improved rate of 99.2% is obtained with both NEC and HC on the \(5\times 5\) and \(7\times 7\) windows. However, with Bayesian LDC, the performance deteriorates to 94.17% (\(5\times 5\)). Thus, the HT features are found superior to the HF features because of the difficulty in the selection of appropriate frequency components in the latter. Here HC is more consistent, and its performance fares well over that of IPC and is slightly better than that of NEC and SVM. The recognition rate using Gabor features with HC and NEC is 97.5% but is 95.3% with SVM, as in Table 3.

The performance of the Gabor, HF, and HT features is also evaluated on the IIT Kanpur Indian face database and the UMIST face database using the three classifiers and SVM in Tables 4, 5, 6, 7, 8 and 9. The best recognition rate of 98.48% is achieved with HC on the HF features with a \(9\times 9\) window size. In the case of the UMIST database, the best result of 95% is obtained with HC and NEC.

The Gabor filter features are found to yield the best result of 96.22% recognition rate with the HC on IIT Kanpur Face Database whereas the same features give 95% on the UMIST database with three new classifiers and SVM.

Table 1 Recognition rates with HF features for the ratio of 7:3 (AT&T database)
Table 2 Recognition rates with HT features for the ratio of 7:3 (AT&T database)
Table 3 Recognition rates with Gabor features for the ratio of 7:3 (AT&T database)
Table 4 Recognition rates with HF features for the ratio of 8:3 (Indian face database)
Table 5 Recognition rates with HT features for the ratio of 8:3 (Indian face database)
Table 6 Recognition rates with Gabor features for the ratio of 8:3 (Indian face database)
Table 7 Recognition rates with HF features for the ratio of 18:5 (UMIST database)
Table 8 Recognition rates with HT features for the ratio of 18:5 (UMIST database)
Table 9 Recognition rates with Gabor features for the ratio of 18:5 (UMIST database)
Table 10 Recognition rates with different classifiers for Faces-95 database with HF and HT features
Table 11 Recognition rates with different classifiers for FEI face database with HF and HT features
Table 12 Recognition rates with different classifiers and database with LBP features
Table 13 Recognition rates with Gabor information features for the ratio of 6:4 (AT&T database)

In addition to the above three databases, we have also tested the HF and HT features on two more databases, Faces-95 and FEI, using SVM, IPC, NEC, HC, and cosine similarity. The results are tabulated in Tables 10 and 11, which show that Faces-95 gives a good performance with both features, whereas the FEI database gives a poor performance because it has large variations in poses and illumination. However, cosine similarity fares well over all the other classifiers on FEI.

5.2 A comparison with LBP, SIFT and other features

The local binary pattern (LBP) and scale-invariant feature transform (SIFT) features are implemented on all five databases. The results of LBP are given in Table 12, which shows good results only on the IIT Kanpur Indian face database; those of SIFT are not given as the results are extremely poor.

For the derivation of the Gabor information features, we use a Gabor filter bank consisting of a set of 2D Gabor filters \(f_{{ ij}}\) at different orientations and frequencies. Each Gabor filter is convolved with the original gray-level image \(I_{{ ij}}\), resulting in the convolved image or Gabor image.

All the outputs of the Gabor filter bank, called Gabor images (12 Gabor filters with 3 frequencies and 4 orientations), are aggregated, and the Gabor information features are extracted from the aggregated Gabor image for different window sizes by finding the information using the membership value of each pixel in the window.

On similar lines, the wavelet transform is applied on the image to yield the approximation image, which is divided into windows, and the membership function value of every pixel in a window is computed to obtain the wavelet information features [49]. The performance of the Gabor information features on the AT&T database is given in Table 13 and that of the wavelet information features in Table 14. These tables indicate that HC gives the best results of 98.1 and 98.7% recognition rates on these features, respectively. However, the best accuracy of 99.2% is obtained on this database with the HT features in this work.

6 Conclusions

This paper presents a brief theory of information sets and information processing. The elements of an information set, called information values, are shown to be the products of the information source values (gray levels) and their membership function or agent values. The usefulness of information sets is brought out by devising the Hanman filter that modifies the information values and the Hanman transforms that evaluate the information sources. Six features are developed using the information set, the Hanman filter, and the Hanman transforms. The generalized fuzzy rules describing the generalized fuzzy model (GFM) are transformed into information rules that facilitate unsupervised learning, and three classifiers are developed as part of information processing: the inner product classifier (IPC), the normed error classifier (NEC), and the Hanman classifier (HC).

Table 14 Recognition rates with wavelet information features for the ratio of 6:4 (AT&T database)

The information values modified by a cosine function are the outcome of the HF, whose features have been tested on three databases, the ORL, Indian, and UMIST face databases, using the Euclidean distance, Bayesian LDC, SVM (PR-Tools), IPC, NEC, and HC. The maximum recognition rates of 98.33% (ORL), 98.48% (IIT Kanpur), and 95% (UMIST) are obtained using HC. The next best is NEC with the corresponding recognition rates (97.7, 96.2, 95%), followed by SVM with (96.7, 91.82, and 93%). The Gabor features give a recognition rate of 96.22% using HC on the IIT Kanpur face database and 95% on the UMIST database. This shows that the HF features fare well over the Gabor features. The HF features have also been implemented on Faces-95 and FEI, and the best performance of 95.48% is obtained with IPC on Faces-95.

It may be noted that the Gabor-information features and wavelet information features yield 98.1 and 98.7% recognition rates respectively with HC.

The HT features provide superior recognition rates over the HF features on the three databases. The recognition rates due to HC are 99.2% (AT&T), 95.5% (Indian face database), and 95% (UMIST), and those due to NEC are 99.2% (AT&T), 96.62% (Indian face database), and 95% (UMIST). The SVM has the corresponding recognition rates of (98.33, 94.33, 93%). The best result with IPC on Faces-95 is 95.48%.

It is observed that the performance of the new classifiers depends on the choice of t-norms for a particular modality. Out of several t-norms, Yager t-norms are found to be suitable on face databases.

The contributions of the paper include the proposition of information sets and information processing as well as the development of two feature extraction methods, viz., HF and HT, and three classifiers, IPC, NEC, and HC.