Measuring consistency of two datasets using fuzzy techniques and the concept of indiscernibility: Application to human perceptions on fabrics

https://doi.org/10.1016/j.engappai.2014.07.010

Abstract

This paper presents an approach for developing a new consistency degree between two datasets obtained from two different measuring systems on the same collection of items. In this approach, the concept of indiscernibility, frequently used in rough set approaches, is used to discover the classification consistency-based inclusion of one dataset in another. Next, in order to take into account the influence of neighboring relations between data, we modify the previous index by proposing a fuzzy classification consistency-based inclusion degree. The ordinal correlation between the two datasets, measured using a non-parametric method, Kendall's coefficient, is also introduced. Finally, in order to integrate the two indices in a reasonable way, a general consistency measure is constructed by introducing expert knowledge into a fuzzy inference system. The overall procedure is believed to be capable of detecting nonlinear patterns lying beneath the data while remaining reliable with a comparatively small number of experimental samples. Moreover, the new method avoids the "black box" phenomenon encountered in many modeling techniques and produces robust and interpretable results. In practice, the proposed method is particularly useful for validating one measuring or evaluation system with respect to a standard reference. In order to validate the effectiveness of the proposed consistency degree, we apply it to study the relationship between the tactile properties of a collection of fabric samples and their visual representations. The results confirm that most tactile information can be perceived correctly by assessors through either video or image displays, with better performance observed in the video scenario.

Introduction

A great number of data mining methods have been developed for exploiting complex relations among multiple datasets. The most commonly used methods are based on statistics, including Linear Regression Analysis (Weisberg, 2005), Principal Component Analysis (PCA) (Jolliffe, 2002), Multidimensional Scaling (Hollins et al., 1997, Picard, 2003), Multiple Factor Analysis (Howorth and Oliver, 1958, Le Dien, 2003) and various kinds of correlation coefficient analysis (Härdle and Simar, 2003). These methods solve many characterization and modeling problems efficiently owing to their capacity to identify linear patterns from different information sources and thereby discover correlations between data and between attributes in large quantities of numerical data. For the same reason, they have been widely applied in various research fields, including economics, medicine, biology, chemistry and engineering (Agresti and Finlay, 1997).

However, modeling relations between different datasets usually encounters problems of uncertainty and imprecision, and the classical methodologies gradually show their drawbacks in practice. First, when a problem deals with human knowledge, the relations concerned are often nonlinear, and applying the frequently used statistical techniques can cause significant information loss because of their linearly structured models. Second, in many cases the data carry high uncertainty and imprecision owing to non-unified linguistic human evaluation scores, whereas most classical analysis methods can only process precise and complete numerical data. Third, the classical methods do not always lead to a precise and meaningful physical interpretation of the data, and the correlation results obtained cannot be used to analyze all types of relations between datasets, such as inclusion, causal and association relations. Finally, the classical methods often impose strict requirements on the size and distribution of the database, yet collecting a great number of physically measured or human evaluation data is time-consuming and sometimes impractical, for example in pilot studies. With a limited collection of samples, well-fitted models are unlikely to be obtained using the classical methods.

In this situation, intelligent computational techniques, such as artificial neural networks (ANNs) (Fausett, 1994), genetic algorithms (GAs) (Goldberg, 1989), fuzzy logic (Zadeh, 1965, Sugeno and Yasukawa, 1993) and many hybrid applications of these tools (Ruan, 1997), have been widely applied to modeling and analysis with physical and human data. They have a high capacity for (1) solving nonlinear problems, (2) dealing with both numerical and linguistic data, (3) modeling human expert reasoning so as to produce precise and straightforward interpretations of results, and (4) computing with relatively small sets of data without the need for preliminary or additional information such as the probabilistic distributions required in statistics. Among these intelligent computational tools, the notion of a fuzzy set has been recognized by researchers in all branches of science and technology and applied successfully. Recently, fuzzy set theory has been applied to matrix transformations by Tripathy and Baruah (2010), to mathematical analysis by Tripathy and Borgogain (2011) and Tripathy et al. (2012), to topology by Tripathy and Debnath (2013), to mixed fuzzy topology by Tripathy and Ray (2012), to fuzzy logic by Sugeno and Yasukawa (1993), to rough mereology by Polkowski and Skowron (1994a), and in many other areas.

In the current study, assuming that we have two datasets obtained from different measuring systems on one collection of items (products, consumers, …), we propose a novel method to measure how much of the information in one dataset is included in the other. The method is particularly useful for validating one measuring or evaluation system with respect to a standard reference. For example, it can determine whether a newly developed inexpensive measuring device fully captures the main features given by an older, expensive measuring system. The proposed method is built using fuzzy techniques and the concept of indiscernibility from rough set theory (Pawlak, 1982).

Rough set theory has been widely applied for measuring the classification-based consistency or inclusion degree of two datasets (Qian et al., 2008, Xu et al., 2012). In practice, it is considered more relevant for processing small sets of objects. However, our study has two particularities that differ from the classical application scenarios of rough set theory: (1) the objects in our datasets can be fully ordered for each specific attribute, whereas those in a general dataset can only be classified, and (2) the objects in our datasets mostly correspond to real measured data. Therefore, in this special context, the classical inclusion degree, which deals with the classification of objects, is modified by introducing fuzzy techniques in order to take into account the neighboring relations of objects and their relations with respect to normalized measuring scales. The crisp partition of objects is thus transformed into a fuzzy partition related to a set of normalized scores. The general consistency is then obtained by combining this classification-based consistency with a Kendall's coefficient-based ranking consistency, so that the equivalence and neighboring relations of objects, as well as the ranks of objects in each dataset, are all taken into account. The overall consistency criterion is believed to be capable of detecting nonlinear patterns lying beneath the datasets while remaining reliable with a comparatively small number of experimental samples. Moreover, the new method avoids the "black box" phenomenon encountered in many modeling techniques and produces robust and interpretable results.
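The expert rule base of the paper's fuzzy inference system is not reproduced in this excerpt, so the sketch below is only a minimal Mamdani-style illustration of the fusion step: two sub-indices, a classification consistency (ccons) and a ranking consistency (rcons), are combined through a toy rule base with simple membership functions and centroid defuzzification. Every membership function and rule here is an illustrative assumption, not the authors' calibration.

```python
import numpy as np

# Membership functions on [0, 1] for "low", "medium", "high" (illustrative).
def low(x):  return np.clip((0.5 - x) / 0.5, 0.0, 1.0)
def med(x):  return np.clip(1.0 - np.abs(x - 0.5) / 0.5, 0.0, 1.0)
def high(x): return np.clip((x - 0.5) / 0.5, 0.0, 1.0)

def general_consistency(ccons, rcons):
    """Mamdani-style min/max inference with centroid defuzzification.
    The rule base is a toy stand-in for the paper's expert rules."""
    z = np.linspace(0.0, 1.0, 201)       # discretized output universe
    rules = [
        (min(high(ccons), high(rcons)), high),  # both high -> high
        (min(high(ccons), med(rcons)),  med),   # mixed -> medium
        (min(med(ccons),  high(rcons)), med),
        (max(low(ccons),  low(rcons)),  low),   # either low -> low
    ]
    agg = np.zeros_like(z)
    for strength, out_mf in rules:
        # Clip each consequent by its rule strength, aggregate by max.
        agg = np.maximum(agg, np.minimum(strength, out_mf(z)))
    return float((z * agg).sum() / (agg.sum() + 1e-12))  # centroid

print(general_consistency(0.9, 0.8))  # close agreement -> high output
print(general_consistency(0.9, 0.2))  # weak ranking agreement drags it down
```

A Mamdani scheme is chosen for this sketch because it keeps the expert rules directly readable, which matches the paper's stated aim of avoiding "black box" behavior.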


Classification consistency (CCons)

The classification consistency-based (CCons) inclusion degree of one dataset in another is defined using the concept of indiscernibility from rough set theory, developed by Zdzislaw Pawlak in the early 1980s. Rough set theory is a relatively new soft computing tool for analyzing imperfect data, and the rough set approach has become a popular mathematical framework in many research areas such as data mining, knowledge discovery in databases, decision support, feature selection and pattern recognition.
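As a hedged illustration of the indiscernibility idea (not the paper's exact formulation, which this snippet truncates), the following Python sketch groups objects into equivalence classes by identical attribute values and returns the fraction of the universe whose condition class is wholly contained in some decision class, i.e. a crisp, positive-region-style inclusion degree. The data and helper names are hypothetical.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group objects that take identical values on the given attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj)
    return list(classes.values())

def inclusion_degree(c_classes, d_classes):
    """Fraction of objects whose C-class is wholly contained in some
    D-class (the positive region of the crisp rough-set model)."""
    universe = set().union(*c_classes)
    positive = set()
    for x in c_classes:
        if any(x <= y for y in d_classes):
            positive |= x
    return len(positive) / len(universe)

# Toy data: 5 fabric samples scored by two systems (hypothetical values).
table = {
    1: {"C": 1, "D": 1},
    2: {"C": 1, "D": 1},
    3: {"C": 2, "D": 1},
    4: {"C": 2, "D": 2},
    5: {"C": 3, "D": 2},
}
cc = indiscernibility_classes(table, ["C"])
dd = indiscernibility_classes(table, ["D"])
print(inclusion_degree(cc, dd))  # 0.6: classes {1,2} and {5} are consistent
```

The fuzzy variant described in the paper would replace these crisp classes with membership-weighted ones related to normalized scores; that refinement is not attempted here.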

Ranking consistency (RCons)

The previous fuzzy classification consistency measure CCons(C,D) quantifies the extent to which the classification by the condition attribute C is consistent with that by the decision attribute D. However, it does not properly capture the ordinal consistency between the datasets. For example, from the viewpoint of the inclusion degree, the two classifications {(1 2 3) (4 5)} and {(3 2 1) (5 4)} show no difference, since according to the available classification criteria the inclusion degrees computed for them are identical, even though the orderings within the classes are reversed.
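To make the ordinal gap concrete, here is a small sketch using Kendall's coefficient as implemented by SciPy's kendalltau (assumed here as a stand-in for the paper's exact ranking-consistency formula) on the example above: the two orderings share the same class structure yet yield a low rank correlation.

```python
from scipy.stats import kendalltau

# The two classifications from the text: {(1 2 3) (4 5)} and {(3 2 1) (5 4)}.
# Read as rankings of the same five objects, the classes coincide but the
# order inside each class is reversed.
ranking_a = [1, 2, 3, 4, 5]
ranking_b = [3, 2, 1, 5, 4]

tau, p_value = kendalltau(ranking_a, ranking_b)
print(tau)  # 0.2: weak ordinal agreement despite identical class structure
```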

Context of the application

According to many previous studies in human neuropsychology (Klatzky and Lederman, 2010, James and Kim, 2010, Newell, 2010), most of the tactile information in our daily life can be interpreted through human visual perception, which accords with our real-life experience. For any specific object, we can judge its size, weight and even its texture without actually touching it. In fact, our brain has a very sophisticated memory association mechanism permitting us to perceive the outside world.

Conclusion

In this paper, a novel consistency degree is proposed to study the consistency of two datasets in order to determine whether one physical measuring or human evaluation system can be completely or partly replaced by another for a specific collection of items. The method is initially developed from the idea of the inclusion degree used in rough set theory to measure the classification consistency of two datasets. Next, in order to take into account neighboring relations of different data, this index is extended to a fuzzy classification consistency-based inclusion degree.

References

  • D. Dubois et al., Fuzzy Information Engineering: A Guided Tour of Applications (1996)
  • Fabric Hand: Guidelines for the Subjective Evaluation of, AATCC Technical Manual (2007), pp. ...
  • L. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms, and Applications (1994)
  • D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (1989)
  • W. Härdle et al., Applied Multivariate Statistical Analysis (2003)
  • M. Hollins et al., Perceptual dimensions of tactile surface texture: a multidimensional scaling analysis, Percept. Psychophys. (1997)
  • W.S. Howorth et al., The application of multiple factor analysis to the assessment of fabric handle, J. Text. Inst. (1958)
  • T.W. James et al., Dorsal and Ventral Cortical Pathways for Visuo-haptic Shape Integration Revealed Using fMRI, in: Multisensory Object Perception in the Primate Brain (2010)
  • I.T. Jolliffe, Principal Component Analysis (2002)
  • R.L. Klatzky et al., Multisensory Texture Perception, in: Multisensory Object Perception in the Primate Brain (2010)
  • O. Lahav et al., Construction of cognitive maps of unknown spaces using a multi-sensory virtual environment for people who are blind, Comput. Hum. Behav. (2008)