Algorithms for Similarity Relation Learning from High Dimensional Data

  • Chapter
Transactions on Rough Sets XVII

Part of the book series: Lecture Notes in Computer Science (TRS, volume 8375)

Abstract

The notion of similarity plays an important role in machine learning and artificial intelligence. It is widely used in tasks related to supervised classification, clustering, outlier detection, and planning. Moreover, in domains such as information retrieval and case-based reasoning, the concept of similarity is essential, as it is used at every phase of the reasoning cycle. Similarity itself, however, is a very complex concept that eludes formal definition. The similarity of two objects can differ depending on the context considered. In many practical situations it is difficult even to evaluate the quality of similarity assessments without considering the task for which they were made. For this reason, similarity should be learnt from data, specifically for the task at hand. This paper presents research on the problem of similarity learning, which is part of the author’s PhD dissertation. It describes a similarity model, called Rule-Based Similarity, and presents algorithms for constructing this model from available data. The model uses notions from rough set theory to derive a similarity function that approximates the similarity relation in a given context. It is largely inspired by Tversky’s feature contrast model and shares several analogous properties with it. Those theoretical properties are described and discussed in the paper. Moreover, the paper presents results of experiments on real-life data sets, in which the quality of the proposed model is thoroughly evaluated and compared with state-of-the-art algorithms.
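
For orientation, Tversky’s feature contrast model, which the abstract names as the inspiration, scores a pair of objects by rewarding shared features and penalizing distinctive ones. The sketch below illustrates that general idea only. It is not the Rule-Based Similarity model of the chapter, and the function name, the default weights, and the use of set cardinality as the salience function are assumptions made for this example.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5, salience=len):
    """Tversky-style contrast score of feature sets a and b.

    Hypothetical helper for illustration: theta weights the common
    features, alpha the features found only in a, beta the features
    found only in b. `salience` maps a feature set to a non-negative
    number; plain set size is assumed here.
    """
    a, b = set(a), set(b)
    common = salience(a & b)   # features shared by both objects
    only_a = salience(a - b)   # features distinctive of a
    only_b = salience(b - a)   # features distinctive of b
    return theta * common - alpha * only_a - beta * only_b


if __name__ == "__main__":
    rich = {"x", "y", "z"}
    poor = {"y"}
    # With alpha != beta the score depends on the direction of comparison.
    print(contrast_similarity(rich, poor, alpha=0.75, beta=0.25))
    print(contrast_similarity(poor, rich, alpha=0.75, beta=0.25))
```

With unequal weights for the two kinds of distinctive features, the score is not symmetric, which is one of the properties that distinguishes this family of models from distance-based similarity.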

The research was supported by the grants DEC-2011/01/B/ST6/03867 and DEC-2012/05/B/ST6/03215 from the National Research Centre, and by the National Centre for Research and Development (NCBiR) under the grant SP/I/1/77065/10 within the strategic scientific research and experimental development program “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.

References

  1. Pinker, S.: How the mind works. W. W. Norton (1998)

    Google Scholar 

  2. Schank, R.C.: Dynamic Memory: A Theory of Learning in Computers and People. Cambridge University Press, New York (1982)

    Google Scholar 

  3. Thagard, P.: 10. In: Mind: Introduction to Cognitive Science. Segunda edn. MIT Press, Cambridge (2005)

    Google Scholar 

  4. Hahn, U., Chater, N.: Understanding similarity: A joint project for psychology, case based reasoning, and law. Artificial Intelligence Review 12, 393–427 (1998)

    Google Scholar 

  5. Tversky, A.: Features of similarity. Psychological Review 84, 327–352 (1977)

    Google Scholar 

  6. Aamodt, A., Plaza, E.: Case-based reasoning: Foundational issues, methodological variations, and system approaches. Artificial Intelligence Communications 7(1), 39–59 (1994)

    Google Scholar 

  7. Mitchell, T.M.: Machine Learning. McGraw Hill series in computer science. McGraw-Hill (1997)

    Google Scholar 

  8. Tan, P.N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Addison-Wesley, Boston (2006)

    Google Scholar 

  9. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is ”nearest neighbor” meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1998)

    Google Scholar 

  10. Krantz, D.H., Tversky, A.: Similarity of rectangles: An analysis of subjective dimensions. Journal of Mathematical Psychology 12(1), 4–34 (1975)

    MATH  Google Scholar 

  11. Tversky, A., Krantz, D.H.: The dimensional representation and the metric structure of similarity data. Journal of Mathematical Psychology 7(3), 572–596 (1970)

    MATH  MathSciNet  Google Scholar 

  12. Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), pp. 539–546. IEEE Computer Society, Washington, DC (2005)

    Google Scholar 

  13. Hechenbichler, K., Schliep, K.: Weighted k-Nearest-Neighbor Techniques and Ordinal Classification (October 2004), a Discussion paper

    Google Scholar 

  14. Martín-Merino, M., De Las Rivas, J.: Improving k-NN for human cancer classification using the gene expression profiles. In: Adams, N.M., Robardet, C., Siebes, A., Boulicaut, J.-F. (eds.) IDA 2009. LNCS, vol. 5772, pp. 107–118. Springer, Heidelberg (2009)

    Google Scholar 

  15. Nguyen, S.H.T.: Regularity analysis and its applications in data mining. PhD thesis, Warsaw University, Faculty of Mathematics, Informatics and Mechanics, Part II: Relational Patterns (1999)

    Google Scholar 

  16. Stahl, A., Gabel, T.: Using evolution programs to learn local similarity measures. In: Ashley, K.D., Bridge, D.G. (eds.) ICCBR 2003. LNCS, vol. 2689, pp. 537–551. Springer, Heidelberg (2003)

    Google Scholar 

  17. Wojna, A.: Analogy-based reasoning in classifier construction. PhD thesis, Warsaw University, Faculty of Mathematics, Informatics and Mechanics (2004)

    Google Scholar 

  18. Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S.J.: Distance metric learning with application to clustering with side-information. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15, NIPS 2002, December 9-14, pp. 505–512. MIT Press, Vancouver (2002)

    Google Scholar 

  19. Xiong, H., Chen, X.W.: Kernel-based distance metric learning for microarray data classification. BMC Bioinformatics 7(299) (2006) (online)

    Google Scholar 

  20. Gati, I., Tversky, A.: Studies of similarity. In: Rosch, E., Lloyd, B. (eds.) Cognition and Categorization, pp. 81–99. L. Erlbaum Associates, Hillsdale (1978)

    Google Scholar 

  21. Goldstone, R., Medin, D., Gentner, D.: Relational similarity and the nonindependence of features in similarity judgments. Cognitive Psychology 23, 222–262 (1991)

    Google Scholar 

  22. Sebag, M., Schoenauer, M.: A rule-based similarity measure. In: Wess, S., Richter, M., Althoff, K.-D. (eds.) EWCBR 1993. LNCS, vol. 837, pp. 119–130. Springer, Heidelberg (1994)

    Google Scholar 

  23. Janusz, A.: Similarity relation in classification problems. In: Chan, C.-C., Grzymala-Busse, J.W., Ziarko, W.P. (eds.) RSCTC 2008. LNCS (LNAI), vol. 5306, pp. 211–222. Springer, Heidelberg (2008)

    Google Scholar 

  24. Janusz, A.: Learning a Rule-Based Similarity: A comparison with the Genetic Approach. In: Proceedings of the Workshop on Concurrency, Specification and Programming (CS&P 2009), Kraków-Przegorzały, Poland, September 28-30, vol. 1, pp. 241–252 (2009)

    Google Scholar 

  25. Janusz, A.: Rule-based similarity for classification. In: Proceedings of the WI/IAT 2009 Workshops, Milan, Italy, September 15-18, pp. 449–452. IEEE Computer Society, Los Alamitos (2009)

    Google Scholar 

  26. Janusz, A.: Discovering rules-based similarity in microarray data. In: Hüllermeier, E., Kruse, R., Hoffmann, F. (eds.) IPMU 2010. LNCS, vol. 6178, pp. 49–58. Springer, Heidelberg (2010)

    Google Scholar 

  27. Janusz, A.: Utilization of dynamic reducts to improve performance of the rule-based similarity model for highly-dimensional data. In: Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and International Conference on Intelligent Agent Technology - Workshops, pp. 432–435. IEEE (2010)

    Google Scholar 

  28. Janusz, A.: Dynamic rule-based similarity model for DNA microarray data. In: Peters, J.F., Skowron, A. (eds.) Transactions on Rough Sets XV. LNCS, vol. 7255, pp. 1–25. Springer, Heidelberg (2012)

    Google Scholar 

  29. Janusz, A., Ślęzak, D., Nguyen, H.S.: Unsupervised similarity learning from textual data. Fundamenta Informaticae 119(3)

    Google Scholar 

  30. Janusz, A.: Combining multiple classification or regression models using genetic algorithms. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) RSCTC 2010. LNCS, vol. 6086, pp. 130–137. Springer, Heidelberg (2010)

    Google Scholar 

  31. Janusz, A.: Combining multiple predictive models using genetic algorithms. Intelligent Data Analysis 16(5), 763–776 (2012)

    Google Scholar 

  32. Janusz, A., Nguyen, H.S., Ślęzak, D., Stawicki, S., Krasuski, A.: JRS’2012 Data Mining Competition: Topical Classification of Biomedical Research Papers. In: Yao, J., Yang, Y., Słowiński, R., Greco, S., Li, H., Mitra, S., Polkowski, L. (eds.) RSCTC 2012. LNCS, vol. 7413, pp. 422–431. Springer, Heidelberg (2012)

    Google Scholar 

  33. Janusz, A., Ślęzak, D.: Utilization of attribute clustering methods for scalable computation of reducts from high-dimensional data. In: Ganzha, M., Maciaszek, L.A., Paprzycki, M. (eds.) Proceedings of Federated Conference on Computer Science and Information Systems - FedCSIS 2012, Wrocław, Poland, September 9-12, pp. 295–302 (2012)

    Google Scholar 

  34. Janusz, A., Stawicki, S.: Applications of approximate reducts to the feature selection problem. In: Yao, J., Ramanna, S., Wang, G., Suraj, Z. (eds.) RSKT 2011. LNCS, vol. 6954, pp. 45–50. Springer, Heidelberg (2011)

    Google Scholar 

  35. Kurach, K., Pawłowski, K., Romaszko, Ł., Tatjewski, M., Janusz, A., Nguyen, H.S.: An ensemble approach to multi-label classification of textual data. In: Zhou, S., Zhang, S., Karypis, G. (eds.) ADMA 2012. LNCS, vol. 7713, pp. 306–317. Springer, Heidelberg (2012)

    Google Scholar 

  36. Ślęzak, D., Janusz, A.: Ensembles of bireducts: Towards robust classification and simple representation. In: Kim, T.-H., Adeli, H., Slezak, D., Sandnes, F.E., Song, X., Chung, K.-I., Arnett, K.P. (eds.) FGIT 2011. LNCS, vol. 7105, pp. 64–77. Springer, Heidelberg (2011)

    Google Scholar 

  37. Wojnarski, M., Janusz, A., Nguyen, H.S., Bazan, J., Luo, C., Chen, Z., Hu, F., Wang, G., Guan, L., Luo, H., Gao, J., Shen, Y., Nikulin, V., Huang, T.-H., McLachlan, G.J., Bošnjak, M., Gamberger, D.: RSCTC’2010 discovery challenge: Mining DNA microarray data for medical diagnosis and treatment. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) RSCTC 2010. LNCS, vol. 6086, pp. 4–19. Springer, Heidelberg (2010)

    Google Scholar 

  38. Janusz, A., Świeboda, W., Krasuski, A., Nguyen, H.S.: Interactive document indexing method based on explicit semantic analysis. In: Yao, J., Yang, Y., Słowiński, R., Greco, S., Li, H., Mitra, S., Polkowski, L. (eds.) RSCTC 2012. LNCS, vol. 7413, pp. 156–165. Springer, Heidelberg (2012)

    Google Scholar 

  39. Ślęzak, D., Janusz, A., Świeboda, W., Nguyen, H.S., Bazan, J.G., Skowron, A.: Semantic analytics of pubMed content. In: Holzinger, A., Simonic, K.-M. (eds.) USAB 2011. LNCS, vol. 7058, pp. 63–74. Springer, Heidelberg (2011)

    Google Scholar 

  40. Szczuka, M., Janusz, A., Herba, K.: Clustering of rough set related documents with use of knowledge from dBpedia. In: Yao, J., Ramanna, S., Wang, G., Suraj, Z. (eds.) RSKT 2011. LNCS, vol. 6954, pp. 394–403. Springer, Heidelberg (2011)

    Google Scholar 

  41. Pawlak, Z.: Information systems, theoretical foundations. Information Systems 3(6), 205–218 (1981)

    Google Scholar 

  42. Pawlak, Z., Skowron, A.: Rough sets and boolean reasoning. Information Sciences 177(1), 41–73 (2007)

    MATH  MathSciNet  Google Scholar 

  43. Pawlak, Z., Skowron, A.: Rough sets: Some extensions. Information Sciences 177(1), 28–40 (2007)

    MATH  MathSciNet  Google Scholar 

  44. Pawlak, Z., Skowron, A.: Rudiments of rough sets. Information Sciences 177(1), 3–27 (2007)

    MATH  MathSciNet  Google Scholar 

  45. Bazan, J.: Hierarchical classifiers for complex spatio-temporal concepts. In: Peters, J.F., Skowron, A., Rybiński, H. (eds.) Transactions on Rough Sets IX. LNCS, vol. 5390, pp. 474–750. Springer, Heidelberg (2008)

    Google Scholar 

  46. Ngo, C.L., Nguyen, H.S.: A tolerance rough set approach to clustering web search results. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) PKDD 2004. LNCS (LNAI), vol. 3202, pp. 515–517. Springer, Heidelberg (2004)

    Google Scholar 

  47. Pawlak, Z.: Rough sets, rough relations and rough functions. Fundamenta Informaticae 27(2-3), 103–108 (1996)

    MATH  MathSciNet  Google Scholar 

  48. Peters, G., Lingras, P., Ślęzak, D., Yao, Y.: Rough Sets: Selected Methods and Applications in Management and Engineering. In: Advanced Information and Knowledge Processing. Springer, London (2012)

    Google Scholar 

  49. Sikora, M., Sikora, B.: Rough natural hazards monitoring. In: Peters, G., Lingras, P., Ślęzak, D., Yao, Y. (eds.) Selected Methods and Applications of Rough Sets in Management and Engineering. Advanced Information and Knowledge Processing, pp. 163–179. Springer, London (2012)

    Google Scholar 

  50. Nguyen, S.H., Bazan, J., Skowron, A., Nguyen, H.S.: Layered learning for concept synthesis. In: Peters, J.F., Skowron, A., Grzymała-Busse, J.W., Kostek, B.z., Swiniarski, R.W., Szczuka, M.S. (eds.) Transactions on Rough Sets I. LNCS, vol. 3100, pp. 187–208. Springer, Heidelberg (2004)

    Google Scholar 

  51. Skowron, A., Stepaniuk, J.: Approximation of relations. In: RSKD 1993: Proceedings of the International Workshop on Rough Sets and Knowledge Discovery, pp. 161–166. Springer, London (1994)

    Google Scholar 

  52. Szczuka, M.S., Skowron, A., Stepaniuk, J.: Function approximation and quality measures in rough-granular systems. Fundamenta Informaticae 109(3), 339–354 (2011)

    MATH  MathSciNet  Google Scholar 

  53. Gomolinska, A.: Approximation spaces based on relations of similarity and dissimilarity of objects. Fundamenta Informaticae 79(3-4), 319–333 (2007)

    MATH  MathSciNet  Google Scholar 

  54. Greco, S., Matarazzo, B., Słowiński, R.: Fuzzy similarity relation as a basis for rough approximations. In: Polkowski, L., Skowron, A. (eds.) RSCTC 1998. LNCS (LNAI), vol. 1424, pp. 283–289. Springer, Heidelberg (1998)

    Google Scholar 

  55. Polkowski, L.T., Skowron, A., Zytkow, J.M.: Rough foundations for rough sets. In: Lin, T.Y. (ed.) Rough Sets and Soft Computing. Conference Proceedings, pp. 142–149. San Jose State University, San Jose (1994)

    Google Scholar 

  56. Skowron, A., Stepaniuk, J.: Tolerance approximation spaces. Fundamenta Informaticae 27(2/3), 245–253 (1996)

    MATH  MathSciNet  Google Scholar 

  57. Słowiński, R., Vanderpooten, D.: Similarity relation as a basis for rough approximations. In: Wang, P. (ed.) Advances in Machine Intelligence and Soft-Computing, vol. IV, pp. 17–33. Duke University Press, Durham (1997)

    Google Scholar 

  58. Słowiński, R., Vanderpooten, D.: A generalized definition of rough approximations based on similarity. IEEE Transactions on Data and Knowledge Engineering 12, 331–336 (2000)

    Google Scholar 

  59. Yao, Y.: Semantics of fuzzy sets in rough set theory. In: Peters, J.F., Skowron, A., Dubois, D., Grzymała-Busse, J.W., Inuiguchi, M., Polkowski, L. (eds.) Transactions on Rough Sets II. LNCS, vol. 3135, pp. 297–318. Springer, Heidelberg (2004)

    Google Scholar 

  60. Hu, X., Cercone, N.: Rough sets similarity-based learning from databases. In: KDD, pp. 162–167 (1995)

    Google Scholar 

  61. Maurer, A.: Learning similarity with operator-valued large-margin classifiers. Journal of Machine Learning Research 9, 1049–1082 (2008)

    MATH  Google Scholar 

  62. Komorowski, J., Pawlak, Z., Polkowski, L., Skowron, A.: Rough sets: A tutorial (1998)

    Google Scholar 

  63. Dubois, D., Prade, H.: Rough fuzzy sets and fuzzy rough sets. International Journal of General Systems 17(2-3), 191–209 (1990)

    MATH  Google Scholar 

  64. Pal, S.K.: Soft data mining, computational theory of perceptions, and rough-fuzzy approach. Information Sciences 163(1-3), 5–12 (2004)

    Google Scholar 

  65. Pal, S.K., Meher, S.K., Dutta, S.: Class-dependent rough-fuzzy granular space, dispersion index and classification. Pattern Recognition 45(7), 2690–2707 (2012)

    Google Scholar 

  66. Zadeh, L.A.: Fuzzy sets. Information and Control 8(3), 338–353 (1965)

    MATH  MathSciNet  Google Scholar 

  67. Świeboda, W., Nguyen, H.S.: Rough Set Methods for Large and Sparse Data in EAV Format. In: 2012 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), Ho Chi Minh City, Vietnam, February 27-March 1, pp. 1–6. IEEE (2012)

    Google Scholar 

  68. Greco, S., Matarazzo, B., Słowiński, R.: Handling missing values in rough set analysis of multi-attribute and multi-criteria decision problems. In: Zhong, N., Skowron, A., Ohsuga, S. (eds.) RSFDGrC 1999. LNCS (LNAI), vol. 1711, pp. 146–157. Springer, Heidelberg (1999)

    Google Scholar 

  69. Latkowski, R.: Flexible indiscernibility relations for missing attribute values. Fundamenta Informaticae 67(1-3), 131–147 (2005)

    MATH  MathSciNet  Google Scholar 

  70. Stefanowski, J., Tsoukiàs, A.: Incomplete information tables and rough classification. Computational Intelligence 17(3), 545–566 (2001)

    Google Scholar 

  71. Grzymala-Busse, J.W.: Rough set strategies to data with missing attribute values. In: Lin, T.Y., Ohsuga, S., Liau, C.J., Hu, X. (eds.) Foundations and Novel Approaches in Data Mining. SCI, vol. 9, pp. 197–212. Springer, Heidelberg (2006)

    Google Scholar 

  72. Grzymala-Busse, J.W., Rzasa, W.: Local and global approximations for incomplete data. In: Greco, S., Hata, Y., Hirano, S., Inuiguchi, M., Miyamoto, S., Nguyen, H.S., Słowiński, R. (eds.) RSCTC 2006. LNCS (LNAI), vol. 4259, pp. 244–253. Springer, Heidelberg (2006)

    Google Scholar 

  73. Skowron, A., Stepaniuk, J., Świniarski, R.W.: Modeling rough granular computing based on approximation spaces. Information Sciences 184(1), 20–43 (2012)

    Google Scholar 

  74. Pawlak, Z.: Decision logik. Bulletin of the EATCS 44, 201–225 (1991)

    MATH  Google Scholar 

  75. Delimata, P., Moshkov, M.J., Skowron, A., Suraj, Z.: Inhibitory Rules in Data Analysis: A Rough Set Approach. SCI, vol. 163. Springer (2009)

    Google Scholar 

  76. An, A., Cercone, N.: Rule quality measures for rule induction systems: Description and evaluation. Computational Intelligence 17(3), 409–424 (2001)

    Google Scholar 

  77. Dean, P., Famili, A.: Comparative performance of rule quality measures in an induction system. Applied Intelligence 7, 113–124 (1997)

    Google Scholar 

  78. Lavrač, N., Flach, P.A., Zupan, B.: Rule Evaluation Measures: A Unifying View. In: Džeroski, S., Flach, P.A. (eds.) ILP 1999. LNCS (LNAI), vol. 1634, pp. 174–185. Springer, Heidelberg (1999)

    Google Scholar 

  79. Džeroski, S., Cestnik, B., Petrovski, I.: Using the m-estimate in rule induction. Journal of Computing and Information Technology 1(1), 37–46 (1993)

    Google Scholar 

  80. Pawlak, Z.: Rough sets - Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers (1991)

    Google Scholar 

  81. Modrzejewski, M.: Feature selection using rough sets theory. In: Brazdil, P.B. (ed.) ECML 1993. LNCS, vol. 667, pp. 213–226. Springer, Heidelberg (1993)

    Google Scholar 

  82. Nguyen, H.S., Skowron, A.: Boolean reasoning for feature extraction problems. In: Raś, Z.W., Skowron, A. (eds.) ISMIS 1997. LNCS, vol. 1325, pp. 117–126. Springer, Heidelberg (1997)

    Google Scholar 

  83. Zhong, N., Dong, J., Ohsuga, S.: Using rough sets with heuristics for feature selection. Journal of Intelligent Information Systems 16(3), 199–214 (2001)

    MATH  Google Scholar 

  84. Katzberg, J.D., Ziarko, W.: Variable precision rough sets with asymmetric bounds. In: Proceedings of the International Workshop on Rough Sets and Knowledge Discovery, RSKD 1993, pp. 167–177. Springer, London (1994)

    Google Scholar 

  85. Ziarko, W.: Variable precision rough set model. Journal of Computer and System Sciences 46, 39–59 (1993)

    MATH  MathSciNet  Google Scholar 

  86. Pawlak, Z.: Rough sets: present state and the future. Foundations of Computing and Decision Sciences 18(3-4), 157–166 (1993)

    MATH  MathSciNet  Google Scholar 

  87. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. Journal of Machine Learning Research 3, 1157–1182 (2003)

    MATH  Google Scholar 

  88. Guyon, I., et al.: Feature Extraction: Foundations and Applications. Studies in Fuzziness and Soft Computing. Springer (August 2006)

    Google Scholar 

  89. Nguyen, H.S., Nguyen, S.H., Skowron, A.: Searching for features defined by hyperplanes. In: Michalewicz, M., Raś, Z.W. (eds.) ISMIS 1996. LNCS, vol. 1079, pp. 366–375. Springer, Heidelberg (1996)

    Google Scholar 

  90. Valdés, J., Barton, A.: Relevant attribute discovery in high dimensional data: Application to breast cancer gene expressions, 482–489 (2006)

    Google Scholar 

  91. Skowron, A., Rauszer, C.: The Discernibility Matrices and Functions in Information Systems, pp. 331–362. Kluwer, Dordrecht (1992)

    Google Scholar 

  92. Nguyen, H.S.: On the decision table with maximal number of reducts. Electronic Notes in Theoretical Computer Science 82(4), 198–205 (2003)

    Google Scholar 

  93. Ślęzak, D.: Various approaches to reasoning with frequency based decision reducts: a survey, pp. 235–285. Physica-Verlag GmbH, Heidelberg (2000)

    Google Scholar 

  94. Ślęzak, D.: Rough sets and functional dependencies in data: Foundations of association reducts. In: Gavrilova, M.L., Kenneth Tan, C.J., Wang, Y., Chan, K.C.C. (eds.) Transactions on Computational Science V. LNCS, vol. 5540, pp. 182–205. Springer, Heidelberg (2009)

    Google Scholar 

  95. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. 97, 273–324 (1997)

    MATH  Google Scholar 

  96. Nguyen, H.S.: Approximate boolean reasoning: Foundations and applications in data mining. In: Peters, J.F., Skowron, A. (eds.) Transactions on Rough Sets V. LNCS, vol. 4100, pp. 334–506. Springer, Heidelberg (2006)

    Google Scholar 

  97. Nguyen, H.S., Ślęzak, D.: Approximate reducts and association rules. In: Zhong, N., Skowron, A., Ohsuga, S. (eds.) RSFDGrC 1999. LNCS (LNAI), vol. 1711, pp. 137–145. Springer, Heidelberg (1999)

    Google Scholar 

  98. Ślęzak, D.: Approximate reducts in decision tables. In: Proceedings of IPMU 1996 (1996)

    Google Scholar 

  99. Ślęzak, D.: Approximate entropy reducts. Fundamenta Informaticae 53(3-4), 365–390 (2002)

    MathSciNet  Google Scholar 

  100. Bazan, J.G., Skowron, A., Synak, P.: Dynamic reducts as a tool for extracting laws from decisions tables. In: Raś, Z.W., Zemankova, M. (eds.) ISMIS 1994. LNCS, vol. 869, pp. 346–355. Springer, Heidelberg (1994)

    Google Scholar 

  101. Bazan, J.G.: A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables. In: Polkowski, L., Skowron, A. (eds.) Rough Sets in Knowledge Discovery 2: Applications, Case Studies and Software Systems, pp. 321–365. Physica Verlag (1998)

    Google Scholar 

  102. Wróblewski, J.: Ensembles of classifiers based on approximate reducts. Fundamenta Informaticae 47(3-4), 351–360 (2001)

    MATH  MathSciNet  Google Scholar 

  103. Ślęzak, D., Widz, S.: Is it important which rough-set-based classifier extraction and voting criteria are applied together? In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) RSCTC 2010. LNCS, vol. 6086, pp. 187–196. Springer, Heidelberg (2010)

    Google Scholar 

  104. Bauer, E., Kohavi, R.: An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning 36(1-2), 105–139 (1999)

    Google Scholar 

  105. Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning 40(2), 139–157 (2000)

    Google Scholar 

  106. Stefanowski, J.: An experimental study of methods combining multiple classifiers - diversified both by feature selection and bootstrap sampling. In: Atanassov, K.T., Kacprzyk, J., Krawczak, M., Szmidt, E. (eds.) Issues in the Representation and Processing of Uncertain and Imprecise Information, pp. 337–354. Akademicka Oficyna Wydawnicza EXIT, Warsaw (2005)

    Google Scholar 

  107. Smyth, B., McClave, P.: Similarity vs. diversity. In: Aha, D.W., Watson, I. (eds.) ICCBR 2001. LNCS (LNAI), vol. 2080, pp. 347–361. Springer, Heidelberg (2001)

    Google Scholar 

  108. Husserl, E.: The Crisis of European Sciences and Transcendental Phenomenology. Northwestern University Press, Evanston (1970) German original written in 1937

    Google Scholar 

  109. Schütz, A.: The Phenomenology of the Social World. Northwestern University Press, Evanston (1967)

    Google Scholar 

  110. Coomans, D., Massart, D.: Alternative k-nearest neighbour rules in supervised pattern recognition: Part 1. k-nearest neighbour classification by using alternative voting rules. Analytica Chimica Acta 136, 15–27 (1982)

    Google Scholar 

  111. Patrick, E.A., Fischer III, F.P.: A generalized k-nearest neighbor rule. Information and Control 16(2), 128–152 (1970)

    MATH  MathSciNet  Google Scholar 

  112. Basu, S.: Semi-supervised Clustering: Probabilistic Models, Algorithms and Experiments. PhD thesis, The University of Texas at Austin (2005)

    Google Scholar 

  113. Hliaoutakis, A., Varelas, G., Voutsakis, E., Petrakis, E.G.M., Milios, E.: Information retrieval by semantic similarity. Int. Journal on Semantic Web and Information Systems (IJSWIS). Special Issue of Multimedia Semantics 3(3), 55–73 (2006)

    Google Scholar 

  114. Rinaldi, A.M.: An ontology-driven approach for semantic information retrieval on the web. ACM Transactions on Internet Technology 9, 10:1–10:24 (2009)

    Google Scholar 

  115. Feldman, R., Sanger, J. (eds.): The Text Mining Handbook. Cambridge University Press (2007)

    Google Scholar 

  116. Ho, T.B., Nguyen, N.B.: Nonhierarchical document clustering based on a tolerance rough set model. International Journal of Intelligent Systems 17, 199–212 (2002)

    MATH  Google Scholar 

  117. Janusz, A.: A similarity relation in machine learning. Master’s thesis, University Warsaw, Faculty of Mathematics, Informatics and Mechanics (2007) (in Polish)

    Google Scholar 

  118. Beals, R., Krantz, D.H., Tversky, A.: Foundations of multidimensional scaling. Psychological Review 75(2), 127–142 (1968)

    MATH  Google Scholar 

  119. Bazan, J.: Behavioral pattern identification through rough set modeling. Fundamenta Informaticae 72(1–3), 37–50 (2006)

    MATH  Google Scholar 

  120. Bazan, J., Kruczek, P., Bazan-Socha, S., Skowron, A., Pietrzyk, J.J.: Automatic planning of treatment of infants with respiratory failure through rough set modeling. In: Greco, S., Hata, Y., Hirano, S., Inuiguchi, M., Miyamoto, S., Nguyen, H.S., Słowiński, R. (eds.) RSCTC 2006. LNCS (LNAI), vol. 4259, pp. 418–427. Springer, Heidelberg (2006)

    Google Scholar 

  121. Kumar, N., Lolla, N., Keogh, E., Lonardi, S., Ratanamahatana, C.A.: Time-series bitmaps: a practical visualization tool for working with large time series databases. In: SIAM 2005 Data Mining Conference, pp. 531–535. SIAM (2005)

    Google Scholar 

  122. Strong, G., Gong, M.: Similarity-based image organization and browsing using multi-resolution self-organizing map. Image Vision Comput. 29(11), 774–786 (2011)

    Google Scholar 

  123. Borg, I., Groenen, P.: Modern Multidimensional Scaling: Theory and Applications. Springer (2005)

    Google Scholar 

  124. Claveau, V.: IRISA Participation in JRS 2012 Data-Mining Challenge: Lazy-Learning with Vectorization. In: Yao, J., Yang, Y., Słowiński, R., Greco, S., Li, H., Mitra, S., Polkowski, L. (eds.) RSCTC 2012. LNCS, vol. 7413, pp. 447–454. Springer, Heidelberg (2012)

    Google Scholar 

  125. Vempala, S.: The Random Projection Method. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. American Mathematical Society (2004)

    Google Scholar 

  126. Greco, S., Matarazzo, B., Słowiński, R.: Dominance-based rough set approach to case-based reasoning. In: Torra, V., Narukawa, Y., Valls, A., Domingo-Ferrer, J. (eds.) MDAI 2006. LNCS (LNAI), vol. 3885, pp. 7–18. Springer, Heidelberg (2006)

    Google Scholar 

  127. Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Interscience, New York (1990)

    Google Scholar 

  128. Böhm, C., Faloutsos, C., Plant, C.: Outlier-robust clustering using independent components. In: SIGMOD Conference, pp. 185–198 (2008)

    Google Scholar 

  129. Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using wikipedia-based explicit semantic analysis. In: Proceedings of The Twentieth International Joint Conference for Artificial Intelligence, Hyderabad, India, pp. 1606–1611 (2007)

    Google Scholar 

  130. Ślęzak, D.: Rough sets and few-objects-many-attributes problem: The case study of analysis of gene expression data sets. Frontiers in the Convergence of Bioscience and Information Technologies, 437–442 (2007)

    Google Scholar 

  131. Deutsch, J.M.: Evolutionary algorithms for finding optimal gene sets in microarray prediction. BMC Bioinformatics 19(1), 45–52 (2003)

    Google Scholar 

  132. Jirapech-Umpai, T., Aitken, S.: Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes. BMC Bioinformatics 6(148) (2005) (online)

    Google Scholar 

  133. Jolliffe, I.T.: Principal Component Analysis, 2nd edn. Springer (October 2002)

    Google Scholar 

  134. John, G.H., Kohavi, R., Pfleger, K.: Irrelevant Features and the Subset Selection Problem. In: Proceeding of 11th International Conference on Machine Learning, pp. 121–129. Morgan Kaufmann (1994)

    Google Scholar 

  135. Hall, M.: Correlation-based Feature Selection for Machine Learning. PhD thesis, University of Waikato (1999)

    Google Scholar 

  136. Liao, C., Li, S., Luo, Z.: Gene selection using wilcoxon rank sum test and support vector machine for cancer classification. In: Wang, Y., Cheung, Y.-m., Liu, H. (eds.) CIS 2006. LNCS (LNAI), vol. 4456, pp. 57–66. Springer, Heidelberg (2007)

    Google Scholar 

  137. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1226–1238 (2005)

    Google Scholar 

  138. Kira, K., Rendell, L.A.: A practical approach to feature selection. In: Proceedings of the Ninth International Workshop on Machine Learning, ML 1992, pp. 249–256. Morgan Kaufmann Publishers Inc., San Francisco (1992)

    Google Scholar 

  139. Ding, C., Peng, H.: Minimum redundancy feature selection from microarray gene expression data. In: Proceedings of the 2003 IEEE Bioinformatics Conference, pp. 523–528 (2003)

    Google Scholar 

  140. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)

  141. Dramiński, M., Kierczak, M., Koronacki, J., Komorowski, J.: Monte Carlo Feature Selection and Interdependency Discovery in Supervised Classification. In: Koronacki, J., Raś, Z.W., Wierzchoń, S.T., Kacprzyk, J. (eds.) Advances in Machine Learning II. SCI, vol. 263, pp. 371–385. Springer, Heidelberg (2010)

  142. Marill, T., Green, D.: On the effectiveness of receptors in recognition systems. IEEE Transactions on Information Theory 9(1), 11–17 (1963)

  143. Whitney, A.W.: A Direct Method of Nonparametric Measurement Selection. IEEE Transactions on Computers 20, 1100–1103 (1971)

  144. Siedlecki, W., Sklansky, J.: Handbook of pattern recognition & computer vision, pp. 63–87. World Scientific Publishing Co., Inc., River Edge (1993)

  145. Furey, T.S., Duffy, N., David, W., Haussler, D.: Support vector machine classification and validation of cancer tissue samples using microarray expression data (2000)

  146. Vapnik, V.N.: The nature of statistical learning theory. Springer-Verlag New York, Inc., New York (1995)

  147. Schölkopf, B.: The kernel trick for distances. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS) 2000, Denver, CO, USA, pp. 301–307. MIT Press (2000)

  148. Graupe, D.: Principles of Artificial Neural Networks, 2nd edn. World Scientific Publishing Co., Inc., River Edge (2007)

  149. Wojnarski, M.: LTF-C: Architecture, training algorithm and applications of new neural classifier. Fundamenta Informaticae 54(1), 89–105 (2003)

  150. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer (1996)

  151. Skowron, A., Stepaniuk, J., Peters, J.F., Swiniarski, R.W.: Calculi of approximation spaces. Fundamenta Informaticae 72(1-3), 363–378 (2006)

  152. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)

  153. United States National Library of Medicine: Introduction to MeSH - 2011 (2011), http://www.nlm.nih.gov/mesh/introduction.html

  154. Nguyen, H.S.: On efficient handling of continuous attributes in large data bases. Fundamenta Informaticae 48(1), 61–81 (2001)

  155. Jensen, R., Shen, Q.: New approaches to fuzzy-rough feature selection. IEEE Transactions on Fuzzy Systems 17(4), 824–838 (2009)

  156. Ganter, B., Stumme, G., Wille, R. (eds.): Formal Concept Analysis. LNCS (LNAI), vol. 3626. Springer, Heidelberg (2005)

  157. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer (1998)

  158. R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2008)

  159. Frank, A., Asuncion, A.: UCI machine learning repository (2010)

  160. Parkinson, H.E., et al.: ArrayExpress update - from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Research 37(Database-Issue), 868–872 (2009)

  161. Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M., Yakhini, Z.: Tissue classification with gene expression profiles. Journal of Computational Biology 7(3-4), 559–583 (2000)

  162. Bouckaert, R.R.: Choosing between two learning algorithms based on calibrated tests. In: Fawcett, T., Mishra, N. (eds.) Machine Learning, Proceedings of the Twentieth International Conference, ICML 2003, August 21-24, pp. 51–58. AAAI Press, Washington, DC (2003)

  163. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI, pp. 1137–1145 (1995)

  164. Demšar, J.: Statistical Comparisons of Classifiers over Multiple Data Sets. Journal of Machine Learning Research 7, 1–30 (2006)

  165. Baldi, P., Hatfield, G.W.: DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling. Cambridge University Press (2002)

  166. Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C.A., Causton, H.C., Gaasterland, T., Glenisson, P., Holstege, F.C., Kim, I.F., Markowitz, V., Matese, J.C., Parkinson, H., Robinson, A., Sarkans, U., Schulze-Kremer, S., Stewart, J., Taylor, R., Vilo, J., Vingron, M.: Minimum Information About a Microarray Experiment (MIAME) - Toward Standards for Microarray Data. Nature Genetics 29(4), 365–371 (2001)

  167. Diaz-Uriarte, R., Alvarez de Andres, S.: Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7(3) (2006) (online)

  168. Roberts, R.J.: PubMed Central: The GenBank of the published literature. Proceedings of the National Academy of Sciences of the United States of America 98(2), 381–382 (2001)

  169. Spearman, C.: The proof and measurement of association between two things. By C. Spearman, 1904. The American Journal of Psychology 100(3-4), 441–471 (1987)

  170. Stawicki, S., Widz, S.: Decision bireducts and approximate decision reducts: Comparison of two approaches to attribute subset ensemble construction. In: Ganzha, M., Maciaszek, L.A., Paprzycki, M. (eds.) Proceedings of Federated Conference on Computer Science and Information Systems - FedCSIS 2012, Wrocław, Poland, September 9-12, pp. 331–338 (2012)

  171. Bazan, J., Nguyen, S.H., Nguyen, H.S., Skowron, A.: Rough set methods in approximation of hierarchical concepts. In: Tsumoto, S., Słowiński, R., Komorowski, J., Grzymała-Busse, J.W. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 346–355. Springer, Heidelberg (2004)

  172. Sarawagi, S., Thomas, S., Agrawal, R.: Integrating association rule mining with relational database systems: Alternatives and implications. Data Mining and Knowledge Discovery 4(2/3), 89–125 (2000)

  173. Ślęzak, D., Synak, P., Borkowski, J., Wróblewski, J., Toppin, G.: A rough-columnar RDBMS engine – a case study of correlated subqueries. IEEE Data Engineering Bulletin 35(1), 34–39 (2012)

  174. Bazan, J., Szczuka, M.S.: RSES and RSESlib - A collection of tools for rough set computations. In: Ziarko, W.P., Yao, Y. (eds.) RSCTC 2000. LNCS (LNAI), vol. 2005, pp. 106–113. Springer, Heidelberg (2001)

  175. Øhrn, A., Komorowski, J.: ROSETTA – a rough set toolkit for analysis of data. In: Proceedings Third International Joint Conference on Information Sciences, pp. 403–407 (1997)

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Janusz, A. (2014). Algorithms for Similarity Relation Learning from High Dimensional Data. In: Peters, J.F., Skowron, A. (eds) Transactions on Rough Sets XVII. Lecture Notes in Computer Science, vol 8375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54756-0_7

  • DOI: https://doi.org/10.1007/978-3-642-54756-0_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54755-3

  • Online ISBN: 978-3-642-54756-0

  • eBook Packages: Computer Science, Computer Science (R0)
