Abstract
In multi-label classification problems, instances can be associated with several decision classes (labels) simultaneously. One of the most successful algorithms to deal with this kind of problem is the ML-kNN method, which is lazy learner adapted to the multi-label scenario. All the computational models that realize inferences from examples have the common problem of the selection of those examples that should be included into the training set to increase the algorithm’s efficiency. This problem in known as training sets edition. Despite the extensive work in multi-label classification, there is a lack of methods for editing multi-label training sets. In this research, we propose three reduction techniques for editing multi-label training sets that rely on the Rough Set Theory. The simulations show that these methods reduce the number of examples in the training sets without affecting the overall performance, while in some case the performance is even improved.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Barandela, R., Cortés, N., Palacios, A.: The nearest neighbor rule and the reduction of the training sample size. In: Proceedings 9th Symposium on Pattern Recognition and Image Analysis, vol. 1, pp. 103–108 (2001)
Bello, R., Falcón, R., Pedrycz, W.: Granular Computing: At the Junction of Rough Sets and Fuzzy Sets. Studies in Fuzziness and Soft Computing, vol. 224. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76973-6
Bello, R., Verdegay, J.L.: Rough sets in the soft computing environment. Inf. Sci. 212, 1–14 (2012)
Brighton, H., Mellish, C.: Advances in instance selection for instance-based learning algorithms. Data Min. Knowl. Discov. 6(2), 153–172 (2002)
Caballero, Y., Bello, R., Salgado, Y., Garcia, M.M.: A method to edit training set based on rough sets. Int. J. Comput. Intell. Res. 3(3), 219–229 (2007)
Calvo-Zaragoza, J., Valero-Mas, J.J., Rico-Juan, J.R.: Improving kNN multi-label classification in prototype selection scenarios using class proposals. Pattern Recognit. 48(5), 1608–1622 (2015)
Charte, F., Charte, D., Rivera, A., del Jesus, M.J., Herrera, F.: R ultimate multilabel dataset repository. In: Martínez-Álvarez, F., Troncoso, A., Quintián, H., Corchado, E. (eds.) HAIS 2016. LNCS (LNAI), vol. 9648, pp. 487–499. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-32034-2_41
Charte, F., Rivera, A., del Jesus, M.J., Herrera, F.: On the impact of dataset complexity and sampling strategy in multilabel classifiers performance. In: Martínez-Álvarez, F., Troncoso, A., Quintián, H., Corchado, E. (eds.) HAIS 2016. LNCS (LNAI), vol. 9648, pp. 500–511. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-32034-2_42
Chen, X.j., Zhan, Y.z., Ke, J., Chen, X.b.: Complex video event detection via pairwise fusion of trajectory and multi-label hypergraphs. Multimed. Tools Appl. 75(22), 15079–15100 (2016)
Cortijo, J.: Techniques of approximation II: non parametric approximation. Ph.D. thesis, Thesis. Department of Computer Science and Artificial Intelligence, Universidad de Granada, Spain (2001)
Dasarathy, B.V.: Nearest neighbor (\(\{\)NN\(\}\)) norms:\(\{\)NN\(\}\) pattern classification techniques (1991)
Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937)
Garcia, S., Derrac, J., Cano, J., Herrera, F.: Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 417–435 (2012)
Guan, D., Yuan, W., Lee, Y.K., Lee, S.: Nearest neighbor editing aided by unlabeled data. Inf. Sci. 179(13), 2273–2282 (2009)
Herrera, F., Charte, F., Rivera, A.J., Del Jesus, M.J.: Multilabel classification. In: Multilabel Classification, pp. 17–31. Springer (2016). https://doi.org/10.1007/978-3-319-41111-8_2
Jiang, Y., Zhou, Z.-H.: Editing training data for knn classifiers with neural network ensemble. In: Yin, F.-L., Wang, J., Guo, C. (eds.) ISNN 2004. LNCS, vol. 3173, pp. 356–361. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28647-9_60
Jin, B., Muller, B., Zhai, C., Lu, X.: Multi-label literature classification based on the gene ontology graph. BMC Bioinform. 9(1), 525 (2008)
Kanj, S., Abdallah, F., Denœux, T.: Purifying training data to improve performance of multi-label classification algorithms. In: 2012 15th International Conference on Information Fusion (FUSION), pp. 1784–1791. IEEE (2012)
Kanj, S., Abdallah, F., Denœux, T., Tout, K.: Editing training data for multi-label classification with the k-nearest neighbor rule. Pattern Anal. Appl. 19(1), 145–161 (2016)
Komorowski, J., Pawlal, Z., Polkowski, L., Skowron, A.: B6. A rough set perspective on data and knowledge. In: Klösgen, W., Zytkow, J.M. (eds.) The Handbook of Data Mining and Knowledge Discovery. Oxford University Press, Oxford (1999)
Madjarov, G., Kocev, D., Gjorgjevikj, D., Džeroski, S.: An extensive experimental comparison of methods for multi-label learning. Pattern Recognit. 45(9), 3084–3104 (2012)
Pawlak, Z.: Rough sets. Int. J. Comput. Inf. Sci. 11(5), 341–356 (1982)
Pedrycz, W., Skowron, A., Kreinovich, V.: Handbook of Granular Computing. Wiley, New York (2008)
Pereira, R.B., Plastino, A., Zadrozny, B., Merschmann, L.H.: Correlation analysis of performance measures for multi-label classification. Inf. Process. Manage. 54(3), 359–369 (2018)
Qi, G.J., Hua, X.S., Rui, Y., Tang, J., Mei, T., Zhang, H.J.: Correlative multi-label video annotation. In: Proceedings of the 15th ACM International Conference on Multimedia, pp. 17–26. ACM (2007)
Sebastiani, F.: Machine learning in automated text categorization. ACM Comput. Surv. (CSUR) 34(1), 1–47 (2002)
Slowinski, R., Vanderpooten, D.: A generalized definition of rough approximations based on similarity. IEEE Trans. Knowl. Data Eng. 12(2), 331–336 (2000)
Tsoumakas, G., Xioufis, E., Vilcek, J., Vlahavas, I.: MULAN multi-label dataset repository (2014). http://mulan.sourceforge.net/datasets.html
Tsoumakas, G., Katakis, I., Vlahavas, I.: Mining multi-label data. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 667–685. Springer, Boston (2009). https://doi.org/10.1007/978-0-387-09823-4_34
Van Hulse, J., Khoshgoftaar, T.: Knowledge discovery from imbalanced and noisy data. Data Knowl. Eng. 68(12), 1513–1542 (2009)
Wilson, D.R., Martinez, T.R.: Improved heterogeneous distance functions. J. Artif. Intell. Res. 6, 1–34 (1997)
Wilson, D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. 3, 408–421 (1972)
Wilson, R., Martinez, T.R.: Reduction techniques for exemplar-based learning algorithms. Machine Learning. Computer Science Department, Brigham Young University, USA (1998)
Xu, Z., Liang, J., Dang, C., Chin, K.S.: Inclusion degree: a perspective on measures for rough set data analysis. Inf. Sci. 141(3–4), 227–236 (2002)
Yao, Y.: Three-way decisions with probabilistic rough sets. Inf. Sci. 180(3), 341–353 (2010)
Yao, Y.: Information granulation and rough set approximation. Int. J. Intell. Syst. 16(1), 87–104 (2001)
Zadeh, L.A.: Key roles of information granulation and fuzzy logic in human reasoning, concept formulation and computing with words. In: Proceedings of IEEE 5th International Fuzzy Systems, vol. 1, p. 1. IEEE (1996)
Zhang, M.L., Zhou, Z.H.: Ml-KNN: a lazy learning approach to multi-label learning. Pattern Recognit. 40(7), 2038–2048 (2007)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Bello, M., Nápoles, G., Vanhoof, K., Bello, R. (2019). Methods to Edit Multi-label Training Sets Using Rough Sets Theory. In: Mihálydeák, T., et al. Rough Sets. IJCRS 2019. Lecture Notes in Computer Science(), vol 11499. Springer, Cham. https://doi.org/10.1007/978-3-030-22815-6_29
Download citation
DOI: https://doi.org/10.1007/978-3-030-22815-6_29
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22814-9
Online ISBN: 978-3-030-22815-6
eBook Packages: Computer ScienceComputer Science (R0)