Methods to Edit Multi-label Training Sets Using Rough Sets Theory

Bello, Marilyn; Nápoles, Gonzalo; Vanhoof, Koen; Bello, Rafael

doi:10.1007/978-3-030-22815-6_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11499)

Included in the following conference series:

International Joint Conference on Rough Sets

Abstract

In multi-label classification problems, instances can be associated with several decision classes (labels) simultaneously. One of the most successful algorithms for this kind of problem is the ML-kNN method, which is a lazy learner adapted to the multi-label scenario. All computational models that draw inferences from examples share a common problem: selecting which examples should be included in the training set to increase the algorithm’s efficiency. This problem is known as training set editing. Despite the extensive work in multi-label classification, there is a lack of methods for editing multi-label training sets. In this research, we propose three reduction techniques for editing multi-label training sets that rely on Rough Set Theory. The simulations show that these methods reduce the number of examples in the training sets without affecting the overall performance, while in some cases the performance is even improved.
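
The abstract does not describe the three proposed techniques in detail, so the following is only an illustrative sketch of the general idea of rough-set-style editing, not the authors' method. It assumes a similarity relation defined by a Euclidean radius and treats each label as a separate binary decision; an instance is kept when its similarity class is sufficiently consistent with its own value for every label. The names `similarity_class`, `edit_multilabel_set`, `radius` and `min_agreement` are hypothetical. With `min_agreement=1.0` the rule reduces to membership in a classical lower approximation.

```python
from math import dist


def similarity_class(X, i, radius):
    """Indices of all instances within `radius` (Euclidean) of instance i."""
    return [j for j, x in enumerate(X) if dist(X[i], x) <= radius]


def edit_multilabel_set(X, Y, radius, min_agreement=1.0):
    """Return the indices of the training instances to keep.

    Assumption (not taken from the paper): instance i is kept when, for
    every label, at least `min_agreement` of its similarity class shares
    i's value for that label.
    """
    keep = []
    n_labels = len(Y[0])
    for i in range(len(X)):
        members = similarity_class(X, i, radius)  # always contains i itself
        consistent = all(
            sum(Y[j][l] == Y[i][l] for j in members) / len(members) >= min_agreement
            for l in range(n_labels)
        )
        if consistent:
            keep.append(i)
    return keep


# Toy data: two clusters; instance 3 sits in the first cluster
# but carries the label vector of the second one (a noisy example).
X = [(0.0, 0.0), (0.1, 0.0), (0.05, 0.05), (0.08, 0.02), (1.0, 1.0), (1.1, 1.0)]
Y = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 1)]

kept_relaxed = edit_multilabel_set(X, Y, radius=0.5, min_agreement=0.7)
kept_strict = edit_multilabel_set(X, Y, radius=0.5)
```

On this toy data the relaxed rule keeps instances `[0, 1, 2, 4, 5]` and drops only the noisy instance, whereas the strict rule keeps `[4, 5]`, because a single inconsistent neighbour pushes the whole first cluster into the boundary region. An edited set of this kind could then be passed to a lazy learner such as ML-kNN in place of the full training set.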

Author information

Authors and Affiliations

Computer Science Department, Universidad Central de Las Villas, Santa Clara, Cuba
Marilyn Bello & Rafael Bello
Faculty of Business Economics, Hasselt University, Hasselt, Belgium
Marilyn Bello, Gonzalo Nápoles & Koen Vanhoof

Corresponding author

Correspondence to Marilyn Bello.

Editor information

Editors and Affiliations

University of Debrecen, Debrecen, Hungary
Tamás Mihálydeák
Southwest Petroleum University, Chengdu, China
Fan Min
Chongqing University of Posts and Telecommunications, Chongqing, China
Guoyin Wang
Indian Institute of Technology Kanpur, Kanpur, India
Mohua Banerjee
Fujian Normal University, Fuzhou, China
Ivo Düntsch
University of Rzeszów, Rzeszow, Poland
Zbigniew Suraj
University of Milano-Bicocca, Milan, Italy
Davide Ciucci

About this paper

Cite this paper

Bello, M., Nápoles, G., Vanhoof, K., Bello, R. (2019). Methods to Edit Multi-label Training Sets Using Rough Sets Theory. In: Mihálydeák, T., et al. Rough Sets. IJCRS 2019. Lecture Notes in Computer Science, vol 11499. Springer, Cham. https://doi.org/10.1007/978-3-030-22815-6_29

DOI: https://doi.org/10.1007/978-3-030-22815-6_29
Published: 09 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22814-9
Online ISBN: 978-3-030-22815-6
eBook Packages: Computer Science, Computer Science (R0)
