Methods to Edit Multi-label Training Sets Using Rough Sets Theory

  • Conference paper
Rough Sets (IJCRS 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11499)

Abstract

In multi-label classification problems, instances can be associated with several decision classes (labels) simultaneously. One of the most successful algorithms for this kind of problem is the ML-kNN method, which is a lazy learner adapted to the multi-label scenario. All computational models that infer from examples share a common problem: selecting which examples should be included in the training set to increase the algorithm’s efficiency. This problem is known as training set editing. Despite the extensive work on multi-label classification, there is a lack of methods for editing multi-label training sets. In this research, we propose three reduction techniques for editing multi-label training sets that rely on Rough Set Theory. The simulations show that these methods reduce the number of examples in the training sets without affecting the overall performance, while in some cases the performance is even improved.
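To make the idea of rough-set-based editing concrete, the following is a minimal sketch and not one of the three techniques proposed in the paper, whose details are not given in this abstract. It assumes a similarity relation defined by a Euclidean distance threshold (`radius`), treats each label as a concept, and keeps an instance only if it lies in the lower approximation of at least one of its own labels. The function names, the threshold, and the keep rule are all illustrative assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def similarity_class(i, X, radius):
    """Indices of all instances within `radius` of instance i (i itself included)."""
    return [j for j in range(len(X)) if dist(X[i], X[j]) <= radius]


def lower_approximation(X, Y, label, radius):
    """Instances whose entire similarity class carries `label`."""
    return {
        i
        for i in range(len(X))
        if all(label in Y[j] for j in similarity_class(i, X, radius))
    }


def edit_training_set(X, Y, radius):
    """Assumed keep rule: retain instance i if it belongs to the lower
    approximation of at least one of its own labels."""
    labels = set().union(*Y)
    lower = {l: lower_approximation(X, Y, l, radius) for l in labels}
    return sorted(i for i in range(len(X)) if any(i in lower[l] for l in Y[i]))


# Toy data (hypothetical): two clean clusters plus two conflicting
# instances that sit between them and disagree on their labels.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (0.5, 0.0), (0.6, 0.0)]
Y = [{"a", "b"}, {"a"}, {"c"}, {"c"}, {"a"}, {"c"}]

kept = edit_training_set(X, Y, radius=0.15)
print(kept)  # [0, 1, 2, 3]
```

In this toy run, the two instances at 0.5 and 0.6 fall in each other's similarity class but share no label, so neither belongs to any lower approximation and both are dropped. The edited set could then be passed to a lazy learner such as ML-kNN in place of the full training set.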

Author information

Corresponding author

Correspondence to Marilyn Bello.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Bello, M., Nápoles, G., Vanhoof, K., Bello, R. (2019). Methods to Edit Multi-label Training Sets Using Rough Sets Theory. In: Mihálydeák, T., et al. Rough Sets. IJCRS 2019. Lecture Notes in Computer Science, vol 11499. Springer, Cham. https://doi.org/10.1007/978-3-030-22815-6_29

  • DOI: https://doi.org/10.1007/978-3-030-22815-6_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22814-9

  • Online ISBN: 978-3-030-22815-6

  • eBook Packages: Computer Science, Computer Science (R0)
