
Neighborhood Component Feature Selection for Multiple Instance Learning Paradigm

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2024)

Abstract

In a multiple instance learning (MIL) scenario, the outcome annotation is usually reported only at the bag level. Owing to its simplicity and convergence properties, the lazy learning approach, i.e., k-nearest neighbors (kNN), plays a crucial role in predicting bag labels in the MIL domain. Notably, two variations of the kNN algorithm tailored to the MIL framework have been introduced, namely Bayesian-kNN (BkNN) and Citation-kNN (CkNN); both leverage the Hausdorff metric together with a Bayesian or citation approach. However, neither BkNN nor CkNN explicitly integrates feature selection, so the model’s generalization degrades when irrelevant or redundant features are present. In the single-instance learning scenario, this limitation of kNN is often overcome with a feature weighting algorithm named Neighborhood Component Feature Selection (NCFS), which finds the optimal degree of influence of each feature. To address this gap in the literature, we introduce the NCFS method into the MIL framework. The proposed methodologies, i.e., NCFS-BkNN, NCFS-CkNN, and NCFS-Bayesian Citation-kNN (NCFS-BCkNN), learn the optimal feature weighting vector by minimizing the regularized leave-one-out error over the training bags. The label of an unseen bag is then predicted by combining the Bayesian and citation approaches based on the minimum optimally weighted Hausdorff distance. Through experiments on various benchmark MIL datasets from the biomedical informatics and affective computing fields, we provide statistical evidence that the proposed methods outperform state-of-the-art MIL algorithms that do not employ any a priori feature weighting strategy.
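The core quantity in the abstract is a feature-weighted Hausdorff-style distance between bags, on top of which a kNN rule assigns bag labels. The sketch below is illustrative only: it implements the minimal variant of the weighted Hausdorff distance (the closest instance pair across two bags) and a plain majority-vote kNN over bags. The function names, the choice of the minimal variant, and the simple voting rule are assumptions for illustration; the paper's actual methods replace the vote with Bayesian and citation rules and learn the weights `w` via NCFS.

```python
import numpy as np

def weighted_min_hausdorff(bag_a, bag_b, w):
    """Minimal weighted Hausdorff distance between two bags.

    Bags are (n_instances, n_features) arrays; w is a per-feature
    weight vector (the quantity an NCFS-style method would learn).
    Returns the smallest weighted Euclidean distance over all
    cross-bag instance pairs.
    """
    # All pairwise instance differences: shape (n_a, n_b, n_features).
    diff = bag_a[:, None, :] - bag_b[None, :, :]
    # Weighted Euclidean distance for every instance pair.
    d = np.sqrt(((w * diff) ** 2).sum(axis=-1))
    # Minimal variant: distance between the closest pair of instances.
    return d.min()

def knn_bag_predict(train_bags, train_labels, query_bag, w, k=3):
    """Majority vote over the k nearest training bags (a placeholder
    for the Bayesian/citation decision rules described in the paper)."""
    dists = np.array([weighted_min_hausdorff(query_bag, b, w)
                      for b in train_bags])
    nearest = np.argsort(dists)[:k]
    votes = np.asarray(train_labels)[nearest]
    return np.bincount(votes).argmax()
```

Down-weighting a feature in `w` shrinks its contribution to every pairwise distance, which is how a learned weighting vector can suppress irrelevant or redundant features before the neighbor search.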




Acknowledgments

Funded by the European Union - NextGenerationEU and by the Ministry of University and Research (MUR), National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.5, project “RAISE - Robotics and AI for Socio-economic Empowerment” (ECS00000035). G.T. is part of the RAISE Innovation Ecosystem.

Author information

Corresponding author

Correspondence to Luca Romeo.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Turri, G., Romeo, L. (2024). Neighborhood Component Feature Selection for Multiple Instance Learning Paradigm. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14941. Springer, Cham. https://doi.org/10.1007/978-3-031-70341-6_14


  • DOI: https://doi.org/10.1007/978-3-031-70341-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70340-9

  • Online ISBN: 978-3-031-70341-6

  • eBook Packages: Computer Science (R0)
