Abstract
In a multiple instance learning (MIL) scenario, the outcome annotation is usually reported only at the bag level. Owing to its simplicity and convergence properties, the lazy learning approach, i.e., k-nearest neighbors (kNN), plays a crucial role in predicting bag labels in the MIL domain. Notably, two variations of the kNN algorithm tailored to the MIL framework have been introduced, namely Bayesian-kNN (BkNN) and Citation-kNN (CkNN). These adaptations combine the Hausdorff metric with Bayesian or citation approaches. However, neither BkNN nor CkNN explicitly integrates a feature selection methodology, and when irrelevant and redundant features are present, the model's generalization ability decreases. In the single-instance learning scenario, this limitation of kNN is often overcome by a feature weighting algorithm named Neighborhood Component Feature Selection (NCFS), which finds the optimal degree of influence of each feature. To address this gap in the literature, we introduce the NCFS method into the MIL framework. The proposed methodologies, i.e., NCFS-BkNN, NCFS-CkNN, and NCFS-Bayesian Citation-kNN (NCFS-BCkNN), learn the optimal feature weighting vector by minimizing the regularized leave-one-out error on the training bags. The prediction of unseen bags is then computed by combining the Bayesian and citation approaches based on the minimum optimally weighted Hausdorff distance. Through experiments on various benchmark MIL datasets in the biomedical informatics and affective computing fields, we provide statistical evidence that the proposed methods outperform state-of-the-art MIL algorithms that do not employ any a priori feature weighting strategy.
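For intuition, the two core ingredients named in the abstract — the minimum weighted Hausdorff distance between bags and the citation-based voting of CkNN — can be sketched as follows. This is a simplified illustration, not the authors' implementation: the fixed weight vector `w` merely stands in for the weights NCFS would learn, the helper names (`weighted_min_hausdorff`, `cknn_predict`) and the reference/citer ranks `r`, `c` are our own choices, and the Bayesian component of BCkNN is omitted.

```python
import numpy as np

def weighted_min_hausdorff(bag_a, bag_b, w):
    """Minimal Hausdorff distance between two bags of instances under
    per-feature weights w (bags: (n_instances, n_features) arrays)."""
    diff = bag_a[:, None, :] - bag_b[None, :, :]      # all instance pairs
    d = np.sqrt(((w * diff) ** 2).sum(axis=-1))       # weighted Euclidean
    return d.min()

def cknn_predict(train_bags, train_labels, test_bag, w, r=2, c=2):
    """Citation-kNN style vote: labels of the r nearest 'references' of the
    test bag, plus labels of its 'citers', i.e. training bags that rank the
    test bag among their own c nearest neighbours."""
    n = len(train_bags)
    d_test = np.array([weighted_min_hausdorff(test_bag, b, w) for b in train_bags])
    refs = np.argsort(d_test)[:r]
    citers = []
    for i in range(n):
        # distances from training bag i to every other training bag
        d_i = [weighted_min_hausdorff(train_bags[i], train_bags[j], w)
               for j in range(n) if j != i]
        # bag i cites the test bag if the test bag is within its c nearest
        if d_test[i] <= np.sort(d_i)[min(c, len(d_i)) - 1]:
            citers.append(i)
    votes = list(train_labels[refs]) + [train_labels[i] for i in citers]
    return 1 if sum(votes) > len(votes) / 2 else 0

# Tiny synthetic example: feature 0 is informative, feature 1 is noise,
# and the (hand-set) weight vector w suppresses the noisy feature.
train_bags = [np.array([[1.0, 5.0], [0.1, -3.0]]),   # positive bag
              np.array([[0.9, -2.0]]),               # positive bag
              np.array([[0.0, 4.0]]),                # negative bag
              np.array([[0.05, -1.0]])]              # negative bag
train_labels = np.array([1, 1, 0, 0])
test_bag = np.array([[1.05, 9.0]])
w = np.array([1.0, 0.0])                             # stands in for NCFS-learned weights
pred = cknn_predict(train_bags, train_labels, test_bag, w)
```

With the noisy feature down-weighted, both references and both citers of the test bag are positive, so the vote returns the positive label; this is the effect the paper seeks to obtain automatically by learning `w` through the regularized leave-one-out objective.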
Acknowledgments
Funded by the European Union - NextGenerationEU and by the Ministry of University and Research (MUR), National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.5, project “RAISE - Robotics and AI for Socio-economic Empowerment” (ECS00000035). G.T. is part of RAISE Innovation Ecosystem.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Turri, G., Romeo, L. (2024). Neighborhood Component Feature Selection for Multiple Instance Learning Paradigm. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol. 14941. Springer, Cham. https://doi.org/10.1007/978-3-031-70341-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70340-9
Online ISBN: 978-3-031-70341-6