Online multi-label stream feature selection based on neighborhood rough set with missing labels

Liang, Shunpan; Liu, Ze; You, Dianlong; Pan, Weiwei

doi:10.1007/s10044-022-01067-2

Industrial and Commercial Application
Published: 15 April 2022

Volume 25, pages 1025–1039, (2022)

Pattern Analysis and Applications

Abstract

Multi-label feature selection is essential in many big data applications and plays a significant role in processing high-dimensional data. However, existing online stream feature selection methods ignore the existence of missing labels. Inspired by the neighborhood rough set, which does not require prior knowledge of the feature space, we propose a novel online multi-label stream feature selection algorithm called OFS-Mean. We define a neighborhood relationship that automatically selects an appropriate number of neighbors. Without requiring any prior knowledge of the feature space or any parameters, the algorithm's performance is improved by real-time online prediction of missing labels based on the similarity between an instance and its neighbors. The proposed OFS-Mean divides the feature selection process into two stages, online feature importance evaluation and online redundancy update, to screen important features. With the support of the neighborhood rough set, OFS-Mean can adapt to various types of datasets, which improves the algorithm's generalization ability. In the experiments, a similarity test is used to verify the prediction results, and a comparison with traditional semi-supervised feature selection methods, under the condition of selecting the same number of features, yields favorable results.
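The abstract does not give the exact neighborhood definition or prediction rule, so the following is only a minimal sketch of the general idea of predicting missing labels from similar neighbors. It rests on several assumptions that are not taken from the article: the neighborhood of an instance is taken to be all instances closer than its mean distance to the others (a parameter-free rule), similarity is 1 / (1 + Euclidean distance), and a missing label is set to 1 when the similarity-weighted vote reaches 0.5. The function name `impute_missing_labels` is hypothetical.

```python
import numpy as np


def impute_missing_labels(X, Y):
    """Fill NaN entries of Y using similarity-weighted votes of neighbors.

    X: (n, d) feature matrix.
    Y: (n, q) label matrix with entries 0, 1, or NaN (missing).
    Returns a copy of Y in which missing entries are predicted where possible.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances between all instances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    Y_filled = Y.copy()
    for i in range(n):
        others = np.arange(n) != i
        # ASSUMPTION: neighbors are the instances closer than the mean
        # distance from instance i to all other instances.
        radius = D[i, others].mean()
        nbrs = np.where(others & (D[i] <= radius))[0]
        # ASSUMPTION: similarity decays with distance as 1 / (1 + d).
        sim = 1.0 / (1.0 + D[i, nbrs])

        for j in np.where(np.isnan(Y[i]))[0]:
            known = ~np.isnan(Y[nbrs, j])
            if not known.any():
                continue  # no neighbor has this label; leave it missing
            score = np.dot(sim[known], Y[nbrs[known], j]) / sim[known].sum()
            # ASSUMPTION: threshold the weighted vote at 0.5.
            Y_filled[i, j] = 1.0 if score >= 0.5 else 0.0
    return Y_filled
```

Because the radius is derived from the data itself, the number of neighbors varies per instance and no neighborhood size needs to be tuned, which mirrors the parameter-free property the abstract describes. Neighbor votes are read from the original label matrix rather than the partially filled one, so the result does not depend on the order in which instances are processed.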

Availability of data and material

The datasets analysed during the current study are available in the Multi-Label Classification Dataset Repository, http://www.uco.es/kdis/mllresources/.

Author information

Authors and Affiliations

School of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004, Hebei, China
Shunpan Liang, Ze Liu, Dianlong You & Weiwei Pan

Corresponding author

Correspondence to Shunpan Liang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liang, S., Liu, Z., You, D. et al. Online multi-label stream feature selection based on neighborhood rough set with missing labels. Pattern Anal Applic 25, 1025–1039 (2022). https://doi.org/10.1007/s10044-022-01067-2

Received: 06 May 2021
Accepted: 12 March 2022
Published: 15 April 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s10044-022-01067-2
