A weighted ensemble-based active learning model to label microarray data

De, Rajonya; Chakraborty, Anuran; Chatterjee, Agneet; Sarkar, Ram

doi:10.1007/s11517-020-02238-1

Original Article
Published: 08 August 2020

Volume 58, pages 2427–2441, (2020)
Medical & Biological Engineering & Computing

Rajonya De¹,
Anuran Chakraborty ORCID: orcid.org/0000-0002-7682-9072¹,
Agneet Chatterjee¹ &
…
Ram Sarkar¹

294 Accesses

Abstract

Classifying cancerous genes from microarray data is an important research area in bioinformatics. Large amounts of microarray data are available, but labeling them is very costly. This paper proposes an active learning model, a semi-supervised classification approach, for labeling microarray data, so that predictions can be made even with a small amount of labeled data. Initially, a pool of unlabeled instances is given, from which some instances are randomly chosen for labeling. Selection algorithms then determine which instances from the unlabeled pool are labeled in each successive round. The proposed method follows an ensemble approach: it combines the decisions of three classifiers to arrive at a consensus that provides a more accurate prediction of the class label, while ensuring that each individual classifier learns in an uncorrelated manner. Our method combines the heuristic techniques that an active learning algorithm uses to choose training samples with the multiple learning paradigm attained by an ensemble, optimizing the search space by choosing efficiently from an already sparse learning pool. Evaluated on 10 microarray datasets, the proposed method achieves performance comparable with state-of-the-art methods. The code and datasets are available at https://github.com/anuran-Chakraborty/Active-learning.
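
The query step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the three classifiers are weighted or which selection algorithms are used, so the weighted averaging of class probabilities, the entropy-based uncertainty score, and the names `combine`, `entropy`, and `select_queries` are all assumptions made for illustration. The toy probabilities stand in for the outputs of three trained classifiers on an unlabeled pool.

```python
import math


def combine(probs_per_clf, weights):
    """Weighted average of the class-probability vectors of several classifiers.

    Assumption: the ensemble is a normalized weighted sum; the paper's actual
    weighting scheme is not given in the abstract.
    """
    total = sum(weights)
    n_classes = len(probs_per_clf[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs_per_clf)) / total
        for c in range(n_classes)
    ]


def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)


def select_queries(pool_probs, weights, k):
    """Indices of the k pool instances whose ensemble prediction is most uncertain.

    Assumption: uncertainty is measured by the entropy of the combined
    prediction; ties are broken by the lower index.
    """
    scored = [
        (entropy(combine(probs, weights)), i)
        for i, probs in enumerate(pool_probs)
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


# Hypothetical weights, e.g. each classifier's accuracy on the labeled set.
weights = [0.9, 0.8, 0.6]

# Toy pool: for each unlabeled instance, one probability vector per classifier.
pool_probs = [
    [[0.95, 0.05], [0.90, 0.10], [0.85, 0.15]],  # classifiers agree, confident
    [[0.55, 0.45], [0.40, 0.60], [0.50, 0.50]],  # classifiers unsure
    [[0.20, 0.80], [0.30, 0.70], [0.10, 0.90]],  # classifiers agree, fairly confident
]

# The instance to send to the annotator next.
queries = select_queries(pool_probs, weights, k=1)
print(queries)
```

In a full loop, the queried instances would be labeled, moved from the pool to the training set, and the three classifiers retrained before the next round of selection.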


Author information

Authors and Affiliations

Computer Science and Engineering, Jadavpur University, 188, Raja Subodh Chandra Mallick Road, Kolkata, 700032, India
Rajonya De, Anuran Chakraborty, Agneet Chatterjee & Ram Sarkar

Corresponding author

Correspondence to Anuran Chakraborty.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

De, R., Chakraborty, A., Chatterjee, A. et al. A weighted ensemble-based active learning model to label microarray data. Med Biol Eng Comput 58, 2427–2441 (2020). https://doi.org/10.1007/s11517-020-02238-1

Received: 12 November 2019
Accepted: 26 July 2020
Published: 08 August 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11517-020-02238-1
