Abstract
Complex distributions of samples concealed in high-dimensional, massive-sample-size data challenge current classification methods for data mining. Samples within a class usually do not uniformly fill a certain (sub)space but are individually concentrated in certain regions of diverse feature subspaces, exhibiting class dispersion. Current classifiers applied to such complex data suffer from either high complexity or weak classification ability, due to the imbalance between the flexibility and the generalization ability of the discriminant functions they use. To address this concern, we propose a novel representation of discriminant functions in Bayesian inference that allows multiple Bayesian decision boundaries per class, each in its own subspace. To this end, we design a learning algorithm that incorporates the naive Bayes and feature weighting approaches into structural risk minimization to learn multiple Bayesian discriminant functions for each class, thus combining the simplicity and effectiveness of naive Bayes with the benefits of feature weighting in handling high-dimensional data. The proposed learning scheme yields a recursive algorithm for exploring the class density distribution for Bayesian estimation, and an automated approach for selecting powerful discriminant functions while keeping the complexity of the classifier low. Experimental results on real-world data characterized by millions of samples and features demonstrate the promising performance of our approach.
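To make the idea of per-class, feature-weighted Bayesian discriminant functions concrete, the following is a minimal sketch, not the paper's actual algorithm: a Gaussian naive Bayes classifier whose discriminant for class c takes the form g_c(x) = log P(c) + Σ_j w_cj · log p(x_j | c), where the per-class weights w_cj could be learned (as in the paper, via structural risk minimization) to emphasize the feature subspace in which class c is concentrated. All class and method names here are hypothetical; weights default to uniform, which reduces to standard naive Bayes.

```python
import numpy as np

class WeightedNaiveBayes:
    """Illustrative feature-weighted Gaussian naive Bayes (a sketch,
    not the paper's learning algorithm)."""

    def fit(self, X, y, weights=None):
        self.classes_ = np.unique(y)
        n_classes, n_features = len(self.classes_), X.shape[1]
        self.priors_ = np.empty(n_classes)
        self.means_ = np.empty((n_classes, n_features))
        self.vars_ = np.empty((n_classes, n_features))
        for i, c in enumerate(self.classes_):
            Xc = X[y == c]
            self.priors_[i] = len(Xc) / len(X)
            self.means_[i] = Xc.mean(axis=0)
            self.vars_[i] = Xc.var(axis=0) + 1e-9  # variance smoothing
        # Uniform weights unless supplied; the paper learns these per class.
        self.weights_ = (np.ones((n_classes, n_features))
                         if weights is None else np.asarray(weights))
        return self

    def decision_function(self, X):
        # g_c(x) = log P(c) + sum_j w_cj * log N(x_j; mu_cj, var_cj)
        scores = np.empty((X.shape[0], len(self.classes_)))
        for i in range(len(self.classes_)):
            log_p = -0.5 * (np.log(2 * np.pi * self.vars_[i])
                            + (X - self.means_[i]) ** 2 / self.vars_[i])
            scores[:, i] = (np.log(self.priors_[i])
                            + (self.weights_[i] * log_p).sum(axis=1))
        return scores

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]
```

Setting a weight w_cj near zero effectively drops feature j from class c's discriminant, which is one way to read the paper's notion of per-class decision boundaries living in individual subspaces.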
Notes
They are available at: McAfee http://www.mcafee.com/ca/mcafee-labs.aspx; Symantec http://www.symantec.com/security_response/publications/threatreport.jsp; Panda http://www.pandasecurity.com/mediacenter/reports/.
The terms “feature”, “attribute” and “dimension” are used interchangeably throughout the paper.
We have borrowed the term “class-dispersion” from Vilalta and Rish (2003); the dispersion in our case is much more complicated, however, because it is subspace-related.
References
Aggarwal CC, Han J, Wang J, Yu PS (2004) A framework for projected clustering of high dimensional data streams. In: Proceedings of the thirtieth international conference on very large data bases (VLDB), pp 852–863
Aha D, Kibler D (1991) Instance-based learning algorithms. Mach Learn 6:37–66
Albert MK, Aha DW (1991) Analyses of instance-based learning algorithms. In: Proceedings of the ninth national conference on artificial intelligence (AAAI), pp 553–558
Atashpaz-Gargari E, Sima C, Braga-Neto UM, Dougherty ER (2013) Relationship between the accuracy of classifier error estimation and complexity of decision boundary. Pattern Recognit 46(5):1315–1322
Bengio Y, Bengio S (1999) Modeling high-dimensional discrete data with multi-layer neural networks. In: The twelfth advances in neural information processing systems (NIPS), pp 400–406
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27
Chen L, Wang S (2012a) Automated feature weighting in naive Bayes for high-dimensional data classification. In: Proceedings of the twenty-first ACM international conference on information and knowledge management (CIKM), pp 1243–1252
Chen L, Wang S (2012b) Semi-naive Bayesian classification by weighted kernel density estimation. In: Proceedings of the eighth international conference on advanced data mining and applications (ADMA), pp 260–270
Chen L, Jiang Q, Wang S (2012) Model-based method for projective clustering. IEEE Trans Knowl Data Eng 24(7):1291–1305
Dahl GE, Stokes JW, Deng L, Yu D (2013) Large-scale malware classification using random projections and neural networks. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 3422–3426
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38
Domeniconi C, Gunopulos D, Ma S, Yan B, Al-Razgan M, Papadopoulos D (2007) Locally adaptive metrics for clustering high dimensional data. Data Min Knowl Discov 14(1):63–97
Domingos P, Pazzani M (1997) On the optimality of the simple Bayesian classifier under zero-one loss. Mach Learn 29(2–3):103–130
Fan W, Bifet A (2013) Mining big data: current status, and forecast to the future. ACM SIGKDD Explor Newslett 14(2):1–5. doi:10.1145/2481244.2481246
Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874
Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the thirteenth international joint conference on artificial intelligence (IJCAI), pp 1022–1029
Fortuny EJ, Martens D, Provost F (2013) Predictive modeling with big data: is bigger really better? Big Data 1(4):215–226
Frank E, Hall M, Pfahringer B (2003) Locally weighted naive Bayes. In: Proceedings of the nineteenth annual conference on uncertainty in artificial intelligence (UAI), pp 249–256
Friedman JH (1997) On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Min Knowl Discov 1(1):55–77
Gopal S, Yang Y, Bai B, Mizil AN (2012) Bayesian models for large-scale hierarchical classification. In: Proceedings of the twenty-sixth annual conference on neural information processing systems (NIPS), pp 2420–2428
Han EHS, Karypis G (2000) Centroid-based document classification: analysis and experimental results. In: Proceedings of the fourth European conference on principles of knowledge discovery and data mining, pp 424–431
Hinneburg A, Aggarwal CC, Keim DA (2000) What is the nearest neighbor in high dimensional spaces? In: The twenty-sixth international conference on very large databases (VLDB), pp 506–515
Hsu CN, Huang HJ, Wong TT (2003) Implications of the Dirichlet assumption for discretization of continuous variables in naive Bayesian classifiers. Mach Learn 53(3):235–263
Ifrim G, Bakır G, Weikum G (2008) Fast logistic regression for text categorization with variable-length n-grams. In: Proceedings of the fourteenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 354–362
Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the thirtieth annual ACM symposium on theory of computing (STOC), pp 604–613
Jing L, Ng MK, Huang JZ (2007) An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data. IEEE Trans Knowl Data Eng 19(8):1026–1041
Joachims T (1996) A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. Technical report, DTIC Document
Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. Springer, Berlin
John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence (UAI), pp 338–345
Kang DK, Silvescu A, Honavar V (2006) RNBL-MN: a recursive naive Bayes learner for sequence classification. In: Proceedings of the eighteenth Pacific-Asia conference on knowledge discovery and data mining (PAKDD), pp 45–54
Kim J, Le DX, Thoma GR (2008) Naive Bayes classifier for extracting bibliographic information from biomedical online articles. In: Proceedings of the international conference on data mining (DMIN), pp 373–378
Kohavi R, Langley P, Yun Y (1997) The utility of feature weighting in nearest-neighbor algorithms. In: Proceedings of the ninth European conference on machine learning (ECML), pp 85–92
Kooij AJ, et al (2007) Chapter 4: regularization with ridge penalties, the lasso, and the elastic net for regression with optimal scaling transformations. In: Prediction accuracy and stability of regression with optimal scaling transformations, pp 65–90
Lee CH, Gutierrez F, Dou D (2011) Calculating feature weights in naive Bayes with Kullback–Leibler measure. In: Proceedings of the eleventh IEEE international conference on data mining (ICDM), pp 1146–1151
Li P, Shrivastava A, Moore JL, König AC (2011) Hashing algorithms for large-scale learning. In: Proceedings of the twenty-fifth annual conference on neural information processing systems (NIPS), pp 2672–2680
Lin C, Weng RC, Keerthi SS (2007) Trust region Newton methods for large-scale logistic regression. In: Proceedings of the twenty-fourth international conference on machine learning (ICML), pp 561–568
Lin G, Shen C, Shi Q, van den Hengel A, Suter D (2014) Fast supervised hashing with decision trees for high-dimensional data. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1963–1970
Liu S, Trenkler G (2008) Hadamard, Khatri-Rao, Kronecker and other matrix products. Int J Inf Syst Sci 4(1):160–177
Manevitz LM, Yousef M (2002) One-class SVMs for document classification. J Mach Learn Res 2:139–154
Marchiori E (2013) Class dependent feature weighting and k-nearest neighbor classification. In: Proceedings of the eighth IAPR international conference on pattern recognition in bioinformatics (PRIB), pp 69–78
Martens D, Provost F (2011) Pseudo-social network targeting from consumer transaction data. Working paper, Faculty of Applied Economics, University of Antwerp, Belgium
Martínez AM, Kak AC (2001) PCA versus LDA. IEEE Trans Pattern Anal Mach Intell 23(2):228–233
Masud MM, Khan L, Thuraisingham B (2008) A scalable multi-level feature extraction technique to detect malicious executables. Inf Syst Front 10(1):33–45
Nakajima S, Watanabe S (2005) Generalization error of linear neural networks in an empirical Bayes approach. In: Proceedings of the nineteenth international joint conference on artificial intelligence (IJCAI), pp 804–810
Navon IM, Phua PK, Ramamurthy M (1988) Vectorization of conjugate-gradient methods for large-scale minimization. In: Proceedings of the second ACM/IEEE conference on supercomputing, pp 410–418
Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. ACM SIGKDD Explor Newslett 6(1):90–105
Rani P, Pudi V (2008) RBNBC: repeat-based naive Bayes classifier for biological sequences. In: Proceedings of the eighth IEEE international conference on data mining (ICDM), pp 989–994
Rao J, Wu C (2010) Bayesian pseudo-empirical-likelihood intervals for complex surveys. J R Stat Soc Ser B 72(4):533–544
Seeger M (2006) Bayesian modelling in machine learning: a tutorial review. Technical report
Shabtai A, Moskovitch R, Elovici Y, Glezer C (2009) Detection of malicious code by applying machine learning classifiers on static features: a state-of-the-art survey. Inf Secur Tech Rep 14(1):16–29
Straub WO (2009) A brief look at Gaussian integrals. Article, Pasadena, California. http://www.weylmann.com/gaussian.pdf
Su J, Shirab JS, Matwin S (2011) Large scale text classification using semi-supervised multinomial naive Bayes. In: Proceedings of the twenty-eighth international conference on machine learning (ICML), pp 97–104
Tan M, Wang L, Tsang IW (2010) Learning sparse SVM for feature selection on very high dimensional datasets. In: Proceedings of the twenty-seventh international conference on machine learning (ICML), pp 1047–1054
Tan M, Tsang IW, Wang L (2014) Towards ultrahigh dimensional feature selection for big data. J Mach Learn Res 15(1):1371–1429
Tan S (2008) An improved centroid classifier for text categorization. Expert Syst Appl 35(1–2):279–285
Teh YW, Jordan MI, Beal MJ, Blei DM (2006) Hierarchical Dirichlet processes. J Am Stat Assoc 101(476):1566–1581
Vapnik VN (1992) Principles of risk minimization for learning theory. In: Proceedings of the fifth annual conference on neural information processing systems (NIPS), pp 831–838
Vapnik VN (1999) An overview of statistical learning theory. IEEE Trans Neural Netw Learn Syst 10(5):988–999
Vapnik VN (2000) The nature of statistical learning theory. Springer, Berlin
Vapnik VN, Levin E, Le Cun Y (1994) Measuring the VC-dimension of a learning machine. Neural Comput 6(5):851–876
Verweij PJ, Van Houwelingen HC (1994) Penalized likelihood in Cox regression. Stat Med 13(23–24):2427–2436
Vilalta R, Rish I (2003) A decomposition of classes via clustering to explain and improve naive Bayes. In: Proceedings of the fourteenth European conference on machine learning (ECML), pp 444–455
Vilalta R, Achari MK, Eick CF (2003) Class decomposition via clustering: a new framework for low-variance classifiers. In: Proceedings of the third IEEE international conference on data mining (ICDM), pp 673–676
Weinberger K, Dasgupta A, Langford J, Smola A, Attenberg J (2009) Feature hashing for large scale multitask learning. In: Proceedings of the twenty-sixth annual international conference on machine learning (ICML), pp 1113–1120
Xu B, Huang JZ, Williams G, Wang Q, Ye Y (2012) Classifying very high-dimensional data with random forests built from small subspaces. Int J Data Warehous Min 8(2):44–63
Yu S, Fung G, Rosales R, Krishnan S, Rao RB, Dehing-Oberije C, Lambin P (2008) Privacy-preserving Cox regression for survival analysis. In: Proceedings of the fourteenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 1034–1042
Yuan YC (2010) Multiple imputation for missing data: concepts and new development (version 9.0). SAS Institute Inc, Rockville
Zaidi NA, Cerquides J, Carman MJ, Webb GI (2013) Alleviating naive Bayes attribute independence assumption by attribute weighting. J Mach Learn Res 14(1):1947–1988
Zhang J, Chen L, Guo G (2013) Projected-prototype based classifier for text categorization. Knowl Based Syst 49:179–189
Zhou Z, Chen Z (2002) Hybrid decision tree. Knowl Based Syst 15(8):515–528
Acknowledgments
We would like to thank Carol Harris for significantly improving this paper. This work has been supported by funding from the Natural Sciences and Engineering Research Council of Canada (NSERC) to Shengrui Wang under Grant No. 396097-2010, and from the Natural Science Foundation of Fujian Province of China to Lifei Chen under Grant No. 2015J01238. We also thank UPMC (Université Pierre et Marie Curie) for financial support enabling the collaboration between Shengrui Wang and Patrick Gallinari. Shengrui Wang is also partly supported by the Natural Science Foundation of China (NSFC) under Grant No. 61170130.
Additional information
Responsible editor: Kristian Kersting.
Cite this article
Zhang, J., Wang, S., Chen, L. et al. Multiple Bayesian discriminant functions for high-dimensional massive data classification. Data Min Knowl Disc 31, 465–501 (2017). https://doi.org/10.1007/s10618-016-0481-y