Abstract
Learning from data too large to fit in memory poses great challenges to currently available learning approaches. Averaged n-Dependence Estimators (AnDE) allows flexible learning from out-of-core data by varying the value of n (the number of super parents), and hence is especially appropriate for learning from large quantities of data. AnDE's memory requirement, however, grows combinatorially with the number of attributes and with n. When learning from large data, the number of attributes is often large, and a high n is also desirable to achieve low-bias classification. To obtain the lower bias of higher-n AnDE at reduced memory cost, we propose a memory-constrained selective AnDE algorithm that makes two passes through the training examples. The first pass selects the attributes to serve as super parents according to the available memory; the second learns an AnDE model whose parents are drawn only from the selected attributes. Extensive experiments show that the new selective AnDE has considerably lower bias and prediction error relative to A\(n'\)DE, where \(n' = n-1\), while maintaining the same space complexity and similar time complexity. The proposed algorithm works well on categorical data; numerical data sets must be discretized first.
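For reference, AnDE estimates class probabilities by averaging over all size-\(n\) attribute subsets used as super parents; the selective variant described above restricts this average to subsets drawn from the attributes chosen in the first pass. A sketch of the standard estimate, with \(a\) attributes, class \(y\), and \(x_s\) denoting the values \(\mathbf{x}\) takes on subset \(s\) (\(n=0\) recovers naive Bayes, \(n=1\) recovers AODE):

\[
\hat{P}(y \mid \mathbf{x}) \;\propto\; \sum_{s \in \binom{\{1,\dots,a\}}{n}} \hat{P}(y, x_s) \prod_{i=1}^{a} \hat{P}(x_i \mid y, x_s)
\]

The two-pass procedure itself might be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the mutual-information ranking, the subset-count budget, and all function names are hypothetical stand-ins for the paper's actual selection criterion and memory model.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import comb, log

def mutual_information(column, labels):
    # Empirical I(X; Y) in nats between one attribute column and the class.
    n = len(labels)
    cx, cy = Counter(column), Counter(labels)
    cxy = Counter(zip(column, labels))
    return sum((c / n) * log(c * n / (cx[x] * cy[y]))
               for (x, y), c in cxy.items())

def select_super_parents(X, y, n, max_tables):
    # Pass 1: rank attributes by mutual information with the class (an
    # assumed criterion) and keep the longest prefix of the ranking whose
    # number of size-n super-parent subsets stays within the budget.
    a = len(X[0])
    ranked = sorted(range(a),
                    key=lambda j: mutual_information([row[j] for row in X], y),
                    reverse=True)
    selected = []
    for j in ranked:
        if comb(len(selected) + 1, n) > max_tables:
            break
        selected.append(j)
    return selected

def fit_counts(X, y, selected, n):
    # Pass 2: accumulate the joint counts AnDE needs, i.e. counts of
    # (y, x_s, x_i) for each size-n subset s of the selected attributes;
    # P(y, x_s) is then obtained by marginalising over (i, x_i).
    tables = defaultdict(Counter)
    for row, label in zip(X, y):
        for s in combinations(selected, n):
            xs = tuple(row[j] for j in s)
            for i, xi in enumerate(row):
                tables[s][(label, xs, i, xi)] += 1
    return tables
```

A real out-of-core implementation would stream both passes from disk and express the budget in bytes over the observed value cardinalities (each table grows roughly as the product of its super parents' cardinalities); the subset-count cap above is only a stand-in.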

Acknowledgments
This research has been supported by the Australian Research Council under Grant DP140100087; the Asian Office of Aerospace Research and Development, Air Force Office of Scientific Research, under Contract FA23861214030; the National Natural Science Foundation of China under Grants 61202135 and 61272209; the Natural Science Foundation of Jiangsu, China, under Grant BK20130735; the Natural Science Foundation of Jiangsu Higher Education Institutions of China under Grants 14KJB520019, 13KJB520011, and 13KJB520013; the open project program of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University; and the Priority Academic Program Development of Jiangsu Higher Education Institutions. This research has also been supported in part by the Monash e-Research Center and eSolutions-Research Support Services through the use of the Monash Campus HPC Cluster and the LIEF Grant. This research was also undertaken on the NCI National Facility in Canberra, Australia, which is supported by the Australian Commonwealth Government.
Appendices
Appendix 1: Table of RMSE
See Table 8.
Appendix 2: Table of zero-one loss
See Table 9.
Appendix 3: Table of bias and variance
See Table 10.
Appendix 4: Table of computing time
See Table 11.
Cite this article
Chen, S., Martínez, A.M., Webb, G.I. et al. Selective AnDE for large data learning: a low-bias memory constrained approach. Knowl Inf Syst 50, 475–503 (2017). https://doi.org/10.1007/s10115-016-0937-9