Adaptable multi-phase rules over the infrequent class

Datta, Soma; Mengel, Susan

doi:10.1007/s00500-018-3399-z

Published: 21 July 2018

Volume 22, pages 6067–6076, (2018)

Soft Computing

Abstract

Decision trees are classification models that support rule generation. Depending on the type of decision tree, a rule may have anywhere from one to hundreds of conditions, and the same data attribute may recur across conditions with different values, which makes the rules difficult to understand. To obtain more understandable rules, the number of nodes can be minimized to control the depth of the tree and, therefore, the number of conditions in each rule. The study described in this paper further seeks to optimize the decision tree to generate rules specific to the infrequent class, which presents another challenge because the infrequent class may have few instances in the dataset. Rules generated by either decision trees or class association mining generally come from the major class of the dataset. In this study, the two mining techniques, decision trees and association mining, are combined through ensemble learning in an adaptable manner, so that they expand and contract to accommodate the characteristics of the dataset. The ensemble learning occurs in phases, a partially generated or minimized decision tree mining phase and an association mining phase, to increase the probability of finding infrequent class rules. The ensemble learning technique developed in this study is found to generate understandable rules with increased coverage and confidence for the infrequent class on both balanced and unbalanced datasets.
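The two-phase idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. Everything named here is hypothetical: the toy records, the attribute names, the `rule_metrics` and `mine_class_rules` helpers, and the thresholds. A single hand-picked split stands in for the minimized decision tree phase, and a brute-force enumeration of short antecedents stands in for the association mining phase. Coverage is assumed to mean the share of infrequent-class instances that a rule matches, and confidence the share of matched instances that belong to the infrequent class; the abstract does not give the paper's exact definitions.

```python
from itertools import combinations

# Hypothetical toy records: (attributes, class label). "drop" is the infrequent class.
RECORDS = [
    ({"gpa": "low", "aid": "no", "campus": "off"}, "drop"),
    ({"gpa": "low", "aid": "no", "campus": "on"}, "drop"),
    ({"gpa": "low", "aid": "yes", "campus": "on"}, "stay"),
    ({"gpa": "high", "aid": "no", "campus": "off"}, "stay"),
    ({"gpa": "high", "aid": "yes", "campus": "on"}, "stay"),
    ({"gpa": "high", "aid": "yes", "campus": "off"}, "stay"),
    ({"gpa": "high", "aid": "no", "campus": "on"}, "stay"),
    ({"gpa": "low", "aid": "yes", "campus": "off"}, "stay"),
]


def matches(attrs, conditions):
    """True if the record satisfies every (attribute, value) condition."""
    return all(attrs.get(key) == value for key, value in conditions)


def rule_metrics(records, conditions, target):
    """Return (confidence, coverage) of the rule 'conditions -> target'.

    Assumed definitions: confidence = hits / matched records,
    coverage = hits / records of the target class.
    """
    matched = [label for attrs, label in records if matches(attrs, conditions)]
    hits = sum(1 for label in matched if label == target)
    class_size = sum(1 for _, label in records if label == target)
    confidence = hits / len(matched) if matched else 0.0
    coverage = hits / class_size if class_size else 0.0
    return confidence, coverage


def mine_class_rules(records, target, max_conditions=2, min_confidence=0.8):
    """Brute-force stand-in for class association mining over short antecedents."""
    items = sorted({(k, v) for attrs, _ in records for k, v in attrs.items()})
    rules = []
    for size in range(1, max_conditions + 1):
        for conditions in combinations(items, size):
            # Skip antecedents that test the same attribute twice.
            if len({key for key, _ in conditions}) < size:
                continue
            confidence, coverage = rule_metrics(records, conditions, target)
            if coverage > 0 and confidence >= min_confidence:
                rules.append((conditions, confidence, coverage))
    return rules


# Phase 1 stand-in: one split, keeping the branch where "drop" concentrates.
branch = [(attrs, label) for attrs, label in RECORDS if attrs["gpa"] == "low"]

# Phase 2 stand-in: mine infrequent-class rules inside that branch only.
rules = mine_class_rules(branch, "drop")

for conditions, confidence, coverage in rules:
    antecedent = " AND ".join(f"{k}={v}" for k, v in conditions)
    print(f"gpa=low AND {antecedent} -> drop "
          f"(confidence={confidence:.2f}, coverage={coverage:.2f})")
```

Restricting the second phase to one branch of a shallow tree keeps the antecedents short, which is the readability goal the abstract describes; how the actual method chooses splits and adapts the phases to the dataset is not specified in the abstract and is not modeled here.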



Author information

Authors and Affiliations

Software Engineering, University of Houston Clear Lake, Houston, TX, USA
Soma Datta
Computer Science, Texas Tech University, Lubbock, TX, USA
Susan Mengel

Corresponding author

Correspondence to Soma Datta.

Ethics declarations

Conflict of interest

All the authors declare that they have no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by S. Deb, T. Hanne, K. C. Wong.

About this article

Cite this article

Datta, S., Mengel, S. Adaptable multi-phase rules over the infrequent class. Soft Comput 22, 6067–6076 (2018). https://doi.org/10.1007/s00500-018-3399-z

Issue Date: September 2018
