Abstract
Recent growth in data mining and machine learning has brought wider recognition to outcome prediction and classification. However, the application of these techniques to sports remains largely unexplored. This paper applies data mining and machine learning to sports, and proposes a machine learning based algorithm to predict the outcome of badminton tournaments. We employ three classifiers, Naïve Bayes with Correlation Based Feature Weighting (NB-CBFW), Composite Hypercubes on Iterated Random Projections (CHIRP) and Hyper Pipes, to predict the outcomes of the Australian Open 2019, Malaysian Open 2019, German Open 2019 and Singapore Open 2019 badminton tournaments. Prediction performance is measured in terms of Accuracy, Root Mean Square Error (RMSE), True Positive Rate (TPR), True Negative Rate (TNR), Positive Predictive Value (PPV), Negative Predictive Value (NPV) and Receiver Operating Characteristic (ROC). The results show that NB-CBFW achieves higher accuracy in match outcome prediction than CHIRP and Hyper Pipes.
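As a rough illustration of the evaluation measures named in the abstract, the following minimal Python sketch computes Accuracy, RMSE, TPR, TNR, PPV and NPV for a two-class (win/loss) problem. The function name `binary_metrics`, the 0/1 label encoding and the toy data are assumptions made for illustration and are not taken from the paper; RMSE is computed here between the predicted win probability and the 0/1 label, which is one common convention and may differ from the tool the authors used. ROC analysis is not covered.

```python
import math

def binary_metrics(y_true, y_pred, y_prob):
    """Compute common two-class evaluation measures.

    y_true, y_pred: sequences of 0/1 labels (1 = positive class, e.g. "win").
    y_prob: predicted probability of the positive class for each instance.
    Hypothetical helper for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    # Assumption: RMSE between the positive-class probability and the 0/1 label.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_prob)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "rmse": rmse,
        "tpr": ratio(tp, tp + fn),  # sensitivity / recall
        "tnr": ratio(tn, tn + fp),  # specificity
        "ppv": ratio(tp, tp + fp),  # precision
        "npv": ratio(tn, tn + fn),
    }

# Toy data (made up): 8 matches, 1 = win, 0 = loss.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.7, 0.3]
metrics = binary_metrics(y_true, y_pred, y_prob)
```

All six values derive from the same four confusion-matrix counts, which is why they are usually reported together: accuracy alone can hide an imbalance between the two kinds of error.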
Abbreviations
- \({F}_{i}; {F}_{j}\): Two different feature variables/attributes
- \(T\): Target/class variable
- \({f}_{i}, {f}_{j}\) and \(t\): The values of \({F}_{i}, {F}_{j}\) and \(T\) respectively
- \(m\): Total number of feature variables
- \(I\left({F}_{i};T\right)\): The mutual significance or feature-class/target correlative significance
- \(I\left({F}_{i};{F}_{j}\right)\): Average mutual redundancy or average feature-feature correlative significance
- \({Q}_{i}\): Difference between the feature-class correlation and the average feature-feature intercorrelation
- \(\mathrm{NI}\left({F}_{i};T\right)\): Normalized mutual significance
- \(\mathrm{NI}\left({F}_{i};{F}_{j}\right)\): Normalized average mutual redundancy
- \(F{W}_{i}\): The final weight assigned to the attribute
- \({F}_{i}\): The \(m\) feature variables, with feature values \({f}_{1}, {f}_{2}, {f}_{3}, \ldots, {f}_{m}\)
- \(P(t)\): Prior probability
- \(P({f}_{i}|t)\): Conditional probability
- \(b\): Marginal number of bins
- \(q\): Number of instances
- \({B}_{k}\): Purity measure
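The weighting quantities listed above can be tied together in a short sketch. The Python code below is a minimal illustration, not the authors' implementation: it assumes discrete features, estimates mutual information from raw counts, normalizes each term by its average over all features (or feature pairs), and maps \({Q}_{i}\) to a weight with a logistic function, as in the published correlation-based feature weighting filter for naive Bayes. The function names `mutual_information` and `cbfw_weights` and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in nats for two discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c * n) / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )

def cbfw_weights(columns, target):
    """Sketch of correlation-based feature weights FW_i (needs >= 2 features).

    columns: list of m feature columns, each a sequence of discrete values.
    target:  sequence of class labels, same length as each column.
    """
    m = len(columns)
    # I(F_i; T): feature-class mutual significance.
    relevance = [mutual_information(col, target) for col in columns]
    # I(F_i; F_j): feature-feature redundancy for every pair i != j.
    redundancy = [
        [mutual_information(columns[i], columns[j]) if i != j else 0.0
         for j in range(m)]
        for i in range(m)
    ]
    mean_rel = sum(relevance) / m
    mean_red = sum(sum(row) for row in redundancy) / (m * (m - 1))

    weights = []
    for i in range(m):
        # NI(F_i; T): relevance normalized by the average relevance.
        ni_rel = relevance[i] / mean_rel if mean_rel > 0 else 0.0
        # Average over j != i of NI(F_i; F_j).
        ni_red = (sum(redundancy[i]) / mean_red / (m - 1)) if mean_red > 0 else 0.0
        q_i = ni_rel - ni_red                       # Q_i
        weights.append(1.0 / (1.0 + math.exp(-q_i)))  # FW_i via logistic map
    return weights

# Toy data (made up): F1 determines the class, F2 is unrelated to it.
target = [1, 1, 0, 0]
f1 = ["a", "a", "b", "b"]
f2 = ["x", "y", "x", "y"]
weights = cbfw_weights([f1, f2], target)
```

In this toy case the informative feature receives the larger weight and the uninformative one stays at 0.5. In a weighted naive Bayes classifier such weights would typically act as exponents on the conditional probabilities \(P({f}_{i}|t)\) before they are multiplied with the prior \(P(t)\).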
Funding
There is no funding source.
Contributions
Data collection: MS, NK; data analysis: Monika, MS; technical writing: PL, MS, NK.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Sharma, M., Monika, Kumar, N. et al. Badminton match outcome prediction model using Naïve Bayes and Feature Weighting technique. J Ambient Intell Human Comput 12, 8441–8455 (2021). https://doi.org/10.1007/s12652-020-02578-8
DOI: https://doi.org/10.1007/s12652-020-02578-8