Abstract
Labor safety in the workplace is a critical human rights concern in every industry worldwide. Coal mines are among the most dangerous workplaces: every year, thousands of miners around the world die or are severely injured in mining accidents. To design efficient, technology-based accident mitigation plans for such work environments, it is valuable to analyze the causes of these accidents. This study contributes to the coal mining domain by proposing a machine learning approach to identify the causes of accidents. The approach uses a dataset of past coal mine accidents whose causes are recorded as text. We preprocessed the text to clean it and then extracted features with the term frequency-inverse document frequency (TF-IDF) technique to train the machine learning models. We propose a voting-based hybrid classifier (VHC) that combines three individual machine learning models (random forest, support vector classifier, and logistic regression) using soft voting. The models were evaluated in terms of accuracy, precision, recall, and F1 score. VHC outperforms the other state-of-the-art models it was compared against, achieving the highest accuracy score of 0.96.
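The pipeline summarized above (text cleaning, TF-IDF features, a soft-voting ensemble of random forest, SVC, and logistic regression, and the four evaluation metrics) can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the cleaning rules in the hypothetical helper `clean_text`, the hyperparameters, the macro averaging of precision/recall/F1, and the toy narratives and cause labels are all illustrative assumptions, not the paper's settings or the Kaggle data.

```python
import re

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy accident narratives and cause labels (NOT from the paper's dataset).
TOY_TEXTS = [
    "Roof collapsed on the miner at the face",
    "Loose roof rock fell and struck the worker",
    "Roof fall trapped two miners in the gallery",
    "Unsupported roof caved in during shift",
    "Methane gas ignited causing an explosion",
    "Gas accumulation led to a blast in the tunnel",
    "Explosion of methane after poor ventilation",
    "Spark ignited gas and the section exploded",
    "Worker caught in the conveyor belt",
    "Haulage machine crushed the operator",
    "Miner injured by the cutting machine",
    "Conveyor belt pulled the worker's hand",
]
TOY_LABELS = ["roof_fall"] * 4 + ["gas_explosion"] * 4 + ["machinery"] * 4


def clean_text(text: str) -> str:
    """Assumed preprocessing: lowercase, replace non-letters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_vhc(random_state: int = 0):
    """TF-IDF features feeding a soft-voting ensemble of RF, SVC and LR (hyperparameters assumed)."""
    voter = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
            # probability=True makes SVC expose predict_proba, which soft voting requires
            ("svc", SVC(kernel="linear", probability=True, random_state=random_state)),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",  # average the models' predicted class probabilities
    )
    # The custom preprocessor replaces TfidfVectorizer's built-in lowercasing step
    return make_pipeline(TfidfVectorizer(preprocessor=clean_text), voter)


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (the averaging mode is an assumption)."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    model = build_vhc()
    model.fit(TOY_TEXTS, TOY_LABELS)
    # Training-set smoke test only; a real evaluation needs a held-out test split.
    print(evaluate(TOY_LABELS, model.predict(TOY_TEXTS)))
    print(model.predict(["methane gas exploded near the face"]))
```

With `voting="soft"`, the ensemble averages the three models' class-probability estimates (equal weights by default) and predicts the class with the highest mean, so one confident model can outweigh two uncertain ones; hard voting would instead take the majority of the predicted labels.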
Data Availability
The dataset used in the experiments is publicly available on Kaggle at https://www.kaggle.com/furqanrustam118/coal-minin-datase.
Acknowledgements
This research was supported by the Florida Center for Advanced Analytics and Data Science, funded by Ernesto.Net (under the Algorithms for Good Grant).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Javaid, A., Siddique, M.A., Reshi, A.A. et al. Coal mining accident causes classification using voting-based hybrid classifier (VHC). J Ambient Intell Human Comput 14, 13211–13221 (2023). https://doi.org/10.1007/s12652-022-03779-z
DOI: https://doi.org/10.1007/s12652-022-03779-z