Weak Classifiers Performance Measure in Handling Noisy Clinical Trial Data

Conference paper, in: Soft Computing in Data Science (SCDS 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 652)

Abstract

Most research has concluded that machine learning performs better on cleaned datasets than on dirty ones. In this paper, we evaluate three weak (base) machine learning classifiers, Decision Table, Naive Bayes and k-Nearest Neighbour, on a real-world, noisy and messy clinical trial dataset rather than on a carefully curated one. Clinical trial data scientists were involved throughout, guiding the exploratory data analysis and strengthening the evaluation of the results. Classifier performance was assessed using accuracy and the Receiver Operating Characteristic (ROC), supported by sensitivity, specificity and precision values, and the outcome contradicts the conclusions of previous research. We applied pre-processing techniques, namely the interquartile range technique to remove outliers and mean imputation to handle missing values, and found that all three classifiers performed better on the dirty dataset than on the imputed and cleaned datasets, achieving the highest accuracy and ROC measures. Decision Table proved to be the best classifier for real-world noisy clinical trial data.
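
For concreteness, the pipeline the abstract describes can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses scikit-learn (the paper does not name its toolkit in this excerpt), the public breast-cancer dataset as a placeholder for the non-public clinical trial data, and a shallow decision tree as a stand-in for the Decision Table classifier, which scikit-learn does not provide.

    # Sketch only: scikit-learn, the placeholder dataset, and the decision-tree
    # stand-in for Decision Table are assumptions, not the paper's actual setup.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    def mean_impute(X):
        """Mean imputation: replace each NaN with its column mean."""
        means = np.nanmean(X, axis=0)
        rows, cols = np.where(np.isnan(X))
        X = X.copy()
        X[rows, cols] = means[cols]
        return X

    def iqr_filter(X, y, k=1.5):
        """IQR outlier removal: drop rows with any feature outside [Q1 - k*IQR, Q3 + k*IQR]."""
        q1, q3 = np.percentile(X, [25, 75], axis=0)
        iqr = q3 - q1
        mask = np.all((X >= q1 - k * iqr) & (X <= q3 + k * iqr), axis=1)
        return X[mask], y[mask]

    # Placeholder for the clinical trial data, which is not public.
    X, y = load_breast_cancer(return_X_y=True)
    datasets = {
        "dirty": (X, y),                         # as loaded, outliers included
        "imputed": (mean_impute(X), y),          # NaNs -> column means (a no-op on this placeholder)
        "clean": iqr_filter(mean_impute(X), y),  # imputation, then IQR outlier removal
    }
    classifiers = {
        "DecisionTable (tree stand-in)": DecisionTreeClassifier(max_depth=3, random_state=0),
        "NaiveBayes": GaussianNB(),
        "kNN": KNeighborsClassifier(n_neighbors=5),
    }
    for dname, (Xd, yd) in datasets.items():
        for cname, clf in classifiers.items():
            acc = cross_val_score(clf, Xd, yd, cv=10, scoring="accuracy").mean()
            auc = cross_val_score(clf, Xd, yd, cv=10, scoring="roc_auc").mean()
            print(f"{dname:>7} | {cname:<30} acc={acc:.3f} roc_auc={auc:.3f}")

Comparing accuracy and ROC AUC for each classifier across the dirty, imputed and cleaned variants is the comparison from which the paper draws its conclusion; on the real clinical trial data, the dirty variant came out ahead.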



Acknowledgement

This paper is part of a Master's dissertation written at the University of Manchester, UK. We would like to thank the data scientists from the Advanced Analytics Centre, AstraZeneca, Alderley Park, Cheshire, UK for their review, support and suggestions on this study.

Author information

Correspondence to Ezzatul Akmal Kamaru-Zaman.


Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kamaru-Zaman, E.A., Brass, A., Weatherall, J., Rahman, S.A. (2016). Weak Classifiers Performance Measure in Handling Noisy Clinical Trial Data. In: Berry, M., Hj. Mohamed, A., Yap, B. (eds) Soft Computing in Data Science. SCDS 2016. Communications in Computer and Information Science, vol 652. Springer, Singapore. https://doi.org/10.1007/978-981-10-2777-2_13


  • DOI: https://doi.org/10.1007/978-981-10-2777-2_13

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2776-5

  • Online ISBN: 978-981-10-2777-2

  • eBook Packages: Computer Science, Computer Science (R0)
