Visual Analytics for Classifier Construction and Evaluation for Medical Data

  • Chapter
Data Science for Healthcare

Abstract

Designing and optimizing classifiers for multidimensional mixed quantitative-and-categorical data is a challenging task. We present a workflow and associated toolset that assist with this task by giving the designer insight into how the multidimensional input data is structured and how this structure influences the classification results. Our approach relies heavily on visual analytics to detect relevant patterns in the input data, observe the distribution of classification errors, detect and control the effect of feature selection on the classification results, and compare in detail the performance of different classification techniques. We demonstrate the value of our approach on the concrete problem of building a classifier that predicts biochemical recurrence, an indicator of potential cancer relapse after prostate cancer treatment, from clinical patient data.

Author information

Corresponding author

Correspondence to Jacek Kustra.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kustra, J., Telea, A. (2019). Visual Analytics for Classifier Construction and Evaluation for Medical Data. In: Consoli, S., Reforgiato Recupero, D., Petković, M. (eds) Data Science for Healthcare. Springer, Cham. https://doi.org/10.1007/978-3-030-05249-2_10

  • DOI: https://doi.org/10.1007/978-3-030-05249-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05248-5

  • Online ISBN: 978-3-030-05249-2

  • eBook Packages: Computer Science, Computer Science (R0)
