Similarity-Dissimilarity Plot for Visualization of High Dimensional Data in Biomedical Pattern Classification

  • ORIGINAL PAPER
Journal of Medical Systems

Abstract

In pattern classification problems, feature extraction is an important step, and the quality of the features in discriminating between classes plays an important role in classification performance. Real-life pattern classification may require a high dimensional feature space, and it is impossible to visualize the feature space directly if its dimension is greater than four. In this paper, we propose a Similarity-Dissimilarity plot, which projects a high dimensional space onto a two dimensional space while retaining the characteristics required to assess the discrimination quality of the features. The plot reveals the amount of overlap between the features of different classes. Separable data points of different classes, which can be classified correctly using an appropriate classifier, are also visible on the plot; hence, the approximate classification accuracy can be predicted. Moreover, it is possible to identify the class with which misclassified data points will be confused by the classifier. Outlier data points can also be located on the plot. Various examples of synthetic data are used to highlight the important characteristics of the proposed plot, and some real-life examples from biomedical data are also analyzed. The proposed plot is independent of the number of dimensions of the feature space.
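
The abstract does not spell out how the two plot coordinates are computed, so the following is only a minimal sketch under a stated assumption: each point's "similarity" is taken as the distance to its nearest neighbour of the same class and its "dissimilarity" as the distance to its nearest neighbour of any other class, both scaled by the largest finite distance. The function name `similarity_dissimilarity`, the Euclidean metric, and the scaling are illustrative choices, not details confirmed by the paper. Under this assumption, points whose dissimilarity exceeds their similarity are closer to their own class than to any other, and the fraction of such points gives a rough indication of achievable accuracy.

```python
import numpy as np

def similarity_dissimilarity(X, y):
    """Hypothetical helper: per-point nearest same-class and nearest
    other-class distances, each scaled by the largest finite distance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances (assumed metric), shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    same = y[:, None] == y[None, :]
    d_same = np.where(same, dist, np.inf).min(axis=1)    # assumed "similarity" axis
    d_other = np.where(~same, dist, np.inf).min(axis=1)  # assumed "dissimilarity" axis
    finite = np.concatenate([d_same[np.isfinite(d_same)],
                             d_other[np.isfinite(d_other)]])
    scale = finite.max() if finite.size and finite.max() > 0 else 1.0
    return d_same / scale, d_other / scale

# Synthetic example: two well-separated Gaussian classes in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 10)),
               rng.normal(8.0, 1.0, (50, 10))])
y = np.repeat([0, 1], 50)

sim, dis = similarity_dissimilarity(X, y)
# Fraction of points nearer to their own class than to the other class.
frac_above = float(np.mean(dis > sim))
print(f"fraction with dissimilarity > similarity: {frac_above:.2f}")
```

Plotting `sim` against `dis` as a scatter plot, coloured by class, yields a two dimensional view whose size does not depend on the dimension of `X`. Points near or below the diagonal would indicate class overlap, and isolated points far from the rest may be outliers.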

Author information

Correspondence to Muhammad Arif.

About this article

Cite this article

Arif, M. Similarity-Dissimilarity Plot for Visualization of High Dimensional Data in Biomedical Pattern Classification. J Med Syst 36, 1173–1181 (2012). https://doi.org/10.1007/s10916-010-9579-8

  • DOI: https://doi.org/10.1007/s10916-010-9579-8
