Similarity-Dissimilarity Plot for Visualization of High Dimensional Data in Biomedical Pattern Classification

Arif, Muhammad

doi:10.1007/s10916-010-9579-8

ORIGINAL PAPER
Published: 24 August 2010

Volume 36, pages 1173–1181, (2012)
Journal of Medical Systems

Muhammad Arif¹

Abstract

Feature extraction is an important step in pattern classification, and how well the features discriminate between classes plays an important role in classification performance. Real-life pattern classification problems may require a high dimensional feature space, and it is impossible to visualize the feature space directly if its dimension is greater than four. In this paper, we propose a Similarity-Dissimilarity plot that projects a high dimensional feature space onto a two dimensional space while retaining the characteristics needed to assess the discrimination quality of the features. The plot reveals how much the features of different classes overlap. Separable data points of different classes, which an appropriate classifier can classify correctly, are also visible on the plot, so an approximate classification accuracy can be predicted. Moreover, the plot indicates the class with which the classifier is likely to confuse the misclassified data points, and outlier data points can also be located on it. Various synthetic data examples are used to highlight important characteristics of the proposed plot, and some real-life examples from biomedical data are also analyzed. The proposed plot is independent of the number of dimensions of the feature space.
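The abstract does not say how the two axes of the plot are computed. As a rough sketch only, assuming (this is not stated in the text above) that each data point is placed by its Euclidean distance to the nearest point of its own class and to the nearest point of any other class, the reduction from any number of dimensions to two coordinates could look like the following. The function name nearest_class_distances and the toy data are hypothetical.

```python
import numpy as np


def nearest_class_distances(X, y):
    """Return (same, other) for each point.

    same[i]  = distance from point i to its nearest neighbour of the same class
    other[i] = distance from point i to its nearest neighbour of another class
    ASSUMPTION: Euclidean distance and nearest neighbours are illustrative
    choices, not taken from the abstract.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    same_mask = y[:, None] == y[None, :]
    np.fill_diagonal(same_mask, False)  # a point is not its own neighbour
    other_mask = y[:, None] != y[None, :]
    # Masked-out pairs become inf so they never win the minimum.
    same = np.where(same_mask, dist, np.inf).min(axis=1)
    other = np.where(other_mask, dist, np.inf).min(axis=1)
    return same, other


# Hypothetical toy data: two classes of two points each in four dimensions.
X = [[0, 0, 0, 0], [1, 0, 0, 0], [10, 0, 0, 0], [11, 0, 0, 0]]
y = [0, 0, 1, 1]
same, other = nearest_class_distances(X, y)
print(same)   # same-class distances:  1, 1, 1, 1
print(other)  # other-class distances: 10, 9, 9, 10
```

Under this assumption, every point becomes one (same, other) pair regardless of how many features it has, and a point whose other-class distance exceeds its same-class distance would fall on the separable side of the diagonal when the pairs are drawn as a scatter plot.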

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Air University, PAF Complex, E-9, Islamabad, Pakistan
Muhammad Arif

Corresponding author

Correspondence to Muhammad Arif.

About this article

Cite this article

Arif, M. Similarity-Dissimilarity Plot for Visualization of High Dimensional Data in Biomedical Pattern Classification. J Med Syst 36, 1173–1181 (2012). https://doi.org/10.1007/s10916-010-9579-8

Received: 25 June 2010
Accepted: 16 August 2010
Published: 24 August 2010
Issue Date: June 2012
DOI: https://doi.org/10.1007/s10916-010-9579-8
