VizRank: Data Visualization Guided by Machine Learning

Data Mining and Knowledge Discovery

Abstract

Data visualization plays a crucial role in identifying interesting patterns in exploratory data analysis. Its use is, however, made difficult by the large number of possible data projections, each showing a different attribute subset, that the data analyst must evaluate. In this paper, we introduce a method called VizRank, which is applied to classified data to automatically select the most useful data projections. VizRank can be used with any visualization method that maps attribute values to points in a two-dimensional visualization space. It assesses possible data projections and ranks them by their ability to visually discriminate between classes. The quality of class separation is estimated by computing the predictive accuracy of a k-nearest neighbor classifier on a data set consisting of the x and y positions of the projected data points and their class information. The paper introduces the method and presents experimental results showing that VizRank's ranking of projections closely agrees with subjective rankings by data analysts. The practical use of VizRank is also demonstrated by an application in the field of functional genomics.
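The scoring idea described in the abstract can be illustrated with a minimal sketch. It assumes plain scatterplots of attribute pairs as the projection method, min-max scaling of each axis, Euclidean distance, and leave-one-out estimation of plain classification accuracy. The function names (`scatterplot_projection`, `knn_loo_accuracy`, `rank_projections`) and the default `k=5` are hypothetical illustrations; the paper's own evaluation protocol, choice of k and scoring measure may differ.

```python
from collections import Counter
from itertools import combinations
from math import dist

# Hypothetical sketch of VizRank-style projection ranking: scatterplots of
# attribute pairs, scored by leave-one-out k-NN accuracy on the (x, y) points.


def scatterplot_projection(rows, i, j):
    """Map each row to (x, y) using attributes i and j, min-max scaled to [0, 1]."""
    def scale(values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # constant attribute: avoid division by zero
        return [(v - lo) / span for v in values]

    xs = scale([r[i] for r in rows])
    ys = scale([r[j] for r in rows])
    return list(zip(xs, ys))


def knn_loo_accuracy(points, labels, k=5):
    """Leave-one-out accuracy of a k-NN classifier on projected (x, y) points."""
    correct = 0
    for q, (p, true_label) in enumerate(zip(points, labels)):
        # Distances from the held-out point to every other projected point,
        # keeping only the k closest ones.
        neighbours = sorted(
            ((dist(p, other), labels[o]) for o, other in enumerate(points) if o != q),
            key=lambda pair: pair[0],
        )[:k]
        # Majority vote among the neighbours' class labels.
        votes = Counter(label for _, label in neighbours)
        predicted = votes.most_common(1)[0][0]
        correct += predicted == true_label
    return correct / len(points)


def rank_projections(rows, labels, attribute_names, k=5):
    """Score every two-attribute scatterplot and return them best-first."""
    scores = []
    for i, j in combinations(range(len(attribute_names)), 2):
        points = scatterplot_projection(rows, i, j)
        acc = knn_loo_accuracy(points, labels, k)
        scores.append((acc, attribute_names[i], attribute_names[j]))
    # Stable sort: projections with equal scores keep their enumeration order.
    return sorted(scores, key=lambda s: s[0], reverse=True)


if __name__ == "__main__":
    # Toy data: attributes "a" and "b" separate the classes into two clusters.
    rows = [
        (0.0, 0.1, 5.0), (0.2, 0.0, 1.0), (0.1, 0.3, 3.0), (0.3, 0.2, 2.0),
        (1.0, 0.9, 4.0), (0.8, 1.0, 1.5), (0.9, 0.7, 3.5), (0.7, 0.8, 2.5),
    ]
    labels = ["A"] * 4 + ["B"] * 4
    for score, x_attr, y_attr in rank_projections(rows, labels, ["a", "b", "noise"], k=3):
        print(f"{x_attr} vs {y_attr}: {score:.3f}")
```

On this toy data the `a` vs `b` scatterplot, where the two classes form separate clusters, ranks first with an accuracy of 1.0. Exhaustively scoring all attribute pairs is quadratic in the number of attributes, which is exactly why an automatic ranking is useful when there are too many projections to inspect by hand.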

Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.
Figure 6.

Acknowledgments

The authors wish to thank Uros Petrovic for help with the analysis of the yeast gene expression data set and the twelve postgraduate students of the University of Ljubljana who participated in the experiments. We would also like to acknowledge the support of a Program Grant (P2-0209) from the Slovenian Research Agency.

Author information

Corresponding author

Correspondence to Gregor Leban.

About this article

Cite this article

Leban, G., Zupan, B., Vidmar, G. et al. VizRank: Data Visualization Guided by Machine Learning. Data Min Knowl Disc 13, 119–136 (2006). https://doi.org/10.1007/s10618-005-0031-5

  • DOI: https://doi.org/10.1007/s10618-005-0031-5
