Abstract
We present new visual data mining algorithms for interactive decision tree construction on large datasets. The volume of stored data keeps growing, yet current visual data mining and visualization methods, including pixelization methods, have well-known limits on the number of items and dimensions they can handle. One way to overcome these limits is to use a higher-level representation of the data, for example a symbolic data representation. Our new interactive decision tree construction algorithms handle interval and taxonomical data. Because they operate on this higher-level representation rather than on the original data, they can deal with potentially very large datasets. Interactive algorithms are examples of a new data mining approach that aims to involve the user more closely in the process. The main advantages of this user-centered approach are the increased confidence in, and comprehensibility of, the resulting model, since the user took part in its construction, and the possibility of exploiting human pattern recognition capabilities. We present some results obtained on very large datasets.
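To make the idea of a higher-level representation concrete, the sketch below shows one possible way raw records could be summarised into interval-valued descriptions. It is an illustration only, not the chapter's implementation: the function name `to_intervals`, the field names, and the sample values are all hypothetical, and the min/max summary is just one simple choice of interval.

```python
from collections import defaultdict


def to_intervals(rows, group_key, numeric_keys):
    """Summarise raw rows into one interval-valued description per group.

    Hypothetical helper: each numeric attribute of a group is replaced by
    the (min, max) interval of the values observed in that group.
    """
    bounds = defaultdict(dict)
    for row in rows:
        group = row[group_key]
        for key in numeric_keys:
            value = row[key]
            # Start from a degenerate interval, then widen it as needed.
            low, high = bounds[group].get(key, (value, value))
            bounds[group][key] = (min(low, value), max(high, value))
    return dict(bounds)


# Made-up sample records, used only to exercise the function.
raw_rows = [
    {"species": "a", "length": 5.1, "width": 3.5},
    {"species": "a", "length": 4.9, "width": 3.0},
    {"species": "b", "length": 6.3, "width": 2.8},
    {"species": "b", "length": 7.0, "width": 3.2},
]

symbolic = to_intervals(raw_rows, "species", ["length", "width"])
```

Here four raw records collapse into two interval-valued items, so any later processing works on the number of groups rather than the number of original rows.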
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Poulet, F. (2007). High Dimensional Visual Data Classification. In: Lévy, P.P., et al. Pixelization Paradigm. VIEW 2006. Lecture Notes in Computer Science, vol 4370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71027-1_3
DOI: https://doi.org/10.1007/978-3-540-71027-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71026-4
Online ISBN: 978-3-540-71027-1
eBook Packages: Computer Science (R0)