DOI: 10.1145/1562849.1562852
research-article

Visual exploration of categorical and mixed data sets

Published: 28 June 2009

Abstract

For categorical data, no similarity measure exists that is as straightforward and general as the numerical distance between numerical items. Consequently, it is often difficult to analyse data sets that include categorical variables or a combination of categorical and numerical variables (mixed data sets). Quantifying categorical variables makes it possible to analyse them with the visual representations and analysis techniques commonly used for numerical data. This paper presents a tool for exploratory analysis of categorical and mixed data that uses a quantification process introduced in [16]. The application supports the analysis of mixed data sets by providing an exploratory environment that combines common visual representations in multiple coordinated views with algorithmic analysis, which facilitates the detection of potentially interesting patterns within combinations of categorical and numerical variables. The effectiveness of the quantification process and of the application's features is demonstrated through a case scenario.
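The abstract leaves the details of the quantification process to [16]. As a hedged illustration of the general idea only, and not the method of [16], the sketch below uses homogeneity analysis by reciprocal averaging, a standard iteration whose converged solution corresponds to the first non-trivial dimension of multiple correspondence analysis (the subject of [11]). Each item's score is the mean value of its categories, each category's value is the mean score of its items, and the scores are re-centred and rescaled until they stop changing. The function names `quantify`, `to_numeric` and `pearson`, the tolerance, and the toy table are all hypothetical. Once categories carry numbers, an ordinary numerical technique such as Pearson correlation can relate them to the numerical variables of a mixed data set.

```python
"""Hypothetical sketch: quantify categorical columns by homogeneity analysis
(reciprocal averaging) and merge them with numerical columns."""
from math import sqrt


def quantify(rows, max_iter=500, tol=1e-10):
    """Return one dict per categorical column mapping category -> number.

    rows: list of equal-length tuples of category labels, one per item.
    """
    n, m = len(rows), len(rows[0])

    def standardize(scores):
        # Centring removes the trivial constant solution; scaling fixes the
        # otherwise arbitrary length of the score vector (unit variance).
        mean = sum(scores) / n
        centred = [s - mean for s in scores]
        sd = sqrt(sum(c * c for c in centred) / n)
        if sd < 1e-12:
            raise ValueError("scores collapsed to a constant; nothing to quantify")
        return [c / sd for c in centred]

    def category_means(scores):
        # Category quantification = mean score of the items in that category.
        quants = []
        for j in range(m):
            sums, counts = {}, {}
            for score, row in zip(scores, rows):
                sums[row[j]] = sums.get(row[j], 0.0) + score
                counts[row[j]] = counts.get(row[j], 0) + 1
            quants.append({c: sums[c] / counts[c] for c in sums})
        return quants

    # Deterministic start; the iteration would stall only if this start were
    # orthogonal to the leading non-trivial solution.
    scores = standardize([float(i) for i in range(n)])
    for _ in range(max_iter):
        quants = category_means(scores)
        # Item score = mean quantification of the categories the item falls in.
        new_scores = standardize(
            [sum(quants[j][row[j]] for j in range(m)) / m for row in rows]
        )
        converged = max(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return category_means(scores)


def to_numeric(table, categorical_cols):
    """Replace the categorical columns of a mixed table by their quantifications."""
    quants = quantify([tuple(r[c] for c in categorical_cols) for r in table])
    numeric = []
    for r in table:
        row = list(r)
        for k, c in enumerate(categorical_cols):
            row[c] = quants[k][r[c]]
        numeric.append(row)
    return numeric


def pearson(a, b):
    """Pearson correlation, an ordinary technique for numerical data."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


if __name__ == "__main__":
    # Made-up toy data for illustration: (colour, size, weight).
    table = [("red", "small", 1.0), ("red", "small", 1.2),
             ("blue", "large", 3.1), ("blue", "large", 2.9)]
    numeric = to_numeric(table, [0, 1])
    print(numeric)
    print(pearson([r[0] for r in numeric], [r[2] for r in numeric]))
```

Running the file prints the quantified toy table and the correlation between the quantified colour column and the numerical weight column. Reciprocal averaging is used here only because it needs no linear-algebra library; an SVD-based correspondence analysis [10, 12] would be the more usual route in practice.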

References

[1]
A. Asuncion and D. Newman. UCI machine learning repository. http://archive.ics.uci.edu/ml/, 2007.
[2]
M. Q. W. Baldonado, A. Woodruff, and A. Kuchinsky. Guidelines for using multiple views in information visualization. In Proceedings of the Workshop on Advanced Visual Interfaces, pages 110--119, 2000.
[3]
R. A. Becker and W. S. Cleveland. Brushing scatterplots. Technometrics, 29(2):127--142, May 1987.
[4]
S. Boriah, V. Chandola, and V. Kumar. Similarity measures for categorical data: A comparative evaluation. In SIAM International Conference on Data Mining, pages 243--254. SIAM, April 2008.
[5]
A. Buja, J. A. McDonald, J. Michalak, and W. Stuetzle. Interactive data visualization using focusing and linking. In Proc. IEEE Visualization '91, San Diego, CA, pages 156--163, 1991.
[6]
J.-D. Fekete. The InfoVis Toolkit. In Proceedings of the 10th IEEE Symposium on Information Visualization (InfoVis'04), pages 167--174. IEEE Press, October 2004.
[7]
M. Friendly. Mosaic displays for multi-way contingency tables. Journal of the American Statistical Association, 89(425):190--200, 1994.
[8]
M. Friendly. Extending mosaic displays: Marginal, conditional, and partial views of categorical data. Journal of Computational and Graphical Statistics, 8(3):373--395, 1999.
[9]
M. Friendly. Visualizing categorical data: Data, stories, and pictures. In Proceedings of the Twenty-Fifth Annual SAS Users Group International Conference, April 2000.
[10]
G. H. Golub and W. Kahan. Calculating the singular values and pseudo-inverse of a matrix. J. SIAM Numer. Anal., Ser. B, 2(2):205--224, 1965.
[11]
M. Greenacre. Multiple Correspondence Analysis and Related Methods. Chapman & Hall, 2006.
[12]
M. Greenacre. Correspondence Analysis in Practice, 2nd ed. Chapman & Hall, 2007.
[13]
S. L. Havre, A. Shah, C. Posse, and B.-J. Webb-Robertson. Diverse information integration and visualization. In Proceedings of SPIE - The International Society for Optical Engineering, January 2006.
[14]
A. Inselberg. The plane with parallel coordinates. The Visual Computer, 1(4):69--91, 1985.
[15]
M. Jern, S. Johansson, J. Johansson, and J. Franzén. The GAV toolkit for multiple linked views. In Fifth International Conference on Coordinated and Multiple Views in Exploratory Visualization, CMV '07, pages 85--97. IEEE Computer Society, July 2007.
[16]
S. Johansson, M. Jern, and J. Johansson. Interactive quantification of categorical variables in mixed data sets. In Proceedings of IEEE International Conference on Information Visualisation, IV08, pages 3--10. IEEE Computer Society, July 2008.
[17]
I. T. Jolliffe. Principal Component Analysis, 2nd ed. Springer-Verlag, 2002.
[18]
R. Kosara, F. Bendix, and H. Hauser. Parallel sets: Interactive exploration and visual analysis of categorical data. IEEE Transactions on Visualization and Computer Graphics, 12(4):558--568, 2006.
[19]
S. Ma and J. L. Hellerstein. Ordering categorical data to improve visualization. In IEEE Information Visualization Symposium Late Breaking Hot Topics, pages 15--18, 1999.
[20]
B. Mirkin. Clustering for Data Mining: A Data Recovery Approach. Chapman & Hall, 2005.
[21]
A. Patro, M. O. Ward, and E. A. Rundensteiner. Seamless integration of diverse data types into exploratory visualization systems. Technical report, Worcester Polytechnic Institute, 2003.
[22]
R. Rao and S. K. Card. The table lens: merging graphical and symbolic representations in an interactive focus + context visualization for tabular information. In Proceedings of the SIGCHI conference on Human factors in computing systems: celebrating interdependence, pages 318--322. ACM, 1994.
[23]
J. C. Roberts. State of the art: Coordinated & multiple views in exploratory visualization. In Fifth International Conference on Coordinated and Multiple Views in Exploratory Visualization, CMV '07, pages 61--71. IEEE Computer Society, July 2007.
[24]
G. E. Rosario, E. A. Rundensteiner, D. C. Brown, M. O. Ward, and S. Huang. Mapping nominal values to numbers for effective visualization. Information Visualization, 3(2):80--95, 2004.
[25]
J. Seo and B. Shneiderman. Interactively exploring hierarchical clustering results. IEEE Computer, 35(7):80--86, 2002.
[26]
P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining, chapter 2, page 74. Addison-Wesley, 2006.
[27]
M. O. Ward. XmdvTool: Integrating multiple methods for visualizing multivariate data. In Proceedings of the Conference on Visualization 1994, pages 326--333, October 1994.
[28]
E. J. Wegman. Hyperdimensional data analysis using parallel coordinates. Journal of the American Statistical Association, 85(411):664--675, 1990.


Published In

VAKD '09: Proceedings of the ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery: Integrating Automated Analysis with Interactive Exploration
June 2009
92 pages
ISBN:9781605586700
DOI:10.1145/1562849

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD09
Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 14
  • Downloads (last 6 weeks): 2
Reflects downloads up to 17 Jan 2025.

Cited By

  • (2024) VIME: Visual Interactive Model Explorer for Identifying Capabilities and Limitations of Machine Learning Models for Sequential Decision-Making. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, pages 1--21. DOI: 10.1145/3654777.3676323. Online publication date: 13-Oct-2024.
  • (2022) Evaluating Data-type Heterogeneity in Interactive Visual Analyses with Parallel Axes. Computer Graphics Forum, 41(1):335--349. DOI: 10.1111/cgf.14438. Online publication date: 20-Jan-2022.
  • (2016) Temporal MDS Plots for Analysis of Multivariate Data. IEEE Transactions on Visualization and Computer Graphics, 22(1):141--150. DOI: 10.1109/TVCG.2015.2467553. Online publication date: 31-Jan-2016.
  • (2011) Visualization of multi-domain ranked data. Search Computing, pages 53--69. DOI: 10.5555/1983774.1983782. Online publication date: 1-Jan-2011.
  • (2011) Visualization of Multi-domain Ranked Data. Search Computing, pages 53--69. DOI: 10.1007/978-3-642-19668-3_6. Online publication date: 2011.
  • (2010) Combining statistical independence testing, visual attribute selection and automated analysis to find relevant attributes for classification. 2010 IEEE Symposium on Visual Analytics Science and Technology, pages 239--240. DOI: 10.1109/VAST.2010.5654445. Online publication date: Oct-2010.
