DOI: 10.1145/2637748.2638423

TimeCleanser: a visual analytics approach for data cleansing of time-oriented data

Published: 16 September 2014

Abstract

Poor data quality leads to unreliable results in any kind of data processing and has a profound economic impact. Although tools exist to help users with data cleansing, support for the specifics of time-oriented data is rather poor. Yet the time dimension has very specific characteristics, which introduce quality problems that differ from those of other kinds of data. We present TimeCleanser, an interactive Visual Analytics system that supports the cleansing of time-oriented data. To help users deal with these special characteristics and quality problems, TimeCleanser combines semi-automatic quality checks, visualizations, and directly editable data tables. An evaluation of TimeCleanser with a focus group (two target users, one developer, and two Human-Computer Interaction experts) shows that our proposed method (a) is suited to detecting hidden quality problems in time-oriented data and (b) facilitates the complex task of data cleansing.
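
To make the notion of a quality check on time-oriented data more concrete, the following is a minimal Python sketch of three checks that apply only to the time dimension: duplicate timestamps, out-of-order entries, and gaps larger than an expected sampling interval. It is an illustration under stated assumptions, not TimeCleanser's implementation; the function name `check_timestamps`, the `expected_step` parameter, and the sample readings are all hypothetical.

```python
from datetime import datetime, timedelta


def check_timestamps(timestamps, expected_step):
    """Flag time-specific quality problems in a sequence of datetimes.

    Hypothetical helper, not part of TimeCleanser. Returns a list of
    (index, problem) pairs so that a user can review each finding.
    """
    problems = []
    seen = set()
    for i, ts in enumerate(timestamps):
        # The same instant recorded more than once.
        if ts in seen:
            problems.append((i, "duplicate timestamp"))
        seen.add(ts)
        if i == 0:
            continue
        delta = ts - timestamps[i - 1]
        if delta < timedelta(0):
            # The entry is earlier than its predecessor.
            problems.append((i, "out of order"))
        elif delta > expected_step:
            # More time has passed than the assumed sampling interval.
            problems.append((i, "gap"))
    return problems


# Made-up sample data, assumed to be sampled hourly.
readings = [
    datetime(2014, 9, 16, 8, 0),
    datetime(2014, 9, 16, 9, 0),
    datetime(2014, 9, 16, 9, 0),   # repeated instant
    datetime(2014, 9, 16, 12, 0),  # three hours after the previous entry
    datetime(2014, 9, 16, 11, 0),  # earlier than the previous entry
]
issues = check_timestamps(readings, timedelta(hours=1))
for index, problem in issues:
    print(index, readings[index].isoformat(), problem)
```

The sketch only reports findings and changes nothing, which matches the semi-automatic idea in the abstract: the check proposes candidates and the user decides how to correct them.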

Published In

i-KNOW '14: Proceedings of the 14th International Conference on Knowledge Technologies and Data-driven Business
September 2014
262 pages
ISBN:9781450327695
DOI:10.1145/2637748

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data cleansing
  2. data quality
  3. design study
  4. time-oriented data
  5. visual analytics

Qualifiers

  • Research-article

Conference

i-KNOW '14

Acceptance Rates

i-KNOW '14 Paper Acceptance Rate 25 of 73 submissions, 34%;
Overall Acceptance Rate 77 of 238 submissions, 32%

Cited By

  • (2024) Text2EL+: Expert Guided Event Log Enrichment Using Unstructured Text. Journal of Data and Information Quality 16:1 (1-28). DOI: 10.1145/3640018. Online publication date: 10-Jan-2024
  • (2024) Antarstick: Extracting Snow Height From Time‐Lapse Photography. Computer Graphics Forum 43:3. DOI: 10.1111/cgf.15088. Online publication date: 10-Jun-2024
  • (2024) Tasks and Visualizations Used for Data Profiling: A Survey and Interview Study. IEEE Transactions on Visualization and Computer Graphics 30:7 (3400-3412). DOI: 10.1109/TVCG.2023.3234337. Online publication date: Jul-2024
  • (2023) It's about Time: Analytical Time Periodization. Computer Graphics Forum 42:6. DOI: 10.1111/cgf.14845. Online publication date: 24-May-2023
  • (2023) Interactive Transformations and Visual Assessment of Noisy Event Sequences: An Application in En-Route Air Traffic Control. 2023 IEEE 16th Pacific Visualization Symposium (PacificVis) (92-101). DOI: 10.1109/PacificVis56936.2023.00017. Online publication date: Apr-2023
  • (2023) Time and Time-Oriented Data. Visualization of Time-Oriented Data (53-81). DOI: 10.1007/978-1-4471-7527-8_3. Online publication date: 22-Dec-2023
  • (2022) Use Data Mining Cleansing to Prepare Data for Strategic Decisions. Data Mining - Concepts and Applications. DOI: 10.5772/intechopen.99144. Online publication date: 30-Mar-2022
  • (2022) Do You Believe Your (Social Media) Data? A Personal Story on Location Data Biases, Errors, and Plausibility as Well as Their Visualization. IEEE Transactions on Visualization and Computer Graphics 28:9 (3277-3291). DOI: 10.1109/TVCG.2022.3141605. Online publication date: 1-Sep-2022
  • (2022) Utilizing domain knowledge in data-driven process discovery: A literature review. Computers in Industry 137 (103612). DOI: 10.1016/j.compind.2022.103612. Online publication date: May-2022
  • (2022) A visual analytics approach for the assessment of information quality of performance models—a software review. Scientometrics 127:12 (6827-6853). DOI: 10.1007/s11192-022-04399-2. Online publication date: 4-Jul-2022
