Data Change Exploration Using Time Series Clustering

  • Focus article (Schwerpunktbeitrag)
Datenbank-Spektrum

Abstract

The analysis of static data is one of the best-studied research areas. However, data changes over time, and these changes may reveal patterns or groups of similar values, properties, and entities. We study changes in large, publicly available data repositories by modelling them as time series and clustering these series by their similarity. To perform change exploration on real-world data, we use the publicly available revision data of Wikipedia Infoboxes and weekly snapshots of IMDB.

The changes to the data are captured as events, which we call change records. To extract temporal behavior, we count changes in time periods and propose a general transformation framework that aggregates groups of changes into numerical time series of different resolutions. We use these time series to study different application scenarios of unsupervised clustering. Our exploratory results show that changes made to collaboratively edited data sources can help find characteristic behavior, distinguish entities or properties, and provide insight into the respective domains.
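
The idea of counting change records per time period and clustering the resulting series can be illustrated with a minimal sketch. It is not the authors' framework: the record layout (entity, property, day), the function names, the normalization step, and the use of plain k-means with squared Euclidean distance are all assumptions made for illustration, and the sample records are invented.

```python
from collections import defaultdict
from datetime import date

# Hypothetical change record layout: (entity, property, day of the change).
records = [
    ("A", "budget", date(2018, 1, 1)),
    ("A", "cast", date(2018, 1, 2)),
    ("B", "budget", date(2018, 1, 3)),
    ("C", "rating", date(2018, 1, 27)),
    ("D", "rating", date(2018, 1, 28)),
    ("D", "cast", date(2018, 1, 26)),
]

def to_time_series(records, start, end, bucket_days=7, key=lambda r: r[0]):
    """Count changes per group (default: entity) and per bucket of bucket_days days."""
    n_buckets = (end - start).days // bucket_days + 1
    series = defaultdict(lambda: [0] * n_buckets)
    for record in records:
        day = record[2]
        if start <= day <= end:
            series[key(record)][(day - start).days // bucket_days] += 1
    return dict(series)

def normalize(values):
    """Scale a series to sum to 1 so that shape, not volume, is compared."""
    total = sum(values)
    return [v / total for v in values] if total else list(values)

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(vectors, k, iterations=20):
    """Lloyd-style k-means; returns one cluster index per input vector."""
    # Deterministic farthest-point initialization (assumed here for reproducibility).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(squared_distance(v, c) for c in centroids)
        )
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

start, end = date(2018, 1, 1), date(2018, 1, 28)
weekly = to_time_series(records, start, end, bucket_days=7)  # weekly resolution
names = sorted(weekly)
labels = k_means([normalize(weekly[name]) for name in names], k=2)
for name, label in zip(names, labels):
    print(name, weekly[name], "cluster", label)
```

Changing `bucket_days` yields the same change records at a different resolution, and passing `key=lambda r: r[1]` groups by property instead of entity. A distance designed for time series could replace the squared Euclidean distance without changing the rest of the sketch.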

Notes

  1. https://github.com/HPI-Information-Systems/ChangeTimeSeriesClustering

  2. ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/

  3. The parser is available at: https://github.com/HPI-Information-Systems/IMDBParser

  4. https://dumps.wikimedia.org/

  5. https://hpi.de/naumann/projects/data-profiling-and-analytics/dbchex.html

Author information

Corresponding author

Correspondence to Leon Bornemann.

Cite this article

Bornemann, L., Bleifuß, T., Kalashnikov, D. et al. Data Change Exploration Using Time Series Clustering. Datenbank Spektrum 18, 79–87 (2018). https://doi.org/10.1007/s13222-018-0285-x

  • DOI: https://doi.org/10.1007/s13222-018-0285-x
