DOI: 10.1145/2791347.2791371
research-article

How to quantify the impact of lossy transformations on change detection

Published: 29 June 2015

Abstract

To cope with the proliferation of big data, data is frequently transformed, whether by compression or by anonymization. Such transformations, however, modify characteristics of the data, such as changes in the case of time series. These changes are important for subsequent analyses. The impact of the modifications depends on the application scenario, and quantifying it is far from trivial, because a transformation can shift or modify existing changes or introduce new ones. In this paper, we propose MILTON, a flexible and robust Measure for quantifying the Impact of Lossy Transformations on subsequent change detectiON. MILTON is applicable to any lossy transformation technique on time-series data and to any general-purpose change-detection approach. We have evaluated it with three real-world use cases. Our evaluation shows that MILTON makes it possible to quantify the impact of lossy transformations and to choose the best one from a class of transformation techniques for a given application scenario.
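The effect described in the abstract, where a lossy transformation shifts or removes changes, can be illustrated with a small sketch. This is not MILTON and not taken from the paper: the block-mean transformation, the threshold detector, and the names `piecewise_constant` and `detect_changes` are all assumptions chosen only to keep the example short.

```python
from typing import List


def detect_changes(series: List[float], threshold: float) -> List[int]:
    """Toy change detector (assumed for illustration): report index i
    whenever the jump from series[i - 1] to series[i] exceeds threshold."""
    return [
        i
        for i in range(1, len(series))
        if abs(series[i] - series[i - 1]) > threshold
    ]


def piecewise_constant(series: List[float], window: int) -> List[float]:
    """Toy lossy transformation (assumed for illustration): replace each
    block of `window` consecutive values by the block mean."""
    out: List[float] = []
    for start in range(0, len(series), window):
        block = series[start:start + window]
        mean = sum(block) / len(block)
        out.extend([mean] * len(block))
    return out


# Original series with two level changes, at index 5 and at index 10.
original = [1.0] * 5 + [5.0] * 5 + [1.0] * 2

# Blocks of 4 do not align with the changes, so the means blur them.
transformed = piecewise_constant(original, window=4)

changes_before = detect_changes(original, threshold=1.5)
changes_after = detect_changes(transformed, threshold=1.5)

print("original   :", changes_before)
print("transformed:", changes_after)
```

In this toy setting the first change moves from index 5 to index 4 and the second change is no longer detected. A measure of impact has to account for both kinds of discrepancy, which is the problem the abstract describes.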

Cited By

  • (2017) An evaluation of combinations of lossy compression and change-detection approaches for time-series data. Information Systems 65:C (65-77). DOI: 10.1016/j.is.2016.11.001. Online publication date: 1-Apr-2017

Published In

SSDBM '15: Proceedings of the 27th International Conference on Scientific and Statistical Database Management
June 2015
390 pages
ISBN:9781450337090
DOI:10.1145/2791347

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SSDBM 2015

Acceptance Rates

Overall Acceptance Rate 56 of 146 submissions, 38%
