research-article

DUST: a generalized notion of similarity between uncertain time series

Authors:
Smruti R. Sarangi, IBM Research - India, Bangalore, India
Karin Murthy, IBM Research - India, Bangalore, India

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, July 2010, Pages 383–392. https://doi.org/10.1145/1835804.1835854

Published: 25 July 2010

ABSTRACT

Large-scale sensor deployments and the growing use of privacy-preserving transformations have led to increasing interest in mining uncertain time series data. Traditional distance measures such as Euclidean distance or dynamic time warping are not always effective for analyzing uncertain time series data. Recently, some measures have been proposed to account for uncertainty in time series data. However, we show in this paper that their applicability is limited. Specifically, these approaches do not provide an intuitive way to compare two uncertain time series and do not easily accommodate multiple error functions.
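For readers unfamiliar with the two traditional measures named above, the following is a minimal sketch of Euclidean distance and dynamic time warping on plain (certain) series. It is not taken from the paper: the function names `euclidean` and `dtw`, the squared point cost, and the absence of a warping window are all illustrative assumptions. Neither function uses any information about measurement error, which is the limitation the abstract points to.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Dynamic time warping distance (squared point cost, no window)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])


# Hypothetical toy data: the second series is the first delayed by one step.
clean = [0.0, 1.0, 2.0, 1.0, 0.0]
shifted = [0.0, 0.0, 1.0, 2.0, 1.0]
print(euclidean(clean, shifted))
print(dtw(clean, shifted))
```

On this toy pair, warping absorbs most of the one-step delay, so the DTW value is smaller than the Euclidean one. Both treat every observed value as exact.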

In this paper, we first provide a theoretical framework that generalizes the notion of similarity between uncertain time series. Second, we propose DUST, a novel distance measure that accommodates uncertainty and degenerates to the Euclidean distance when the distance is large compared to the error. We provide an extensive experimental validation of our approach for the following applications: classification, top-k motif search, and top-k nearest-neighbor queries.
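To make one of the listed applications concrete, here is a hedged sketch of a top-k nearest-neighbor query with a pluggable distance function. It does not implement DUST, whose definition is not given in this abstract. The helper name `top_k_nearest`, its `distance` parameter, and the toy data are assumptions for illustration, and plain Euclidean distance stands in as the default where an uncertainty-aware measure could be substituted.

```python
import heapq
import math


def euclidean(a, b):
    """Plain Euclidean distance, used here only as a stand-in measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_nearest(query, database, k, distance=euclidean):
    """Return the k (distance, index) pairs closest to query, nearest first."""
    scored = ((distance(query, series), idx)
              for idx, series in enumerate(database))
    return heapq.nsmallest(k, scored)


# Hypothetical toy database of four short series.
database = [
    [0.0, 1.0, 2.0],
    [0.1, 1.1, 2.1],
    [5.0, 5.0, 5.0],
    [0.0, 1.0, 2.5],
]
query = [0.0, 1.0, 2.0]
result = top_k_nearest(query, database, k=2)
print([idx for _, idx in result])
```

Because the measure is passed in as a function, the same query code can be rerun with a different distance to compare rankings.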

Supplemental Material

kdd2010_sarangi_dust_01.mov (mov, 131.5 MB)

Index Terms

Information systems → Information systems applications → Data mining

Published in
KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
July 2010
1240 pages
ISBN:9781450300551
DOI:10.1145/1835804
General Chairs:
Bharat Rao, Siemens
Balaji Krishnapuram, Siemens
Program Chairs:
Andrew Tomkins, Google Inc.
Qiang Yang, Hong Kong University of Science and Technology
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data mining
distance measure
similarity
time series
uncertain data

Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%