Correlation analysis techniques for uncertain time series

Orang, Mahsa; Shiri, Nematollaah

doi:10.1007/s10115-016-0939-7


Regular Paper
Published: 12 April 2016

Volume 50, pages 79–116 (2017)

Knowledge and Information Systems

Mahsa Orang¹ &
Nematollaah Shiri¹


Abstract

Many applications, such as location-based services and wireless sensor networks, generate and process uncertain time series (UTS), in which the “exact” value at each timestamp is unknown. Traditional correlation analysis and search techniques developed for standard time series are inadequate for the UTS data analysis that these applications require. Motivated by this need, we propose suitable concepts and techniques for UTS correlation analysis. We formalize the notions of normalization and correlation for UTS in two general settings, based on the information available at each timestamp: (1) PDF-based UTS, where a probability density function is available at each timestamp, and (2) multiset-based UTS, where a multiset of observed values is available at each timestamp. For each setting, we formulate correlation as a random variable and develop techniques to determine its probability density function. For setting (2), we also present probabilistic pruning and sampling techniques to speed up the search process. We conducted numerous experiments on the UCR benchmark datasets to evaluate the performance of the proposed techniques under different configurations. Our results indicate the effectiveness of the proposed techniques; for setting (2) in particular, they show significant improvements in space utilization and computation time. We believe the proposed ideas and solutions can serve as the basis for powerful tools for UTS analysis and search tasks.
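To make the multiset-based setting more concrete, the following minimal Python sketch treats correlation as a random variable and estimates its distribution by Monte Carlo sampling. It rests on assumptions of our own, not necessarily the authors' formulation: the observed values at a timestamp are equally likely, and choices are independent across timestamps and across the two series, so a "possible world" is obtained by drawing one observed value per timestamp. All names (`sample_correlations`, `prob_corr_at_least`, `dkw_sample_size`) are hypothetical. The sample size is chosen with the Dvoretzky–Kiefer–Wolfowitz inequality using Massart's constant, which bounds the gap between the empirical and the true CDF of the sampled correlation; this is one plausible choice and is not the paper's pruning or sampling technique. Explicit z-normalization is skipped because Pearson correlation is invariant to it.

```python
import math
import random
from typing import List, Sequence

# Hypothetical representation: a multiset-based UTS is a list holding
# one list of observed values per timestamp.
MultisetUTS = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0.0 or syy == 0.0:
        return math.nan  # correlation is undefined for a constant series
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sxx * syy)


def dkw_sample_size(eps: float, delta: float) -> int:
    """Samples needed so that, with probability >= 1 - delta, the empirical
    CDF lies within eps of the true CDF everywhere (DKW, Massart's constant)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def sample_correlations(uts_x: MultisetUTS, uts_y: MultisetUTS,
                        num_samples: int, seed: int = 0) -> List[float]:
    """Draw possible worlds and return one correlation value per world.
    Assumes equally likely observed values and independent timestamps."""
    if len(uts_x) != len(uts_y):
        raise ValueError("series must have the same length")
    rng = random.Random(seed)
    out = []
    for _ in range(num_samples):
        x = [rng.choice(obs) for obs in uts_x]  # one value per timestamp
        y = [rng.choice(obs) for obs in uts_y]
        out.append(pearson(x, y))
    return out


def prob_corr_at_least(samples: List[float], tau: float) -> float:
    """Empirical estimate of Pr[corr >= tau], ignoring undefined worlds."""
    valid = [c for c in samples if not math.isnan(c)]
    if not valid:
        raise ValueError("no sampled world had a defined correlation")
    return sum(1 for c in valid if c >= tau) / len(valid)


if __name__ == "__main__":
    uts_x = [[1.0, 1.2], [2.0], [2.9, 3.1, 3.0], [4.2, 3.8]]
    uts_y = [[2.1], [3.9, 4.1], [6.0], [8.0, 7.9]]
    n = dkw_sample_size(eps=0.05, delta=0.05)  # 738
    samples = sample_correlations(uts_x, uts_y, n, seed=42)
    print(n, prob_corr_at_least(samples, 0.9))
```

Each sampled world costs time linear in the series length, so the total cost grows linearly with the number of samples. Exhaustively enumerating every possible world would instead grow with the product of all multiset sizes, which is why sampling is attractive in this setting.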



Acknowledgments

The authors would like to thank the anonymous reviewers for their comments, which helped improve the manuscript. This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada and by Concordia University.

Author information

Authors and Affiliations

Department of Computer Science and Software Engineering, Concordia University, Montreal, QC, Canada
Mahsa Orang & Nematollaah Shiri


Corresponding author

Correspondence to Mahsa Orang.

Appendices

Appendix 1

Table 1 provides a glossary for the notations used in this paper.

Table 1 Main notations used in this paper


Appendix 2

For each dataset, Table 2 shows the percentage improvement for the PDF-based model, defined as the $F_{1}$ score of the probabilistic queries minus that of the deterministic queries, divided by the $F_{1}$ score of the deterministic queries, i.e., $100 \times (F_{1}^{\mathrm{prob}} - F_{1}^{\mathrm{det}}) / F_{1}^{\mathrm{det}}$. Table 2 shows that the percentage improvement is positive for every dataset for which it is defined, i.e., probabilistic queries outperformed deterministic queries in all cases tested. For the Beef and Trace datasets, the $F_{1}$ score of the deterministic queries was 0, while that of the probabilistic queries was nonzero; since the denominator is 0, the improvement percentage is undefined for these two datasets.
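The percentage improvement defined above can be computed as in the following minimal Python sketch. It assumes the standard definition of $F_{1}$ from the true positives, false positives, and false negatives of a query's answer set; the function names and example counts are hypothetical and are not values from Table 2. The function returns `None` instead of dividing by zero, mirroring the undefined entries for Beef and Trace.

```python
from typing import Optional


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def improvement_percentage(f1_prob: float, f1_det: float) -> Optional[float]:
    """Relative F1 gain of probabilistic over deterministic queries, in %.
    Returns None when the deterministic F1 is 0 (the improvement is undefined)."""
    if f1_det == 0:
        return None
    return (f1_prob - f1_det) / f1_det * 100.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    det = f1_score(tp=6, fp=4, fn=4)   # 0.6
    prob = f1_score(tp=8, fp=2, fn=2)  # 0.8
    print(improvement_percentage(prob, det))
    print(improvement_percentage(0.4, 0.0))  # None
```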

Table 3 shows the percentage improvement in the $F_{1}$ score of the probabilistic queries for the multiset-based model on all datasets. As with Table 2, the probabilistic queries outperform the deterministic queries. Moreover, in both tables, the higher the uncertainty level (i.e., SDR), the higher the percentage improvement in the $F_{1}$ score. This indicates that probabilistic queries are more resilient than deterministic queries to increasing uncertainty.

Table 2 Improvement percentage for different UCR datasets for the PDF-based model


Table 3 Improvement percentage for different UCR datasets for the multiset-based model, with fewer than 6 observed values



About this article

Cite this article

Orang, M., Shiri, N. Correlation analysis techniques for uncertain time series. Knowl Inf Syst 50, 79–116 (2017). https://doi.org/10.1007/s10115-016-0939-7


Received: 11 March 2015
Revised: 24 December 2015
Accepted: 22 March 2016
Published: 12 April 2016
Issue Date: January 2017
DOI: https://doi.org/10.1007/s10115-016-0939-7
