TurboLift: fast accuracy lifting for historical data recovery

Yang, Fan; Almutairi, Faisal M.; Song, Hyun Ah; Faloutsos, Christos; Sidiropoulos, Nicholas D.; Zadorozhny, Vladimir

doi:10.1007/s00778-020-00609-6

TurboLift: fast accuracy lifting for historical data recovery

Regular Paper
Published: 09 March 2020

Volume 29, pages 1129–1148, (2020)
Cite this article

The VLDB Journal Aims and scope Submit manuscript

Fan Yang¹,
Faisal M. Almutairi^2,3,
Hyun Ah Song⁴,
Christos Faloutsos⁴,
Nicholas D. Sidiropoulos⁵ &
…
Vladimir Zadorozhny¹

410 Accesses
2 Citations
Explore all metrics

A Correction to this article was published on 07 July 2023

This article has been updated

Abstract

Historical data are frequently involved in situations where the available reports on time series are temporally aggregated at different levels, e.g., the monthly counts of people infected with measles. In real databases, the time periods covered by different reports can have overlaps (i.e., time-ticks covered by more than one reports) or gaps (i.e., time-ticks not covered by any report). However, data analysis and machine learning models require reconstructing the historical events in a finer granularity, e.g., the weekly patient counts, for elaborate analysis and prediction. Thus, data disaggregation algorithms are becoming increasingly important in various domains. Time series disaggregation methods commonly utilize domain knowledge about the data, e.g., smoothness, periodicity, or sparsity, to improve the reconstruction accuracy. In this paper, we propose a novel approach, called TurboLift, which aims to improve the quality of the solutions provided by existing disaggregation methods. Starting from a solution produced by a specific method, TurboLift finds a new solution that reduces the disaggregation error and is close to the initial one. We derive a closed-form solution to the proposed formulation of TurboLift that enables us to obtain an accurate reconstruction analytically, without performing resource and time-consuming iterations. Experiments on real data from different domains showcase the effectiveness of TurboLift in terms of disaggregation error, and outlier and anomaly detection.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Scalable recovery of missing blocks in time series with high and low cross-correlations

Article 15 November 2019

Analyzing Load Profiles of Electricity Consumption by a Time Series Data Mining Framework

Time Series Cleaning Under Variance Constraints

Change history

07 July 2023
A Correction to this paper has been published: https://doi.org/10.1007/s00778-023-00801-4

Notes

The Toeplitz matrix is a matrix in which each descending diagonal from left to right is constant [16].
https://www.kaggle.com/bletchley/course-material-walmart-challenge.
https://www.ncdc.noaa.gov/.
https://www.kaggle.com/chicago/chicago-crime/activity.

References

Almutairi, F.M., Yang, F., Song, H.A., Faloutsos, C., Sidiropoulos, N., Zadorozhny, V.: Homerun: scalable sparse-spectrum reconstruction of aggregated historical data. Proc. VLDB Endow. 11(11), 1496–1508 (2018)
Article Google Scholar
Almutairi, F.M., Kanatsoulis, C.I., Sidiropoulos, N.: PREMA: principled tensor data recovery from multiple aggregated views (2019). arXiv preprint arXiv:1910.12001
Bleiholder, J., Naumann, F.: Data fusion. ACM Comput. Surv. (CSUR) 41(1), 1 (2009)
Article Google Scholar
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)
Chen, S., Donoho, D.: Basis pursuit. In: Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 41–44. IEEE (1994)
Chen, Y., Chen, L., Zhang, C.J.: Crowdfusion: a crowdsourced approach on data fusion refinement. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 127–130. IEEE (2017)
Chow, G.C., Lin, A.: Best linear unbiased interpolation, distribution, and extrapolation of time series by related series. Rev. Econ. Stat. 53(4), 372–375 (1971)
Article Google Scholar
Di Fonzo, T.: The estimation of M disaggregate time series when contemporaneous and temporal aggregates are known. Rev. Econ. Stat. 178–182 (1990)
Dong, X.L., Naumann, F.: Data fusion: resolving data conflicts for integration. Proc. VLDB Endow. 2(2), 1654–1655 (2009)
Article Google Scholar
Dong, X.L., Srivastava, D.: Big data integration. In: 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 1245–1248. IEEE (2013)
Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
Article MathSciNet MATH Google Scholar
Faloutsos, C., Jagadish, H.V., Sidiropoulos, N.D.: Recovering information from summary data. In: Proceedings of 23rd International Conference on Very Large Data Bases, August 25–29, 1997, Athens, Greece, VLDB’97, pp. 36–45 (1997). http://www.vldb.org/conf/1997/P036.PDF
Golub, G.H., Van Loan, C.F.: Matrix Computations, vol. 3. JHU Press, Baltimore (2012)
MATH Google Scholar
Hall, D.L., Jordan, J.M.: Human-Centered Information Fusion. Artech House, Norwood (2010)
Google Scholar
Hall, D.L., McMullen, S.A.: Mathematical Techniques in Multisensor Data Fusion. Artech House, Norwood (2004)
MATH Google Scholar
Heinig, G., et al.: Algebraic Methods for Toeplitz-Like Matrices and Operators, vol. 13. Birkhauser, Basel (2013)
Google Scholar
Holmes, M.P., Isbell, J., Lee, C., Gray, A.G.: QUIC-SVD: Fast SVD using cosine trees. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems, vol. 21, pp. 673–680. Curran Associates Inc, New York (2009)
Google Scholar
Hormati, A., Vetterli, M.: Annihilating filter-based decoding in the compressed sensing framework. In: Optical Engineering+ Applications, pp. 670121–670121. International Society for Optics and Photonics (2007)
Jaynes, E.T.: The well-posed problem. Found. Phys. 3(4), 477–492 (1973)
Article MathSciNet Google Scholar
Jin, Y.H., Williams, B.D., Tokar, T., Waller, M.A.: Forecasting with temporally aggregated demand signals in a retail supply chain. J. Bus. Logist. 36(2), 199–211 (2015)
Article Google Scholar
Liu, Z., Song, H.A., Zadorozhny, V., Faloutsos, C., Sidiropoulos, N.: H-fuse: efficient fusion of aggregated historical data. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 786–794. SIAM (2017)
Panagiotopoulou, A., Anastassopoulos, V.: Super-resolution image reconstruction techniques: trade-offs between the data-fidelity and regularization terms. Inf. Fusion 13(3), 185–195 (2012)
Article Google Scholar
Pavia-Miralles, J.M.: A survey of methods to interpolate, distribute and extra-polate time series. J. Serv. Sci. Manag. 3(04), 449 (2010)
Google Scholar
Pavia-Miralles, J.M., Cabrer-Borras, B.: On estimating contemporaneous quarterly regional GDP. J. Forecast. 26(3), 155–170 (2007)
Article MathSciNet Google Scholar
Rekatsinas, T., Joglekar, M., Garcia-Molina, H., Parameswaran, A., Re, C.: Slimfast: guaranteed results for data fusion and source reliability. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1399–1414. ACM (2017)
Research, F.: Fast randomized SVD. https://research.fb.com/fast-randomized-svd/ (2014)
Rossi, N., et al.: A note on the estimation of disaggregate time series when the aggregate is known. Rev. Econ. Stats. 64(4), 695–696 (1982)
Article Google Scholar
Salinas, R.A., Richardson, C., Abidi, M.A., Gonzalez, R.C.: Data fusion: color edge detection and surface reconstruction through regularization. IEEE Trans. Ind. Electron. 43(3), 355–363 (1996)
Article Google Scholar
Sax, C., Steiner, P.: Temporal Disaggregation of time series. R J. 5(2), 80–88 (2013). http://journal.r-project.org/archive/2013-2/sax-steiner.pdf
Shamir, O.: Fast stochastic algorithms for SVD and PCA: convergence properties and convexity. In: International Conference on Machine Learning, pp. 248–256 (2016)
Stoica, P., Moses, R.L.: Introduction to Spectral Analysis, vol. 1. Prentice Hall, Upper Saddle River (1997)
MATH Google Scholar
Tycho: Project tycho: data for health. https://www.tycho.pitt.edu (2013)
Van Panhuis, W.G., Grefenstette, J., Jung, S.Y., Chok, N.S., Cross, A., Eng, H., Lee, B.Y., Zadorozhny, V., Brown, S., Cummings, D., et al.: Contagious diseases in the United States from 1888 to the present. N. Engl. J. Med. 369(22), 2152 (2013)
Yang, F., Song, H.A., Liu, Z., Faloutsos, C., Zadorozhny, V., Sidiropoulos, N.: Ares: automatic disaggregation of historical data. In: 2018 IEEE 34th International Conference on Data Engineering (ICDE), pp. 65–76. IEEE (2018)
Zadorozhny, V., Grant, J.: A systematic approach to reliability assessment in integrated databases. J. Intell. Inf. Syst. 46(3), 409–424 (2016)
Article Google Scholar
Zadorozhny, V., Lewis, M.: Information fusion for usar operations based on crowdsourcing. In: 16th International Conference on Information Fusion (FUSION), pp. 1450–1457. IEEE (2013)
Zhu, Y., Comaniciu, D., Pellkofer, M., Koehler, T.: Reliable detection of overtaking vehicles using robust information fusion. IEEE Trans. Intell. Transp. Syst. 7(4), 401–414 (2006)
Article Google Scholar

Download references

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grants Nos. IIS-1247489, IIS-1247632. This work is also partially supported by an IBM Faculty Award and a Google Focused Research Award. V. Zadorozhny was partially supported by NSF BCS-1244672 Grant. N. Sidiropoulos was partially supported by NSF IIS-1247632 and IIS-1447788. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding parties. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

Author information

Authors and Affiliations

University of Pittsburgh, Pittsburgh, USA
Fan Yang & Vladimir Zadorozhny
University of Minnesota, Minneapolis, USA
Faisal M. Almutairi
Department of Electrical Engineering, College of Engineering, Qassim University, Unaizah, Saudi Arabia
Faisal M. Almutairi
Carnegie Mellon University, Pittsburgh, USA
Hyun Ah Song & Christos Faloutsos
University of Virginia, Charlottesville, USA
Nicholas D. Sidiropoulos

Authors

Fan Yang
View author publications
You can also search for this author in PubMed Google Scholar
Faisal M. Almutairi
View author publications
You can also search for this author in PubMed Google Scholar
Hyun Ah Song
View author publications
You can also search for this author in PubMed Google Scholar
Christos Faloutsos
View author publications
You can also search for this author in PubMed Google Scholar
Nicholas D. Sidiropoulos
View author publications
You can also search for this author in PubMed Google Scholar
Vladimir Zadorozhny
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Fan Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original article was revised due to update in affiliation of second author.

A preliminaries of the analytical solution

In this section introduce some mathematical background to facilitate the understanding of the Lemma 1

$\textbf{A}^T\textbf{A} \in \mathcal {R} ^ {N \times N}$ is a positive semi-definite matrix.
The singular value decomposition (SVD) of $\textbf{A}^T\textbf{A}$ is $\textbf{U} {\varvec{\Sigma }} \textbf{U}^T$, where $\textbf{U} \in \mathcal {R}^{N\times N} $ contains a set of singular vectors of $\textbf{A}^T\textbf{A}$ and ${\varvec{\Sigma }} \in \mathcal {R}^{N\times N}$ is a diagonal matrix with non-negative singular values $\lambda _i >= 0$ $(i = 1,2,\ldots ,N)$ on the diagonal.
The singular value matrix ${\varvec{\Sigma }}a$ can be written as a partitioned matrix $ \begin{bmatrix} {\varvec{\Sigma }}_1 &{} 0 \\ 0 &{} 0 \end{bmatrix} $ where ${\varvec{\Sigma }}_1 \in \mathcal {R}^{M \times M}$ contains all nonzero singular values. Correspondingly the matrix $\textbf{U}$ can split into two parts $[\textbf{U}_1, \textbf{U}_2]$, where $\textbf{U}_1 \in \mathcal {R}^{N\times M}$ contains the singular vectors corresponding to the nonzero singular values, while $\textbf{U}_2 \in \mathcal {R}^{N\times (N-M)}$ contains the remaining part. Based on the property of a singular vector, $\textbf{U}_1 \perp \textbf{U}_2$ $(\textbf{U}_1^T\textbf{U}_2 = 0)$ and $\textbf{U}_2 \perp \textbf{A}^T$ $(\textbf{U}_2^T\textbf{A}^T = 0)$
Suppose we have
$$\begin{aligned} {\varvec{\Lambda }}&= {\varvec{\Sigma }} + \textbf{I}_n \\&= \begin{bmatrix} {\varvec{\Sigma }}_1 &{} \quad \textbf{0} \\ \textbf{0} &{} \quad \textbf{0} \end{bmatrix} + \begin{bmatrix} \textbf{I}_M &{}\quad \textbf{0} \\ \textbf{0} &{} \quad \textbf{I}_{N-M} \end{bmatrix} \\&= \begin{bmatrix} \lambda _1+1 &{} \quad ...&{} \quad 0 &{}\quad .&{}\quad .&{}\quad 0\\ ... &{} \quad . &{} \quad .&{} \quad . &{} \quad . &{} \quad ...\\ 0 &{} \quad ... &{} \quad \lambda _{M}+1&{} \quad .&{} \quad . &{} \quad 0\\ 0 &{} \quad ... &{} \quad 0 &{} \quad 1 &{} \quad ... &{} \quad 0\\ ... &{} \quad . &{} \quad .&{} \quad . &{} \quad . &{} \quad ...\\ 0 &{} \quad ... &{} \quad 0 &{} \quad 0 &{} \quad ... &{} \quad 1\\ \end{bmatrix} \end{aligned}$$
where $\lambda _i + 1 > 1$ $(i = 1,2,\ldots ,M)$ and $\textbf{I}_n$ is the n-dimension identity matrix.

From the geometric series,
$$\begin{aligned} \frac{1}{\beta } + \left( \frac{1}{\beta }\right) ^2 + \cdots + \left( \frac{1}{\beta }\right) ^n&\\ =&\frac{1}{\beta }(1 + \cdots +\left( \frac{1}{\beta }\right) ^{n-1}) \\ =&\frac{1-\beta ^{-n}}{\beta -1} \end{aligned}$$
With $n \rightarrow \infty $, if $\beta > 1$, the equation converge to $\frac{1}{\beta -1}$, while if $\beta = 1$, the equation equal to n.

Then
$$\begin{aligned} \lim _{n \rightarrow \infty }\sum _{k=1}^n{\varvec{\Lambda }}^{-k} =n \begin{bmatrix} \frac{1}{\lambda _1} &{} \quad ...&{} \quad 0 &{}\quad .&{}\quad .&{} \quad 0\\ ... &{} \quad . &{} \quad .&{}\quad . &{}\quad . &{} \quad ...\\ 0 &{} \quad ... &{} \quad \frac{1}{\lambda _{M}}&{} \quad .&{} \quad . &{} \quad 0\\ 0 &{} \quad ... &{} \quad 0 &{} \quad n &{} \quad ... &{} \quad 0\\ ... &{} \quad . &{} \quad .&{} \quad . &{} \quad . &{} \quad ...\\ 0 &{} \quad ... &{} \quad 0 &{} \quad 0 &{} \quad ... &{} \quad n\\ \end{bmatrix} \end{aligned}$$
Since $\lim _{n \rightarrow \infty } \frac{1}{\beta ^n} = 0$, for $\beta > 1$,
$$\begin{aligned}&\lim _{n \rightarrow \infty } {\varvec{\Lambda }} ^{-n} = \\&\lim _{n \rightarrow \infty }\begin{bmatrix} \frac{1}{(\lambda _1+1)^n} &{} \quad ...&{} \quad 0 &{}\quad .&{}\quad .&{} \quad 0\\ ... &{} \quad . &{} \quad .&{} \quad . &{} \quad . &{} \quad ...\\ 0 &{} \quad ... &{}\quad \frac{1}{(\lambda _{M}+1)^n}&{}\quad .&{} \quad . &{} \quad 0\\ 0 &{} \quad ... &{} \quad 0 &{} \quad 1 &{} \quad ... &{} \quad 0\\ ... &{} \quad . &{} \quad .&{} \quad . &{} \quad . &{} \quad ...\\ 0 &{} \quad ... &{} \quad 0 &{} \quad 0 &{} \quad ... &{} \quad 1\\ \end{bmatrix} \\&=\begin{bmatrix} 0 &{} \quad ...&{} \quad 0 &{}\quad .&{}\quad .&{} \quad 0\\ ... &{} \quad . &{} \quad .&{} \quad . &{} \quad . &{} \quad ...\\ 0 &{} \quad ... &{} \quad 0&{} \quad .&{} \quad . &{} \quad 0\\ 0 &{} \quad ... &{} \quad 0 &{} \quad 1 &{} \quad ... &{} \quad 0\\ ... &{} \quad . &{} \quad .&{} \quad . &{} \quad . &{} \quad ...\\ 0 &{} \quad ... &{} \quad 0 &{} \quad 0 &{} \quad ... &{} \quad 1\\ \end{bmatrix} = \begin{bmatrix} 0 &{}\quad 0 \\ 0 &{} \quad \textbf{I}_{N-M} \end{bmatrix} \end{aligned}$$

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Yang, F., Almutairi, F.M., Song, H.A. et al. TurboLift: fast accuracy lifting for historical data recovery. The VLDB Journal 29, 1129–1148 (2020). https://doi.org/10.1007/s00778-020-00609-6

Download citation

Received: 03 March 2019
Revised: 03 February 2020
Accepted: 24 February 2020
Published: 09 March 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s00778-020-00609-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

TurboLift: fast accuracy lifting for historical data recovery

Abstract

Access this article

Similar content being viewed by others

Scalable recovery of missing blocks in time series with high and low cross-correlations

Analyzing Load Profiles of Electricity Consumption by a Time Series Data Mining Framework

Time Series Cleaning Under Variance Constraints

Change history

07 July 2023

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

A preliminaries of the analytical solution

Rights and permissions

About this article

Cite this article

Keywords

Navigation

TurboLift: fast accuracy lifting for historical data recovery

Abstract

Access this article

Similar content being viewed by others

Scalable recovery of missing blocks in time series with high and low cross-correlations

Analyzing Load Profiles of Electricity Consumption by a Time Series Data Mining Framework

Time Series Cleaning Under Variance Constraints

Change history

07 July 2023

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

A preliminaries of the analytical solution

A preliminaries of the analytical solution

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation