Granulation of Large Temporal Databases: An Allan Variance Approach

Sinanaj, Lorina; Haeri, Hossein; Maddipatla, Satya Prasad; Gao, Liming; Pakala, Rinith; Kathiriya, Niket; Beal, Craig; Brennan, Sean; Chen, Cindy; Jerath, Kshitij

doi:10.1007/s42979-022-01397-2

Original Research
Published: 15 October 2022

Volume 4, article number 7, (2023)
SN Computer Science

Abstract

As the use of Big Data begins to dominate various scientific and engineering applications, the ability to conduct complex data analyses with speed and efficiency has become increasingly important. The availability of large amounts of data results in ever-growing storage requirements and magnifies issues related to query response times. In this work, we propose a novel methodology for granulation and data reduction of large temporal databases that can address both issues simultaneously. While prior data reduction techniques rely on heuristics or may be computationally intensive, our work borrows the concept of Allan variance (AVAR) from the fields of signal processing and sensor characterization to efficiently and systematically reduce the size of temporal databases. Specifically, we use AVAR to determine the temporal window length over which data remains relevant. Large temporal databases are then granulated using the AVAR-determined window length. Averaging over each resulting granule produces aggregate information for that granule, resulting in significant data reduction. Query performance and data quality are evaluated on existing standard datasets, as well as on two large datasets that include temporal information for vehicular and weather data. Our results demonstrate that the AVAR-based data reduction approach is efficient and maintains data quality, while leading to an order of magnitude improvement in query execution times compared to three existing clustering-based data reduction methods.
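
The granulation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes regularly sampled data, uses the textbook non-overlapping Allan variance estimator (half the mean squared difference between averages of adjacent windows), and assumes that the window length is chosen as the candidate with the smallest AVAR, a selection rule the abstract itself does not spell out. The function names `allan_variance`, `pick_window`, and `granulate` are hypothetical.

```python
import numpy as np

def allan_variance(y, m):
    """Non-overlapping Allan variance for a window of m samples.

    Assumes y is regularly sampled; trailing samples that do not
    fill a complete window are dropped.
    """
    y = np.asarray(y, dtype=float)
    n_windows = len(y) // m
    if n_windows < 2:
        raise ValueError("need at least two full windows")
    # Average each consecutive block of m samples.
    means = y[: n_windows * m].reshape(n_windows, m).mean(axis=1)
    # Half the mean squared difference of adjacent window averages.
    return 0.5 * np.mean(np.diff(means) ** 2)

def pick_window(y, candidates):
    """Return the candidate window length with the smallest AVAR.

    Assumption: minimum AVAR is used as the selection criterion.
    """
    avars = [allan_variance(y, m) for m in candidates]
    return candidates[int(np.argmin(avars))]

def granulate(y, m):
    """Replace each full window of m samples by its average (one granule)."""
    y = np.asarray(y, dtype=float)
    n_windows = len(y) // m
    return y[: n_windows * m].reshape(n_windows, m).mean(axis=1)

# Toy signal that repeats every four samples.
y_demo = np.array([1.0, 1.0, 3.0, 3.0, 1.0, 1.0, 3.0, 3.0])
best = pick_window(y_demo, [1, 2, 4])   # 4 for this toy signal
granules = granulate(y_demo, best)      # array([2., 2.])
```

In this toy case eight samples reduce to two granules. In a database setting, the same averaging would presumably be applied per attribute over time-ordered rows, so that queries run against the much smaller table of granule averages.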

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 1932138. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Authors and Affiliations

Computer Science Department, University of Massachusetts Lowell, 220 Pawtucket St, Lowell, USA
Lorina Sinanaj, Rinith Pakala, Niket Kathiriya & Cindy Chen
Mechanical Engineering Department, University of Massachusetts Lowell, Lowell, USA
Hossein Haeri & Kshitij Jerath
Mechanical Engineering Department, The Pennsylvania State University, University Park, USA
Satya Prasad Maddipatla, Liming Gao & Sean Brennan
Mechanical Engineering Department, Bucknell University, Lewisburg, USA
Craig Beal

Corresponding author

Correspondence to Kshitij Jerath.

Ethics declarations

Conflict of interest statement

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Knowledge Discovery, Knowledge Engineering and Knowledge Management” guest edited by Joaquim Filipe, Ana Fred, Jan Dietz, Ana Salgado and Jorge Bernardino.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sinanaj, L., Haeri, H., Maddipatla, S.P. et al. Granulation of Large Temporal Databases: An Allan Variance Approach. SN COMPUT. SCI. 4, 7 (2023). https://doi.org/10.1007/s42979-022-01397-2

Received: 09 March 2022
Accepted: 20 August 2022
Published: 15 October 2022
DOI: https://doi.org/10.1007/s42979-022-01397-2

Part of a collection:

Knowledge Discovery, Knowledge Engineering and Knowledge Management
