Real-Time Analytics: Benefits, Limitations, and Tradeoffs

Kuznetsov, S. D.; Velikhov, P. E.; Fu, Q.

doi:10.1134/S036176882301005X

Published: 27 March 2023

Volume 49, pages 1–25 (2023)

Programming and Computer Software

Abstract

Real-time analytics is a relatively new branch of analytics. A common definition of real-time analytics is that it consists of analyzing the most recent data possible as quickly as possible. This definition captures the essence of users' fundamental needs, but it is too vague to serve as a specific requirement for the corresponding software systems. As a result, vendors of analytical data-management systems and researchers apply the term “real-time analytics system” to very different systems, which differ in architecture, functionality, and even timing. The purpose of this article is to analyze the different approaches to providing real-time analytics, their advantages and disadvantages, and the tradeoffs that both designers and users of such systems inevitably have to make.

ACKNOWLEDGMENTS

This article is based on a report presented at the seventh international conference “Actual Problems of System and Software Engineering” (APSSE 2021).

Author information

Authors and Affiliations

S. D. Kuznetsov
Ivannikov Institute for System Programming of the Russian Academy of Sciences, 109004, Moscow, Russia
Moscow State University, 119991, Moscow, Russia
Moscow Institute of Physics and Technology (State University), 141700, Dolgoprudny, Moscow oblast, Russia
National Research University, Higher School of Economics, 101978, Moscow, Russia

P. E. Velikhov
TigerGraph, 94065, Redwood City, CA, United States

Q. Fu
Huawei Technologies Co., Ltd., 121614, Moscow, Russia

Corresponding authors

Correspondence to S. D. Kuznetsov, P. E. Velikhov or Q. Fu.

Ethics declarations

The authors declare that they have no conflicts of interest.

About this article

Cite this article

Kuznetsov, S.D., Velikhov, P.E. & Fu, Q. Real-Time Analytics: Benefits, Limitations, and Tradeoffs. Program Comput Soft 49, 1–25 (2023). https://doi.org/10.1134/S036176882301005X

Received: 10 September 2022
Revised: 19 September 2022
Accepted: 24 September 2022
Published: 27 March 2023
Issue Date: February 2023
DOI: https://doi.org/10.1134/S036176882301005X