Column Access-aware In-stream Data Cache with Stream Processing Framework

Ma, Kun; Yang, Bo

doi:10.1007/s11265-016-1117-6

Published: 05 March 2016

Volume 86, pages 191–205 (2017)

Journal of Signal Processing Systems


Abstract

In recent years, research has focused on the query bottleneck of big data, through NoSQL databases, MapReduce, and big data processing frameworks. Although NoSQL databases have many advantages for On-Line Analytical Processing (OLAP), migrating a Relational Database Management System (RDBMS) to NoSQL is a major undertaking. Therefore, optimizing the RDBMS remains important. In this paper, we construct a Column Access-aware In-stream Data Cache (CAIDC) for relational databases, which integrates the RDBMS with an in-memory cache. Furthermore, we propose a live synchronization approach from the physical RDBMS to the in-memory data cache using a stream processing framework. On the one hand, because it builds on a stream processing framework, CAIDC provides low latency while using log-based triggers to maintain data consistency in the presence of updates. On the other hand, CAIDC moves frequently accessed data into a column-oriented in-memory cache according to column access frequency, in order to serve heavy-hitter queries. Finally, experimental results show that this approach supports a wide range of big data applications.
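
The two ideas in the abstract, choosing what to cache by column access frequency and keeping the cache consistent by applying update events, can be illustrated with a small sketch. This is not the authors' implementation: the class `ColumnAccessCache`, its methods, and the plain dictionary standing in for the RDBMS are all hypothetical, and the real system relies on a stream processing framework and log-based triggers that are only imitated here by a direct method call.

```python
from collections import Counter


class ColumnAccessCache:
    """Toy column-oriented cache driven by column access counts (illustrative only)."""

    def __init__(self, rows, capacity):
        self.rows = rows          # stand-in for the RDBMS table: {primary_key: {column: value}}
        self.capacity = capacity  # maximum number of columns kept in memory
        self.hits = Counter()     # how often each column has been read
        self.columns = {}         # cached data, stored per column: {column: {primary_key: value}}

    def read(self, key, column):
        """Return one value, preferring the cache, and record the column access."""
        self.hits[column] += 1
        cached = self.columns.get(column)
        if cached is not None and key in cached:
            return cached[key]
        return self.rows[key][column]  # cache miss: fall back to the backing store

    def refresh(self):
        """Rebuild the cache from the `capacity` most frequently read columns."""
        hot = [name for name, _ in self.hits.most_common(self.capacity)]
        self.columns = {
            name: {key: row[name] for key, row in self.rows.items()} for name in hot
        }

    def apply_change(self, key, column, value):
        """Apply one update event to the cache, as a change stream might deliver it.

        Only cached columns are touched; uncached columns are served from the store.
        """
        if column in self.columns:
            self.columns[column][key] = value
```

In this sketch a caller would invoke `refresh()` periodically and call `apply_change()` for every update captured from the database log, so that reads of hot columns never observe stale values.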

Acknowledgments

This work was supported by the Doctoral Fund of University of Jinan (XBS1237), the Shandong Provincial Natural Science Foundation (ZR2014FQ029), the Shandong Provincial Key R&D Program (2015GGX106007), the Teaching Research Project of University of Jinan (J1344), the National Key Technology R&D Program (2012BAF12B07), and the Open Project Funding of Shandong Provincial Key Laboratory of Software Engineering (No. 2015SE03).

Author information

Authors and Affiliations

Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, 250022, China
Kun Ma & Bo Yang

Corresponding author

Correspondence to Kun Ma.

About this article

Cite this article

Ma, K., Yang, B. Column Access-aware In-stream Data Cache with Stream Processing Framework. J Sign Process Syst 86, 191–205 (2017). https://doi.org/10.1007/s11265-016-1117-6

Received: 10 October 2015
Revised: 17 December 2015
Accepted: 17 February 2016
Published: 05 March 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s11265-016-1117-6
