
Next-generation heartbeat classification with a column-store DBMS and UDFs

Journal of Intelligent Information Systems

Abstract

We live in a digital world where data is generated at an ever-increasing rate. This creates the need for new technology not only to store these vast amounts of data, but also to manipulate and analyze it. It is through data analysis that we make decisions and generate knowledge. The medical field is no exception: healthcare and biomedical data must be stored and analyzed to gain insights that help in disease prevention and diagnosis. Electrocardiograms (ECG) are one example of this kind of data, and their careful analysis has proven to be of significant help in diagnosing cardiovascular abnormalities. ECG recording devices can produce a very large amount of data in a short period of time. Usually abstracted as unstructured data, digital ECG signals have traditionally been stored in files and processed with ad hoc programs. We favor the idea that ECG signals can be abstracted as sets of tuples and stored in database relations. In this paper we present a proposal to store, manage, and analyze ECG data in a column-store database management system (DBMS). We provide extensive empirical evidence showing that incorporating complex analytical tasks, such as ECG transformation and classification, into a DBMS is not only feasible but also efficient and scalable. For this, we rely on the Structured Query Language (SQL) provided by relational DBMSs and on the implementation of user-defined functions (UDFs).
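The idea of treating an ECG signal as a set of tuples and analyzing it with SQL plus user-defined functions can be illustrated with a minimal sketch. It is not the authors' implementation: it uses Python's built-in sqlite3 module (a row-store, not the column-store DBMS the paper targets), and the table name `ecg`, its columns, the sample values, and the functions `above_threshold` and `PeakToPeak` are all hypothetical. The fixed amplitude threshold only stands in for a real beat detector or classifier.

```python
import sqlite3

# Hypothetical ECG samples as tuples: (record_id, sample_idx, millivolts).
SAMPLES = [
    ("rec1", 0, 0.02), ("rec1", 1, 0.10), ("rec1", 2, 1.20),
    ("rec1", 3, 0.15), ("rec1", 4, -0.05), ("rec1", 5, 0.95),
    ("rec1", 6, 0.03),
]


def above_threshold(mv, threshold):
    """Scalar UDF: return 1 if the amplitude exceeds the threshold, else 0."""
    return 1 if mv > threshold else 0


class PeakToPeak:
    """Aggregate UDF: difference between the largest and smallest amplitude."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def step(self, mv):
        # Called once per row; track the running minimum and maximum.
        self.lo = mv if self.lo is None else min(self.lo, mv)
        self.hi = mv if self.hi is None else max(self.hi, mv)

    def finalize(self):
        # Called once at the end; None when no rows were seen.
        return None if self.lo is None else self.hi - self.lo


conn = sqlite3.connect(":memory:")
conn.create_function("above_threshold", 2, above_threshold)
conn.create_aggregate("peak_to_peak", 1, PeakToPeak)

# The signal is stored as a relation: one tuple per sample.
conn.execute(
    "CREATE TABLE ecg ("
    " record_id TEXT, sample_idx INTEGER, mv REAL,"
    " PRIMARY KEY (record_id, sample_idx))"
)
conn.executemany("INSERT INTO ecg VALUES (?, ?, ?)", SAMPLES)

# The scalar UDF is applied inside the query, next to the data.
candidates = [
    row[0]
    for row in conn.execute(
        "SELECT sample_idx FROM ecg"
        " WHERE record_id = ? AND above_threshold(mv, ?) = 1"
        " ORDER BY sample_idx",
        ("rec1", 0.8),
    )
]

# The aggregate UDF summarizes the whole record in one pass.
amplitude = conn.execute(
    "SELECT peak_to_peak(mv) FROM ecg WHERE record_id = ?", ("rec1",)
).fetchone()[0]

print(candidates, amplitude)
```

Because the functions run inside the query, the samples never have to be exported to an external program. That is the general appeal of UDF-based analysis, independent of the particular DBMS.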



Acknowledgements

The authors would like to acknowledge the funding provided for this research by the Mexican Council of Science and Technology (CONACYT) and the Autonomous University of Sinaloa (UAS).

Author information


Corresponding author

Correspondence to Ines F. Vega-Lopez.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Castro-Lopez, O., Lopez-Barron, D.E. & Vega-Lopez, I.F. Next-generation heartbeat classification with a column-store DBMS and UDFs. J Intell Inf Syst 54, 363–390 (2020). https://doi.org/10.1007/s10844-019-00557-w


  • DOI: https://doi.org/10.1007/s10844-019-00557-w
