Abstract
We live in a digital world in which data is generated at an ever-increasing rate. This creates the need for new technology not only to store these vast amounts of data, but also to manipulate and analyze it, since it is through data analysis that we make decisions and generate knowledge. The medical field is no exception: healthcare and biomedical data must be stored and analyzed to gain insights that aid disease prevention and diagnosis. One example of such data is the electrocardiogram (ECG), whose careful analysis has proven significantly helpful in diagnosing cardiovascular abnormalities. ECG recording devices can produce very large amounts of data in a short period of time. Usually abstracted as unstructured data, digital ECG signals have traditionally been stored in files and processed with ad hoc programs. We argue instead that ECG signals can be abstracted as sets of tuples and stored in database relations. In this paper, we propose an approach to store, manage, and analyze ECG data in a column-store database management system (DBMS). We provide extensive empirical evidence showing that incorporating complex analytical tasks, such as ECG transformation and classification, into a DBMS is not only feasible but also efficient and scalable. To do so, we rely on the Structured Query Language (SQL) provided by relational DBMSs and on the implementation of user-defined functions (UDFs).
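The central idea of the abstract can be sketched as follows: a digitized ECG lead is treated as a set of tuples, and transformation and summarization are pushed into the DBMS through SQL and UDFs. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses Python's built-in `sqlite3` module (a row store) as a stand-in for a column-store DBMS. The table `ecg_sample`, the UDFs `to_mv` and `rms`, and the baseline (1024) and gain (200 ADC units per mV) passed in the query are all hypothetical choices made for the example.

```python
import math
import sqlite3

# Hypothetical schema (an assumption, not the paper's): one tuple per sample
# of one lead of one ECG record.
SCHEMA = """
CREATE TABLE ecg_sample (
    record_id  TEXT    NOT NULL,
    lead       TEXT    NOT NULL,
    sample_idx INTEGER NOT NULL,
    adu        INTEGER NOT NULL,   -- raw ADC value
    PRIMARY KEY (record_id, lead, sample_idx)
)
"""


def signal_to_tuples(record_id, lead, samples):
    """Abstract one digitized ECG lead as (record, lead, index, value) tuples."""
    return [(record_id, lead, i, int(v)) for i, v in enumerate(samples)]


def to_mv(adu, baseline, gain):
    """Scalar UDF (hypothetical): convert a raw ADC value to millivolts.

    `gain` is expressed in ADC units per millivolt.
    """
    return (adu - baseline) / gain


class RootMeanSquare:
    """Aggregate UDF (hypothetical): root-mean-square of the values in a group."""

    def __init__(self):
        self.count = 0
        self.sum_sq = 0.0

    def step(self, value):
        # Called once per row in the group; NULLs are ignored.
        if value is not None:
            self.count += 1
            self.sum_sq += value * value

    def finalize(self):
        # Called once per group after all rows have been stepped.
        return math.sqrt(self.sum_sq / self.count) if self.count else None


def open_ecg_db(records):
    """Create an in-memory database, register the UDFs, and load the tuples.

    `records` is an iterable of (record_id, lead, samples) triples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.create_function("to_mv", 3, to_mv)
    conn.create_aggregate("rms", 1, RootMeanSquare)
    for record_id, lead, samples in records:
        conn.executemany(
            "INSERT INTO ecg_sample VALUES (?, ?, ?, ?)",
            signal_to_tuples(record_id, lead, samples),
        )
    return conn


if __name__ == "__main__":
    conn = open_ecg_db([("r1", "MLII", [1024, 1124, 924, 1024])])
    # Transformation (to_mv) and summarization (rms) both run inside the DBMS.
    query = """
        SELECT record_id, lead, rms(to_mv(adu, 1024, 200.0))
        FROM ecg_sample
        GROUP BY record_id, lead
    """
    for row in conn.execute(query):
        print(row)
```

The point of this design is that the query, not a client program, drives the computation: the DBMS scans the tuples and invokes the UDFs inside the `GROUP BY`, so the raw samples never leave the database. In a column-store engine, the same query would typically read only the `record_id`, `lead`, and `adu` columns.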
Acknowledgements
The authors would like to acknowledge the funding provided for this research by the Mexican Council of Science and Technology (CONACYT) and the Autonomous University of Sinaloa (UAS).
About this article
Cite this article
Castro-Lopez, O., Lopez-Barron, D.E. & Vega-Lopez, I.F. Next-generation heartbeat classification with a column-store DBMS and UDFs. J Intell Inf Syst 54, 363–390 (2020). https://doi.org/10.1007/s10844-019-00557-w