DOI: 10.1145/1882992.1883067
poster

Large-scale multimodal mining for healthcare with mapreduce

Published: 11 November 2010

Abstract

Recent advances in healthcare and bioscience technologies and the proliferation of portable medical devices have produced massive amounts of multimodal data. Mining these data sets, which can range from tens of gigabytes to terabytes or even petabytes, clearly calls for parallel processing. AALIM (Advanced Analytics for Information Management) is a new multimodal mining-based clinical decision support system that brings together patient data captured in many modalities to provide a holistic presentation of a patient's exam data, diseases, and medications. In addition, it offers disease-specific similarity search based on the various data modalities. The currently deployed AALIM system can process only a limited amount of patient data per day. In this paper, we address the challenge of building a healthcare multimodal mining system on top of the MapReduce framework, specifically its popular open-source implementation, Hadoop. We present a scalable and generic framework that enables automatic parallelization of healthcare multimodal mining algorithms and distribution of large-scale computation, achieving high performance on clusters of commodity servers. Initial testing of importing a single AALIM module (EKG period estimation) using Hadoop on a cluster of servers shows very promising results.
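
The abstract does not describe how the EKG period estimation module is split into map and reduce steps. As a rough illustration only, the sketch below shows one way such a per-record computation could fit the MapReduce pattern: the map step estimates a period for each recording independently, and the reduce step averages the estimates per patient. Everything here is an assumption made for illustration: the autocorrelation-based estimator, the `(patient_id, samples)` record layout, and the names `estimate_period`, `map_record`, `reduce_periods`, and `run_local` are hypothetical and are not taken from AALIM or from Hadoop. The job runs in-process rather than on a cluster.

```python
import math
from collections import defaultdict
from statistics import mean


def synthetic_signal(period, n):
    """Build a toy periodic signal (a sine wave) for demonstration."""
    return [math.sin(2 * math.pi * i / period) for i in range(n)]


def estimate_period(samples):
    """Estimate the period, in samples, from the autocorrelation peak.

    Illustrative stand-in only; not the AALIM period-estimation algorithm.
    Returns None when no periodic structure is found.
    """
    n = len(samples)
    mu = mean(samples)
    centered = [s - mu for s in samples]
    # Unnormalized autocorrelation for lags 0 .. n/2 - 1.
    scores = [
        sum(centered[i] * centered[i + lag] for i in range(n - lag))
        for lag in range(n // 2)
    ]
    # Skip the lobe around lag 0: start at the first negative score.
    start = next((lag for lag, s in enumerate(scores) if s < 0), None)
    if start is None:
        return None
    return max(range(start, len(scores)), key=scores.__getitem__)


def map_record(record):
    """Map step: one (patient_id, samples) record -> (patient_id, period)."""
    patient_id, samples = record
    yield patient_id, estimate_period(samples)


def reduce_periods(patient_id, periods):
    """Reduce step: average the valid period estimates for one patient."""
    valid = [p for p in periods if p is not None]
    return patient_id, (mean(valid) if valid else None)


def run_local(records):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return dict(reduce_periods(k, v) for k, v in grouped.items())
```

Calling `run_local([("p1", synthetic_signal(20, 200))])` returns a dictionary keyed by patient identifier. Because each recording is processed independently in the map step, the work can be spread across machines without coordination, which is the property the abstract relies on when it mentions automatic parallelization.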

        Published In

        IHI '10: Proceedings of the 1st ACM International Health Informatics Symposium
        November 2010
        886 pages
        ISBN:9781450300308
        DOI:10.1145/1882992
        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. hadoop mapreduce
        2. high-performance computing
        3. multimodal mining for healthcare

        Qualifiers

        • Poster

        Conference

        IHI '10: ACM International Health Informatics Symposium
        November 11 - 12, 2010
        Arlington, Virginia, USA

        Cited By

        • (2019) Big Data Analytics in Bioinformatics. Biotechnology, pp. 1967-1984. DOI: 10.4018/978-1-5225-8903-7.ch080. Online publication date: 2019
        • (2019) An Efficient MapReduce-Based Apriori-Like Algorithm for Mining Frequent Itemsets from Big Data. Wireless Internet, pp. 76-85. DOI: 10.1007/978-3-030-06158-6_8. Online publication date: 5-Jan-2019
        • (2019) Cloud‐based difference algorithm using big GPR data for roadbed damage detection. Concurrency and Computation: Practice and Experience 32:23. DOI: 10.1002/cpe.5545. Online publication date: 11-Nov-2019
        • (2018) Big Data and Analytics. Handbook of Research on Pattern Engineering System Development for Big Data Analytics, pp. 55-66. DOI: 10.4018/978-1-5225-3870-7.ch004. Online publication date: 2018
        • (2018) Big Data Analytics in Bioinformatics. Handbook of Research on Biomimicry in Information Retrieval and Knowledge Management, pp. 321-338. DOI: 10.4018/978-1-5225-3004-6.ch017. Online publication date: 2018
        • (2018) Big data handling mechanisms in the healthcare applications: A comprehensive and systematic literature review. Journal of Biomedical Informatics 82, pp. 47-62. DOI: 10.1016/j.jbi.2018.03.014. Online publication date: Jun-2018
        • (2016) Big Data Paradigm for Healthcare Sector. Big Data, pp. 570-587. DOI: 10.4018/978-1-4666-9840-6.ch026. Online publication date: 2016
        • (2016) Big Data Paradigm for Healthcare Sector. Managing Big Data Integration in the Public Sector, pp. 169-186. DOI: 10.4018/978-1-4666-9649-5.ch010. Online publication date: 2016
        • (2016) Insight of big data analytics in healthcare industry. 2016 International Conference on Computing, Communication and Automation (ICCCA), pp. 95-100. DOI: 10.1109/CCAA.2016.7813696. Online publication date: Apr-2016
        • (2015) ScaDiPaSi. Big Data Research 2:1, pp. 19-27. DOI: 10.5555/2991307.2991342. Online publication date: 1-Mar-2015
