Abstract:
Driven by computing resource requirements, there is increasing demand for migrating data-driven analysis from local computing resources to powerful remote resources such as cloud services and high-performance computing clusters. In addition to various commercial cloud services, there is also a rich selection of high-performance computing centers in academia providing cyberinfrastructure (CI) offerings. However, access barriers exist in bringing those resources to the data-driven research community at large. To help lower those access barriers and increase the adoption of remote resources for data-driven analysis, we propose a new service model for utilizing remote computing resources, which empowers users to deploy and run their big data applications as web applications on remote computing resources. Key design goals of this model include enabling interactivity, reusability and reproducibility. Compared to the traditional batch-processing model commonly supported by CI resource providers, a web application interface enables interactive analysis. Users design the application through a configuration file using a set of predefined task templates that users can also extend. The application generated from the configuration file is self-contained and can be deployed without elevated system privileges. Therefore, ad-hoc analysis routines can be described and preserved in a format that can be shared and reused. Remote resources can also be described and implemented through configuration files to automatically bridge the application with remote resources and facilitate future migration to different resources. Consequently, analysis tasks can be preserved through the configuration file for reproducibility. Here we detail our proposed application framework and its preliminary implementations. We demonstrate the usage of this framework with a practical use case of aggregating and analyzing live tweets.
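To make the configuration-driven design more concrete, the following is a minimal sketch, not the authors' framework. It assumes a JSON configuration that lists tasks by template name and shows how predefined task templates, plus ones a user registers, could be chained into a single analysis pipeline. The template names (`filter_keyword`, `lowercase`), the `template` decorator, `build_pipeline`, and the config layout are all hypothetical and used only for illustration.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical task-template registry; none of these names come from the paper.
TaskFn = Callable[[List[Any]], List[Any]]
TEMPLATES: Dict[str, Callable[..., TaskFn]] = {}


def template(name: str):
    """Register a task-template factory under `name`; users can add their own."""
    def register(factory: Callable[..., TaskFn]) -> Callable[..., TaskFn]:
        TEMPLATES[name] = factory
        return factory
    return register


@template("filter_keyword")
def make_filter(keyword: str) -> TaskFn:
    # Keep only records containing the keyword (case-insensitive match).
    return lambda records: [r for r in records if keyword.lower() in r.lower()]


@template("lowercase")
def make_lowercase() -> TaskFn:
    # Normalize every record to lowercase.
    return lambda records: [r.lower() for r in records]


def build_pipeline(config: Dict[str, Any]) -> TaskFn:
    """Instantiate each task in config["tasks"] and chain them in the listed order."""
    steps = [TEMPLATES[t["template"]](**t.get("params", {})) for t in config["tasks"]]

    def run(records: List[Any]) -> List[Any]:
        for step in steps:
            records = step(records)
        return records

    return run


# A hypothetical configuration file's contents, parsed from JSON text.
CONFIG_TEXT = """
{"tasks": [
  {"template": "filter_keyword", "params": {"keyword": "HPC"}},
  {"template": "lowercase"}
]}
"""

pipeline = build_pipeline(json.loads(CONFIG_TEXT))
result = pipeline(["Running on HPC today", "cloud only", "hpc rocks"])
print(result)  # ['running on hpc today', 'hpc rocks']
```

Because the pipeline is defined entirely by the configuration text, the same analysis can be stored, shared and rebuilt elsewhere. This is the reusability and reproducibility property the abstract describes, shown here only under the assumptions above.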
Date of Conference: 10-13 December 2018
Date Added to IEEE Xplore: 24 January 2019