ABSTRACT
Hybrid data analysis systems integrate an analytic tool and a data management tool. While hybrid systems have benefits, they are effective only if data movement between the two components is minimized. Through experimental results, we demonstrate that under workloads whose inputs vary in size, shape, and location, automation is the only practical way to manage data movement in hybrid systems.
Index Terms
- Data movement in hybrid analytic systems: a case for automation