ABSTRACT
Big data systems are becoming pervasive. They are distributed systems that include redundant processing nodes and replicated storage, and they frequently execute on shared 'cloud' infrastructure. For these systems, design-time predictions are insufficient to assure runtime performance in production. This is due to the scale of the deployed system, the continually evolving workloads, and the unpredictable quality of service of the shared infrastructure. Consequently, a solution for addressing performance requirements needs sophisticated runtime observability and measurement. Observability gives real-time insights into a system's health and status, at both the system and application levels, and provides historical data repositories for forensic analysis, capacity planning, and predictive analytics. Due to the scale and heterogeneity of big data systems, significant challenges exist in the design, customization, and operation of observability capabilities. These challenges include economical creation and insertion of monitors into hundreds or thousands of computation and data nodes; efficient, low-overhead collection and storage of measurements (which is itself a big data problem); and application-aware aggregation and visualization. In this paper, we propose a reference architecture to address these challenges, which uses a model-driven engineering toolkit to generate architecture-aware monitors and application-specific visualizations.
Index Terms
- Runtime Performance Challenges in Big Data Systems