Abstract
Data production has recently grown explosively, reaching an enormous volume (more than 4 ZB in 2013). Sources include sensors that gather climate information, reports on household parameters, posts to social media sites containing digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. Although it still has only an intuitive, ad hoc definition, big data is challenging the IT infrastructure of companies and organizations, forcing them to look for viable data processing solutions that let enterprises deploy a better business strategy. In essence, big data implies collecting, extracting, transforming, transporting, loading (ETL), classifying, analyzing, interpreting, and visualizing, among many other operations, large amounts of structured, semi-structured, and unstructured data, on the order of a few petabytes per day, with all operations completed within critical time limits. This paper introduces the architecture and the corresponding functions of a platform and tools that implement some of these challenging operations, while the others are obtained by composing elementary operations. The architecture is built around a distributed network of virtual servers called “agents,” which can migrate across a network of hardware servers whenever resources are provided or created. A control center decides when to move the agents based on the availability of resources. An example from the telecommunications industry illustrates how the platform is applied to this domain of big data.
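The migration idea described above can be illustrated with a minimal sketch. It is not the platform's implementation: the `Agent` and `ControlCenter` classes, the single integer "demand" per agent, the per-server capacities, and the "move the largest agent to the server with the most free capacity" rule are all assumptions made here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Agent:
    """A virtual server ("agent"); all fields are hypothetical."""
    name: str
    demand: int  # assumed resource units the agent needs
    host: str    # hardware server currently running the agent


class ControlCenter:
    """Toy control center that moves agents off overloaded servers."""

    def __init__(self, capacity: Dict[str, int]) -> None:
        # capacity maps a hardware server name to its assumed resource units
        self.capacity = dict(capacity)

    def load(self, agents: List[Agent], host: str) -> int:
        # Total demand of the agents currently placed on `host`
        return sum(a.demand for a in agents if a.host == host)

    def rebalance(self, agents: List[Agent]) -> List[Tuple[str, str, str]]:
        """Return the migrations performed as (agent, source, target)."""
        moves = []
        for host in self.capacity:
            # Consider the largest agents on this server first
            local = sorted((a for a in agents if a.host == host),
                           key=lambda a: -a.demand)
            for agent in local:
                if self.load(agents, host) <= self.capacity[host]:
                    break  # server is no longer overloaded
                # Free capacity on every other server
                free = {h: self.capacity[h] - self.load(agents, h)
                        for h in self.capacity if h != host}
                target = max(free, key=free.get, default=None)
                if target is None or free[target] < agent.demand:
                    continue  # no server can take this agent
                moves.append((agent.name, host, target))
                agent.host = target
        return moves


agents = [Agent("a", 3, "s1"), Agent("b", 3, "s1"), Agent("c", 2, "s2")]
center = ControlCenter({"s1": 4, "s2": 10})
print(center.rebalance(agents))  # [('a', 's1', 's2')]
```

In this sketch the decision uses a single scalar resource; a real control center would presumably weigh several resources and the cost of the migration itself.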
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ionescu, B., Ionescu, D., Gadea, C., Solomon, B., Trifan, M. (2016). An Architecture and Methods for Big Data Analysis. In: Balas, V., C. Jain, L., Kovačević, B. (eds) Soft Computing Applications. SOFA 2014. Advances in Intelligent Systems and Computing, vol 356. Springer, Cham. https://doi.org/10.1007/978-3-319-18296-4_39
DOI: https://doi.org/10.1007/978-3-319-18296-4_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18295-7
Online ISBN: 978-3-319-18296-4
eBook Packages: Engineering, Engineering (R0)