
An Architecture and Methods for Big Data Analysis

  • Conference paper
Soft Computing Applications (SOFA 2014)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 356)


Abstract

Data production has recently witnessed explosive growth, reaching an enormous volume (more than 4 ZB in 2013). The sources of this data include sensors used to gather climate information, reports on household parameters, posts to social media sites containing digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. Although it still has no more than an intuitive, ad hoc definition, big data is challenging the IT infrastructure of companies and organizations, forcing them to look for viable data processing solutions that allow enterprises to deploy a better business strategy. In essence, big data implies collecting, extracting, transforming, transporting, loading (ETL), classifying, analyzing, interpreting, and visualizing, among many other operations, large amounts of structured, semi-structured, and unstructured data, on the order of a few petabytes per day, with every operation completed within critical time limits. This paper introduces the architecture and the corresponding functions of a platform and tools that implement some of these challenging operations directly, while the others are obtained by composing elementary operations. The architecture is built around a distributed network of virtual servers called “agents,” which can migrate across a network of hardware servers whenever resources become available. A control center decides when to move the agents, based on the availability of resources at the time they are needed. An example from the telecommunications industry illustrates how the platform is applied to this domain of big data.
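The agent migration idea described in the abstract can be pictured with a minimal Python sketch. It is not the platform's implementation: the class names (`Server`, `Agent`, `ControlCenter`), the abstract capacity units, and the "move the smallest agent to the server with the most free capacity" policy are all assumptions made here for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Agent:
    """A virtual server ("agent"); `demand` is in hypothetical resource units."""
    name: str
    demand: int


@dataclass
class Server:
    """A hardware server with a fixed capacity in the same hypothetical units."""
    name: str
    capacity: int
    agents: List[Agent] = field(default_factory=list)

    def free(self) -> int:
        # Negative values mean the server is overloaded.
        return self.capacity - sum(a.demand for a in self.agents)


class ControlCenter:
    """Decides where agents run, based only on resource availability (assumed policy)."""

    def __init__(self, servers: List[Server]) -> None:
        self.servers = list(servers)

    def rebalance(self) -> List[Tuple[str, str, str]]:
        """Migrate agents off overloaded servers; return (agent, from, to) moves."""
        moves = []
        for src in self.servers:
            # Try the smallest agents first, stopping once src is no longer overloaded.
            for agent in sorted(src.agents, key=lambda a: a.demand):
                if src.free() >= 0:
                    break
                others = [s for s in self.servers if s is not src]
                target = max(others, key=Server.free, default=None)
                if target is None or target.free() < agent.demand:
                    continue  # no server can host this agent right now
                src.agents.remove(agent)
                target.agents.append(agent)
                moves.append((agent.name, src.name, target.name))
        return moves


# Usage: server "a" is overloaded by one unit, server "b" is idle.
a = Server("a", capacity=4, agents=[Agent("etl", 3), Agent("viz", 2)])
b = Server("b", capacity=8)
moves = ControlCenter([a, b]).rebalance()
print(moves)
```

In this sketch the control center reacts only to overload; a real platform would presumably also weigh migration cost, data locality, and deadlines, none of which are specified in the abstract.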



Author information


Corresponding author

Correspondence to Bogdan Ionescu.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ionescu, B., Ionescu, D., Gadea, C., Solomon, B., Trifan, M. (2016). An Architecture and Methods for Big Data Analysis. In: Balas, V., Jain, L.C., Kovačević, B. (eds) Soft Computing Applications. SOFA 2014. Advances in Intelligent Systems and Computing, vol 356. Springer, Cham. https://doi.org/10.1007/978-3-319-18296-4_39


  • DOI: https://doi.org/10.1007/978-3-319-18296-4_39


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18295-7

  • Online ISBN: 978-3-319-18296-4

  • eBook Packages: Engineering, Engineering (R0)
