Building Scalable Software for Data Centers: An Approach to Distributed Computing at Enterprise Level

Handbook on Data Centers

Abstract

Big data can be defined as a large collection of data that is difficult to process because of its size or complexity. In 2001 Doug Laney, an analyst at META Group (now Gartner), published a research report defining three dimensions that characterize big-data problems: Volume, Variety and Velocity (also known as the 3Vs). The original report can be found at the Gartner site [1].

References

  1. Laney, D.: 3D Data Management: Controlling Data Volume, Velocity, and Variety (February 2001)

  2. LSST Corporation: LSST and Technology Innovation (2013)

  3. Google Corporation: Waze Champs Meetup at Waze HQ. http://blog.waze.com/ (2013)

  4. Bollen, J., Mao, H., Zeng, X.: Twitter Mood Predicts the Stock Market. Journal of Computational Science 2(1) (2011) 1–8

  5. Tumasjan, A., Sprenger, T.O., Sandner, P.G., Welpe, I.M.: Predicting Elections with Twitter: What 140 Characters Reveal About Political Sentiment. In: Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media. (May 23–26 2010) 178–185

  6. O’Connor, B., Balasubramanyan, R., Routledge, B., Smith, N.: From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In: Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media. (May 23–26 2010) 122–129

  7. Golder, S.A., Macy, M.W.: Diurnal and Seasonal Mood Vary with Work, Sleep, and Daylength Across Diverse Cultures. Science 333(6051) (September 2011) 1878–1881

  8. Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., Zaharia, M.: A View of Cloud Computing. Communications of the ACM 53(4) (April 2010) 50–58

  9. Dorband, J.E., Raytheon, J.P., Ranawake, U.: Commodity Computing Clusters at Goddard Space Flight Center. Online Journal of Space Communication, School of Media Arts and Studies, Scripps College of Communication, Ohio University (2013)

  10. Neuman, B.C.: Scale in Distributed Systems. Readings in Distributed Computing Systems (1994) 463–489

  11. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley (1994)

  12. Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., Stal, M.: Pattern-Oriented Software Architecture: A System of Patterns. Volume 1. J. Wiley (1999)

  13. Fowler, M.: Analysis Patterns: Reusable Object Models. 1 edn. Addison-Wesley Professional (1996)

  14. Schmidt, D., Stal, M., Rohnert, H., Buschmann, F.: Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects. Volume 2. J. Wiley (2000)

  15. Alexander, C., Ishikawa, S., Silverstein, M.: A Pattern Language: Towns, Buildings, Construction. Oxford University Press (1977)

  16. Buschmann, F., Henney, K., Schmidt, D.C.: Pattern-Oriented Software Architecture: On Patterns and Pattern Languages. Volume 5. J. Wiley (April 2007)

  17. Buschmann, F., Henney, K., Schmidt, D.C.: Pattern-Oriented Software Architecture: A Pattern Language for Distributed Computing. Volume 4. J. Wiley (2007)

  18. OPL Working Group, B.U.: A Pattern Language for Parallel Programming ver2.0 (2013)

  19. Fayad, M., Schmidt, D.C.: Object-Oriented Application Frameworks. Communications of the ACM 40(10) (October 1997) 32–38

  20. Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. In: Proceedings of the 6th Symposium on Operating Systems Design and Implementation, San Francisco, CA (December 2004) 1–13

  21. Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press (2011)

  22. Thusoo, A.: Hive - A Petabyte Scale Data Warehouse using Hadoop. https://www.facebook.com/note.php?note_id=89508453919 (2009)

  23. O’Malley, O., Murthy, A.: Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds. http://developer.yahoo.com/blogs/hadoop/hadoop-sorts-petabyte-16-25-hours-terabyte-62-422.html (2013)

  24. Apache Software Foundation: Hive™. http://hive.apache.org/ (2013)

  25. Apache Software Foundation: Pig™. http://pig.apache.org/ (2013)

  26. Microsoft Corporation: DryadLINQ™. http://research.microsoft.com/en-us/projects/dryadlinq/ (2013)

  27. MongoDB Inc: MongoDB™. http://www.mongodb.org/ (2013)

  28. Apache Software Foundation: HBase™. http://hbase.apache.org/ (2013)

  29. Apache Software Foundation: Cassandra™. http://cassandra.apache.org/ (2013)

  30. Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A Comparison of Approaches to Large-Scale Data Analysis. In: Proceedings of the International Conference on Management of Data, New York, NY, USA, ACM (August 2009) 165–178

  31. Dean, J., Ghemawat, S.: MapReduce: A Flexible Data Processing Tool. Communications of the ACM 53(1) (January 2010) 72–77

  32. Stonebraker, M., Abadi, D., DeWitt, D.J., Madden, S., Paulson, E., Pavlo, A., Rasin, A.: MapReduce and Parallel DBMSs: Friends or Foes? Communications of the ACM 53(1) (January 2010) 64–71

  33. Apache Software Foundation: Hadoop. http://hadoop.apache.org/ (2013)

  34. Singh, S.: Hadoop at Yahoo!: More Than Ever Before. http://developer.yahoo.com/blogs/hadoop/hadoop-yahoo-more-ever-095826045.html/ (2013)

  35. Apache Software Foundation: Hadoop Wiki. http://wiki.apache.org/hadoop/PoweredBy/ (2013)

  36. Apache Software Foundation: HDFS Users Guide. http://hadoop.apache.org/docs/stable/hdfs_user_guide.html/ (2013)

  37. NetApp Corporation: Open Solution for Hadoop. http://www.netapp.com/us/solutions/big-data/hadoop.aspx/ (2013)

  38. MapR Technologies, Inc: MapR™ Distribution for Apache Hadoop Advantages. http://www.mapr.com/products/why-mapr/ (2013)

  39. Gupta, K., Jain, R., Koltsidas, I., Pucha, H., Sarkar, P., Seaman, M., Subhraveti, D.: GPFS-SNC: An Enterprise Storage Framework for Virtual-Machine Clouds. IBM Journal of Research and Development 55(6) (December 2011) 2:1–2:10

  40. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: A Distributed Storage System for Structured Data. In: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Berkeley, CA, USA, USENIX Association (November 2006) 15–15

  41. Apache Software Foundation: Apache HBase™. http://hbase.apache.org/acid-semantics.html/ (2013)

  42. Apache Software Foundation: Pig Latin Basics. http://pig.apache.org/docs/r0.11.1/basic.html/ (2013)

  43. Apache Software Foundation: LanguageManual DDL. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL/ (2013)

  44. Apache Software Foundation: LanguageManual DML. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML/ (2013)

  45. Snir, M.: A Compilation of Parallel Patterns. http://www.cs.uiuc.edu/homes/snir/PPP/ (2013)

  46. Apache Software Foundation: Apache 2.0 license: http://www.apache.org/licenses/LICENSE-2.0.txt

Acknowledgment

Part of the computations of this work were performed in EOLO, the HPC of Climate Change of the International Campus of Excellence of Moncloa, funded by MECD and MICINN.

Author information

Corresponding author

Correspondence to Luis Javier García Villalba.

Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Turrado García, F., Sandoval Orozco, A., García Villalba, L. (2015). Building Scalable Software for Data Centers: An Approach to Distributed Computing at Enterprise Level. In: Khan, S., Zomaya, A. (eds) Handbook on Data Centers. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2092-1_23

  • DOI: https://doi.org/10.1007/978-1-4939-2092-1_23

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-2091-4

  • Online ISBN: 978-1-4939-2092-1

  • eBook Packages: Computer Science, Computer Science (R0)
