Building Scalable Software for Data Centers: An Approach to Distributed Computing at Enterprise Level

Turrado García, Fernando; Sandoval Orozco, Ana Lucila; García Villalba, Luis Javier

doi:10.1007/978-1-4939-2092-1_23

Abstract

Big data can be defined as a large collection of data that is difficult to process because of its size or complexity. In 2001 Doug Laney, an analyst at META Group (now Gartner), published a research report defining three dimensions that characterize big-data problems: Volume, Variety and Velocity (also known as the 3Vs). The original report can be found on the Gartner site [1].

References

Laney, D.: 3D Data Management: Controlling Data Volume, Velocity and Variety (February 2001)
LSST Corporation: LSST and Technology Innovation (2013)
Google Corporation: Waze Champs Meetup at Waze HQ. http://blog.waze.com/ (2013)
Bollen, J., Mao, H., Zeng, X.: Twitter Mood Predicts the Stock Market. Journal of Computational Science 2(1) (2011) 1–8
Tumasjan, A., Sprenger, T.O., Sandner, P.G., Welpe, I.M.: Predicting Elections with Twitter: What 140 Characters Reveal About Political Sentiment. In: Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media. (May 23–26 2010) 178–185
O’Connor, B., Balasubramanyan, R., Routledge, B., Smith, N.: From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In: Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media. (May 23–26 2010) 122–129
Golder, S.A., Macy, M.W.: Diurnal and Seasonal Mood Vary with Work, Sleep, and Daylength Across Diverse Cultures. Science 333(6051) (September 2011) 1878–1881
Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., Zaharia, M.: A View of Cloud Computing. Communications of the ACM 53(4) (April 2010) 50–58
Dorband, J.E., Raytheon, J.P., Ranawake, U.: Commodity Computing Clusters at Goddard Space Flight Center. Online Journal of Space Communication, School of Media Arts and Studies, Scripps College of Communication, Ohio University (2013)
Neuman, B.C.: Scale in Distributed Systems. Readings in Distributed Computing Systems (1994) 463–489
Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley (1994)
Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., Stal, M.: Pattern-Oriented Software Architecture: A System of Patterns. Volume 1. J. Wiley (1999)
Fowler, M.: Analysis Patterns: Reusable Object Models. 1 edn. Addison-Wesley Professional (1996)
Schmidt, D., Stal, M., Rohnert, H., Buschmann, F.: Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects. Volume 2. J. Wiley (2000)
Alexander, C., Ishikawa, S., Silverstein, M.: A Pattern Language: Towns, Buildings, Construction. Oxford University Press (1977)
Buschmann, F., Henney, K., Schmidt, D.C.: Pattern-Oriented Software Architecture: On Patterns and Pattern Languages. Volume 5. J. Wiley (April 2007)
Buschmann, F., Henney, K., Schmidt, D.C.: Pattern-Oriented Software Architecture: A Pattern Language for Distributed Computing. Volume 4. J. Wiley (2007)
OPL Working Group, B.U.: A Pattern Language for Parallel Programming ver2.0 (2013)
Fayad, M., Schmidt, D.C.: Object-Oriented Application Frameworks. Communications of the ACM 40(10) (October 1997) 32–38
Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. In: Proceedings of the 6th Symposium on Operating Systems Design and Implementation, San Francisco, CA (December 2004) 1–13
Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press (2011)
Thusoo, A.: Hive - A Petabyte Scale Data Warehouse using Hadoop. https://www.facebook.com/note.php?note_id=89508453919 (2009)
O’Malley, O., Murthy, A.: Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds. http://developer.yahoo.com/blogs/hadoop/hadoop-sorts-petabyte-16-25-hours-terabyte-62-422.html (2013)
Apache Software Foundation: Hive™. http://hive.apache.org/ (2013)
Apache Software Foundation: Pig™. http://pig.apache.org/ (2013)
Microsoft Corporation: DryadLINQ™. http://research.microsoft.com/en-us/projects/dryadlinq/ (2013)
MongoDB Inc: MongoDB™. http://www.mongodb.org/ (2013)
Apache Software Foundation: HBase™. http://hbase.apache.org/ (2013)
Apache Software Foundation: Cassandra™. http://cassandra.apache.org/ (2013)
Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A Comparison of Approaches to Large-Scale Data Analysis. In: Proceedings of the International Conference on Management of Data, New York, NY, USA, ACM (August 2009) 165–178
Dean, J., Ghemawat, S.: MapReduce: A Flexible Data Processing Tool. Communications of the ACM 53(1) (January 2010) 72–77
Stonebraker, M., Abadi, D., DeWitt, D.J., Madden, S., Paulson, E., Pavlo, A., Rasin, A.: MapReduce and Parallel DBMSs: Friends or Foes? Communications of the ACM 53(1) (January 2010) 64–71
Apache Software Foundation: Hadoop. http://hadoop.apache.org/ (2013)
Singh, S.: Hadoop at Yahoo!: More Than Ever Before. http://developer.yahoo.com/blogs/hadoop/hadoop-yahoo-more-ever-095826045.html/ (2013)
Apache Software Foundation: Hadoop Wiki. http://wiki.apache.org/hadoop/PoweredBy/ (2013)
Apache Software Foundation: HDFS Users Guide. http://hadoop.apache.org/docs/stable/hdfs_user_guide.html/ (2013)
NetApp Corporation: Open Solution for Hadoop. http://www.netapp.com/us/solutions/big-data/hadoop.aspx/ (2013)
MapR Technologies, Inc: MapR™ Distribution for Apache Hadoop Advantages. http://www.mapr.com/products/why-mapr/ (2013)
Gupta, K., Jain, R., Koltsidas, I., Pucha, H., Sarkar, P., Seaman, M., Subhraveti, D.: GPFS-SNC: An Enterprise Storage Framework for Virtual-Machine Clouds. IBM Journal of Research and Development 55(6) (December 2011) 2:1–2:10
Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: A Distributed Storage System for Structured Data. In: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Berkeley, CA, USA, USENIX Association (November 2006) 15–15
Apache Software Foundation: Apache HBase™. http://hbase.apache.org/acid-semantics.html/ (2013)
Apache Software Foundation: Pig Latin Basics. http://pig.apache.org/docs/r0.11.1/basic.html/ (2013)
Apache Software Foundation: LanguageManual DDL. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL/ (2013)
Apache Software Foundation: LanguageManual DML. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML/ (2013)
Snir, M.: A Compilation of Parallel Patterns. http://www.cs.uiuc.edu/homes/snir/PPP/ (2013)
Apache Software Foundation: Apache 2.0 License. http://www.apache.org/licenses/LICENSE-2.0.txt

Acknowledgment

Part of the computations of this work were performed in EOLO, the HPC of Climate Change of the International Campus of Excellence of Moncloa, funded by MECD and MICINN.

Author information

Authors and Affiliations

Group of Analysis, Security and Systems (GASS), Department of Software Engineering and Artificial Intelligence (DISIA), Faculty of Information Technology and Computer Science, Office 431, Universidad Complutense de Madrid (UCM), Calle Profesor José García Santesmases 9, Ciudad Universitaria, 28040, Madrid, Spain
Fernando Turrado García, Ana Lucila Sandoval Orozco & Luis Javier García Villalba

Corresponding author

Correspondence to Luis Javier García Villalba.

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, North Dakota State University, Fargo, North Dakota, USA
Samee U. Khan
School of Information Technologies, The University of Sydney, Sydney, New South Wales, Australia
Albert Y. Zomaya

About this chapter

Cite this chapter

Turrado García, F., Sandoval Orozco, A., García Villalba, L. (2015). Building Scalable Software for Data Centers: An Approach to Distributed Computing at Enterprise Level. In: Khan, S., Zomaya, A. (eds) Handbook on Data Centers. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2092-1_23

DOI: https://doi.org/10.1007/978-1-4939-2092-1_23
Published: 17 March 2015
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2091-4
Online ISBN: 978-1-4939-2092-1
eBook Packages: Computer Science, Computer Science (R0)
