Abstract
Big data can be defined as a large collection of data that is difficult to process because of its size or complexity. In 2001 Doug Laney, an analyst at META Group (now Gartner), published a research report defining three dimensions that characterize big-data problems: Volume, Variety and Velocity (also known as the 3Vs). The original report can be found at the Gartner site [1].
References
Laney, D.: 3D Data Management: Controlling Data Volume, Velocity and Variety (February 2001)
LSST Corporation: LSST and Technology Innovation (2013)
Google Corporation: Waze Champs Meetup at Waze HQ. http://blog.waze.com/ (2013)
Bollen, J., Mao, H., Zeng, X.: Twitter Mood Predicts the Stock Market. Journal of Computational Science 2(1) (2011) 1–8
Tumasjan, A., Sprenger, T.O., Sandner, P.G., Welpe, I.M.: Predicting Elections with Twitter: What 140 Characters Reveal About Political Sentiment. In: Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media. (May 23–26 2010) 178–185
O’Connor, B., Balasubramanyan, R., Routledge, B., Smith, N.: From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In: Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media. (May 23–26 2010) 122–129
Golder, S.A., Macy, M.W.: Diurnal and Seasonal Mood Vary with Work, Sleep, and Daylength Across Diverse Cultures. Science 333(6051) (September 2011) 1878–1881
Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., Zaharia, M.: A View of Cloud Computing. Communications of the ACM 53(4) (April 2010) 50–58
Dorband, J.E., Raytheon, J.P., Ranawake, U.: “Commodity Computing Clusters at Goddard Space Flight Center”. Online journal of space communication, School of Media Arts and Studies Scripps College of Communication, Ohio University (2013)
Neuman, B.C.: Scale in Distributed Systems. Readings in Distributed Computing Systems (1994) 463–489
Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley (1994)
Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., Stal, M.: Pattern-Oriented Software Architecture: A System of Patterns. Volume 1. J. Wiley (1999)
Fowler, M.: Analysis Patterns: Reusable Object Models. 1 edn. Addison-Wesley Professional (1996)
Schmidt, D., Stal, M., Rohnert, H., Buschmann, F.: Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects. Volume 2. J. Wiley (2000)
S. Ishikawa, M.S.: Pattern Language: Towns, Buildings, Construction. Oxford University Press (1977)
Buschmann, F., Henney, K., Schmidt, D.C.: Pattern-Oriented Software Architecture: On Patterns and Pattern Languages. Volume 5. J. Wiley (April 2007)
Buschmann, F., Henney, K., Schmidt, D.C.: Pattern-Oriented Software Architecture: A Pattern Language for Distributed Computing. Volume 4. J. Wiley (2007)
OPL Working Group, B.U.: A Pattern Language for Parallel Programming ver2.0 (2013)
Fayad, M., Schmidt, D.C.: Object-Oriented Application Frameworks. Communications of the ACM 40(10) (October 1997) 32–38
Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. In: Proceedings of the 6th Symposium on Operating System Design and Implementation, San Francisco, CA (December 2004) 1–13
Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press (2011)
Thusoo, A.: Hive - A Petabyte Scale Data Warehouse using Hadoop. https://www.facebook.com/note.php?note_id=89508453919 (2009)
O’Malley, O., Murthy, A.: Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds. http://developer.yahoo.com/blogs/hadoop/hadoop-sorts-petabyte-16-25-hours-terabyte-62-422.html (2013)
Apache Software Foundation: Hive™. http://hive.apache.org/ (2013)
Apache Software Foundation: Pig™. http://pig.apache.org/ (2013)
Microsoft Corporation: DryadLINQ™. http://research.microsoft.com/en-us/projects/dryadlinq/ (2013)
MongoDB Inc: MongoDB™. http://www.mongodb.org/ (2013)
Apache Software Foundation: HBase™. http://hbase.apache.org/ (2013)
Apache Software Foundation: Cassandra™. http://cassandra.apache.org/ (2013)
Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A Comparison of Approaches to Large-Scale Data Analysis. In: Proceedings of the International Conference on Management of Data, New York, NY, USA, ACM (August 2009) 165–178
Dean, J., Ghemawat, S.: MapReduce: A Flexible Data Processing Tool. Communications of the ACM 53(1) (January 2010) 72–77
Stonebraker, M., Abadi, D., DeWitt, D.J., Madden, S., Paulson, E., Pavlo, A., Rasin, A.: MapReduce and Parallel DBMSs: Friends or Foes? Communications of the ACM 53(1) (January 2010) 64–71
Apache Software Foundation: Hadoop. http://hadoop.apache.org/ (2013)
Singh, S.: Hadoop at Yahoo!: More Than Ever Before. http://developer.yahoo.com/blogs/hadoop/hadoop-yahoo-more-ever-095826045.html/ (2013)
Apache Software Foundation: Hadoop Wiki. http://wiki.apache.org/hadoop/PoweredBy/ (2013)
Apache Software Foundation: HDFS Users Guide. http://hadoop.apache.org/docs/stable/hdfs_user_guide.html/ (2013)
NetApp Corporation: Open Solution for Hadoop. http://www.netapp.com/us/solutions/big-data/hadoop.aspx/ (2013)
MapR Technologies, Inc: MapR™ Distribution for Apache Hadoop Advantages. http://www.mapr.com/products/why-mapr/ (2013)
Gupta, K., Jain, R., Koltsidas, I., Pucha, H., Sarkar, P., Seaman, M., Subhraveti, D.: GPFS-SNC: An Enterprise Storage Framework for Virtual-Machine Clouds. IBM Journal of Research and Development 55(6) (December 2011) 2:1–2:10
Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: A Distributed Storage System for Structured Data. In: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Berkeley, CA, USA, USENIX Association (November 2006) 15–15
Apache Software Foundation: Apache HBase™. http://hbase.apache.org/acid-semantics.html/ (2013)
Apache Software Foundation: Pig Latin Basics. http://pig.apache.org/docs/r0.11.1/basic.html/ (2013)
Apache Software Foundation: LanguageManual DDL. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL/ (2013)
Apache Software Foundation: LanguageManual DML. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML/ (2013)
Snir, M.: A Compilation of Parallel Patterns http://www.cs.uiuc.edu/homes/snir/PPP/ (2013)
Apache Software Foundation: Apache 2.0 license: http://www.apache.org/licenses/LICENSE-2.0.txt
Acknowledgment
Part of the computations of this work were performed in EOLO, the HPC of Climate Change of the International Campus of Excellence of Moncloa, funded by MECD and MICINN.
Copyright information
© 2015 Springer Science+Business Media New York
About this chapter
Cite this chapter
Turrado García, F., Sandoval Orozco, A., García Villalba, L. (2015). Building Scalable Software for Data Centers: An Approach to Distributed Computing at Enterprise Level. In: Khan, S., Zomaya, A. (eds) Handbook on Data Centers. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2092-1_23
DOI: https://doi.org/10.1007/978-1-4939-2092-1_23
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2091-4
Online ISBN: 978-1-4939-2092-1
eBook Packages: Computer Science (R0)