Abstract
Cloud computing takes the form of distributed computing: multiple machines on the service side execute computations simultaneously. To process the growing volume of multimedia data, numerous large-scale storage and computing techniques for the cloud have been developed; among them, Hadoop plays a key role. Hadoop, a computing cluster built from low-priced hardware, can process petabytes of multimedia data in parallel, and it features high reliability, high efficiency, and high scalability. Large-scale multimedia data computing involves not only the core techniques, Hadoop and MapReduce, but also data collection techniques such as the File Transfer Protocol and Flume, as well as distributed system configuration, automatic installation, and the building and management of monitoring platforms. Only by integrating all of these techniques can a reliable large-scale multimedia data platform be offered. In this paper, we show how cloud computing can achieve such a breakthrough by proposing a multimedia social network dataset on the Hadoop platform and implementing a prototype; detailed specifications and design issues are discussed as well. An important finding of this article is that multimedia social network analysis completes in less time on a cloud Hadoop platform than on a single computer. The advantages of cloud computing over traditional data processing practices are demonstrated, applicable framework designs and tools for large-scale data processing are proposed, and experimental results on multimedia data, including data sizes and processing times, are reported.
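The MapReduce model named above can be illustrated with a minimal single-machine sketch. This is plain Python rather than the Hadoop Java API, and the tag-counting task and its input records are hypothetical, chosen only to show the map, shuffle, and reduce phases that Hadoop parallelizes across a cluster:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, value) pairs — here, (tag, 1) for each media tag."""
    for record in records:
        for tag in record.split():
            yield (tag, 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values — here, sum the counts."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input: tag strings attached to multimedia items.
records = ["cat video", "cat photo", "video"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {'cat': 2, 'video': 2, 'photo': 1}
```

On a real cluster, the map and reduce functions run unchanged on many nodes at once, which is what makes the petabyte-scale processing described above feasible.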
Acknowledgments
Our deep gratitude goes to the Broadband Network Lab of Chunghwa Telecom, which provided the platform for the test and evaluation of our research work.
Additional information
Prof. Mohammad S. Obaidat is a Fellow of IEEE and Fellow of SCS.
Cite this article
Lai, W.K., Chen, YU., Wu, TY. et al. Towards a framework for large-scale multimedia data storage and processing on Hadoop platform. J Supercomput 68, 488–507 (2014). https://doi.org/10.1007/s11227-013-1050-4