Abstract
Cloud computing takes the form of distributed computing: multiple machines on the service side execute computations simultaneously. To process the growing volume of multimedia data, numerous large-scale storage and computing techniques for the cloud have been developed; among them, Hadoop plays a key role. Hadoop, a computing cluster built from low-priced hardware, can process petabytes of multimedia data in parallel, and it features high reliability, high efficiency, and high scalability. Large-scale multimedia data computing involves not only the core techniques, Hadoop and MapReduce, but also data collection techniques such as the File Transfer Protocol and Flume, as well as distributed system configuration, automatic installation, and the building and management of monitoring platforms. Only by integrating all of these techniques can a reliable large-scale multimedia data platform be offered. In this paper, we show how cloud computing can achieve such a breakthrough by proposing a multimedia social network dataset on the Hadoop platform and implementing a prototype; detailed specifications and design issues are discussed as well. An important finding of this article is that multimedia social network analysis completes in less time on a cloud Hadoop platform than on a single computer. The advantages of cloud computing over traditional data processing practices are demonstrated, applicable framework designs and tools for large-scale data processing are proposed, and experimental results on multimedia data, including data sizes and processing times, are reported.
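The MapReduce model named above can be illustrated with a minimal single-machine sketch. This is plain Python rather than the Hadoop Java API, and the tag-counting task and its input records are hypothetical, chosen only to show the map, shuffle, and reduce phases that Hadoop parallelizes across a cluster:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, value) pairs — here, (tag, 1) for each media tag."""
    for record in records:
        for tag in record.split():
            yield (tag, 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values — here, sum the counts."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input: tag strings attached to multimedia items.
records = ["cat video", "cat photo", "video"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {'cat': 2, 'video': 2, 'photo': 1}
```

On a real cluster, the map and reduce functions run unchanged on many nodes at once, which is what makes the petabyte-scale processing described above feasible.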
Acknowledgments
Our deep gratitude goes to the Broadband Network Lab of Chunghwa Telecom, which provided the platform for the test and evaluation of our research work.
Additional information
Prof. Mohammad S. Obaidat is a Fellow of IEEE and Fellow of SCS.
Cite this article
Lai, W.K., Chen, YU., Wu, TY. et al. Towards a framework for large-scale multimedia data storage and processing on Hadoop platform. J Supercomput 68, 488–507 (2014). https://doi.org/10.1007/s11227-013-1050-4