Towards a framework for large-scale multimedia data storage and processing on Hadoop platform

The Journal of Supercomputing

Abstract

Cloud computing is a form of distributed computing in which multiple computers on the service side execute computations simultaneously. To process the growing volume of multimedia data, numerous large-scale storage and computing techniques have been developed for the cloud. Among these techniques, Hadoop plays a key role. A Hadoop cluster, built from low-cost hardware, can process petabytes of multimedia data in parallel, and it offers high reliability, efficiency, and scalability. Large-scale multimedia data computing involves not only the core techniques, Hadoop and MapReduce, but also data collection techniques such as File Transfer Protocol (FTP) and Flume. It also involves techniques for distributed system configuration, automatic installation, and the building and management of a monitoring platform. Consequently, a reliable large-scale multimedia data platform can be offered only when all of these techniques are integrated. In this paper, we show how cloud computing can make a breakthrough by proposing a multimedia social network dataset on the Hadoop platform and implementing a prototype. Detailed specifications and design issues are discussed as well. An important finding is that multimedia social network analysis takes less time on the cloud Hadoop platform than on a single computer. The article demonstrates the advantages of cloud computing over traditional data processing practices. Applicable framework designs and the tools available for large-scale data processing are also presented. We report the experimental multimedia data, including data sizes and processing times.
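As a rough illustration of the MapReduce model named in the abstract, the following Python sketch walks through the map, shuffle, and reduce steps in a single process. It is not the authors' implementation and does not use Hadoop itself. The record layout (user, media type, size in bytes) and all function names are hypothetical, chosen only to show how per-record work in the map step can be separated from per-key aggregation in the reduce step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (user_id, media_type, size_in_bytes).
Record = Tuple[str, str, int]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Map step: emit one (media_type, size) pair per input record."""
    for _user_id, media_type, size in records:
        yield media_type, size


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the sizes collected for each media type."""
    return {key: sum(values) for key, values in grouped.items()}


def total_bytes_by_type(records: Iterable[Record]) -> Dict[str, int]:
    """Run the three steps in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    sample = [
        ("u1", "image", 100),
        ("u2", "video", 500),
        ("u1", "image", 50),
    ]
    print(total_bytes_by_type(sample))
```

On a real cluster, the map and reduce functions would run on many nodes in parallel over data stored in a distributed file system. Here everything runs sequentially, so the sketch shows only the programming model, not the speedup.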

Acknowledgments

Our deep gratitude goes to the Broadband Network Lab of Chunghwa Telecom, which provided the research platform for testing and evaluating our work.

Author information

Corresponding author

Correspondence to Tin-Yu Wu.

Additional information

Prof. Mohammad S. Obaidat is a Fellow of IEEE and Fellow of SCS.

About this article

Cite this article

Lai, W.K., Chen, YU., Wu, TY. et al. Towards a framework for large-scale multimedia data storage and processing on Hadoop platform. J Supercomput 68, 488–507 (2014). https://doi.org/10.1007/s11227-013-1050-4

  • DOI: https://doi.org/10.1007/s11227-013-1050-4
