Abstract
Real-time monitoring of microblog data can detect sensitive information promptly and support the management and control of public sentiment. However, it requires processing a large-scale data stream. MapReduce is a framework for processing large-scale data in batch mode; it is designed to increase throughput, so its real-time performance is limited. To address this limitation, RT-SSP (Real-Time Staged Stream Processing), a hybrid staged stream processing scheme that supports both batch and real-time processing, is proposed. In this scheme, a large-scale, high-speed data stream is processed locally in stages, communication cost is reduced by storing intermediate results on the local node, and key techniques such as cache optimization are used to achieve highly concurrent reads and writes. Experiments show that the RT-SSP scheme improves the real-time performance of processing large-scale microblog data streams and achieves a speed-up ratio of about 2.3.
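The idea of processing a stream locally in stages and keeping intermediate results on the node can be illustrated with a minimal Python sketch. It is not the RT-SSP implementation: the class name LocalStagedNode, the parameters stage_size and window_stages, and the term list SENSITIVE_TERMS are all hypothetical, and a simple in-memory deque stands in for whatever local store and cache the scheme actually uses.

```python
from collections import Counter, deque

# Hypothetical watch list; the abstract does not say how sensitive terms are defined.
SENSITIVE_TERMS = {"riot", "leak"}


class LocalStagedNode:
    """Processes a stream in fixed-size stages and keeps per-stage results locally."""

    def __init__(self, stage_size, window_stages):
        self.stage_size = stage_size
        self.buffer = []  # posts of the stage currently being filled
        # Local intermediate results: one Counter per finished stage.
        # The deque drops the oldest stage once the window is full.
        self.partials = deque(maxlen=window_stages)

    def ingest(self, post):
        """Add one post; close the stage when it reaches stage_size posts."""
        self.buffer.append(post)
        if len(self.buffer) >= self.stage_size:
            self._close_stage()

    def _close_stage(self):
        """Map step over the buffered stage; the result stays on this node."""
        counts = Counter()
        for post in self.buffer:
            for word in post.lower().split():
                if word in SENSITIVE_TERMS:
                    counts[word] += 1
        self.partials.append(counts)
        self.buffer.clear()

    def window_counts(self):
        """Reduce step: merge the cached per-stage results, without rescanning posts."""
        total = Counter()
        for partial in self.partials:
            total.update(partial)
        return total


node = LocalStagedNode(stage_size=2, window_stages=2)
for text in ["riot downtown", "nothing here", "data leak leak",
             "riot again", "calm day", "leak found"]:
    node.ingest(text)
result = node.window_counts()
```

Because each finished stage is reduced to a small counter that never leaves the node, a query over the current window merges a few cached results instead of reshuffling raw posts, which is the kind of communication saving the abstract attributes to local intermediate storage.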
Keywords
Foundation items: National Natural Science Foundation of China (No. 60970012); Natural Science Foundation of Shandong Province (No. ZR2013FL005). About the authors: Yunpeng Cao (born 1967), male, holds a master's degree and is an associate professor; his main research interests include large-scale data processing and parallel computing. Haifeng Wang (born 1976), male, holds a doctorate and is an associate professor; his main research interests include network computing, large-scale data processing, and cloud computing.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Cao, Y., Wang, H. (2015). The Key Technologies of Real-Time Processing Large Scale Microblog Data Stream. In: Qiang, W., Zheng, X., Hsu, CH. (eds) Cloud Computing and Big Data. CloudCom-Asia 2015. Lecture Notes in Computer Science, vol 9106. Springer, Cham. https://doi.org/10.1007/978-3-319-28430-9_22
DOI: https://doi.org/10.1007/978-3-319-28430-9_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28429-3
Online ISBN: 978-3-319-28430-9
eBook Packages: Computer Science, Computer Science (R0)