ABSTRACT
The rapid growth of massive data poses severe challenges to real-time data services for meteorological forecasting and climate analysis. Distributed databases are a good solution for storing massive, multi-source, heterogeneous meteorological data. However, HBase, the current mainstream choice, lacks native support for non-Rowkey queries, so the performance of real-time meteorological data queries is unsatisfactory. To address this issue, this paper proposes three distributed data query optimization strategies: query optimization based on a secondary index, secondary index query optimization based on a hotscore, and query optimization based on a Redis hot data caching strategy. Experimental results indicate that the scheme based on the Redis hot data caching strategy performs best of the three. It meets the query needs of meteorological services and also achieves a 3 to 8 times efficiency improvement over standard HBase.
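To make the three strategies concrete, here is a minimal in-memory sketch in Python. It is not HBase or Redis code. `ToyTable`, `SecondaryIndex`, `HotCache`, and `query_station` are hypothetical names, and the station data is invented. Plain LRU eviction in `HotCache` is an assumed stand-in for both the hotscore policy and the Redis caching strategy, because the abstract does not give their exact rules. The sketch only shows why a non-Rowkey predicate forces a full scan, how a secondary index maps a column value back to Rowkeys, and how a hot-data cache answers repeated queries without touching the index or the table.

```python
from collections import OrderedDict


class ToyTable:
    """In-memory stand-in for an HBase table: rows are addressed only by Rowkey."""

    def __init__(self):
        self.rows = {}          # rowkey -> {column: value}
        self.rows_scanned = 0   # counts rows touched by full-table scans

    def put(self, rowkey, record):
        self.rows[rowkey] = record

    def scan_where(self, column, value):
        # Without an index, a predicate on a non-Rowkey column must visit every row.
        hits = []
        for rowkey, record in self.rows.items():
            self.rows_scanned += 1
            if record.get(column) == value:
                hits.append(rowkey)
        return sorted(hits)


class SecondaryIndex:
    """Hypothetical index table mapping a column value -> set of Rowkeys.

    Built once from a snapshot here; keeping it in sync on writes is out of scope.
    """

    def __init__(self, table, column):
        self.column = column
        self.postings = {}
        for rowkey, record in table.rows.items():
            self.postings.setdefault(record.get(column), set()).add(rowkey)

    def lookup(self, value):
        return sorted(self.postings.get(value, set()))


class HotCache:
    """LRU cache of query results; an assumed stand-in for a Redis hot-data layer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = loader(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return value


def query_station(cache, index, station):
    """Check the hot cache first, then fall back to the secondary index (no full scan)."""
    return cache.get_or_load(station, index.lookup)


if __name__ == "__main__":
    table = ToyTable()
    # Hypothetical station IDs and Rowkeys of the form "<station>-<seq>".
    for i, station in enumerate(["S1", "S2", "S1", "S3", "S2", "S1"]):
        table.put(f"{station}-{i:02d}", {"station": station, "temp": 20 + i})

    print(table.scan_where("station", "S1"), table.rows_scanned)
    # -> ['S1-00', 'S1-02', 'S1-05'] 6

    index = SecondaryIndex(table, "station")
    cache = HotCache(capacity=2)
    for station in ["S1", "S2", "S1", "S1"]:
        query_station(cache, index, station)
    print(cache.hits, cache.misses)  # -> 2 2
```

Caching whole query results keyed by the queried value suits workloads where a few stations or time ranges are queried repeatedly. The abstract does not say whether this matches the paper's Redis key design.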
Index Terms
- Research on Distributed Storage and Query Optimization of Multi-source Heterogeneous Meteorological Data