Prefetched wald adaptive boost classification based Czekanowski similarity MapReduce for user query processing with bigdata

Tamil Selvan, S.; Balamurugan, P.; Vijayakumar, M.

doi:10.1007/s10619-020-07319-6

Published: 05 January 2021

Volume 39, pages 855–872, (2021)

Distributed and Parallel Databases


Abstract

The large volumes of data generated in recent years, together with the rise of big data analytics on social media, call for accurate user query processing with minimal time complexity. Several research works have addressed the accuracy and time complexity of query processing. In this work, the Wald Adaptive Prefetched Boosting Classification based Czekanowski Similarity MapReduce (WAPBC–CSMR) technique is introduced. WAPBC–CSMR processes a large number of user queries against a big dataset. First, a technique called Wald Adaptive Prefetched Boosting classifies the big dataset into different classes. To reduce classification time, a classifier called Gaussian distributive Rocchio is used, which achieves significant classification in minimum time. A Likelihood Ratio Test is then applied to the classified results to combine the weak learner outputs into a strong classification result. The classified and refined data are stored in the prefetcher cache. When the prefetch manager receives multi-dimensional user queries, it splits each query into keywords and feeds them into the map phase, where the mapping function uses the Czekanowski Similarity Index to identify repeated jobs with maximum query processing accuracy. The relevant data are then retrieved from the prefetcher cache, and repeated user query tasks are removed in the reduce phase via a statistical function, which reduces processing time. WAPBC–CSMR is evaluated on a big dataset using query processing accuracy, error rate, and processing time as metrics for varying numbers of user queries. The results show that WAPBC–CSMR improves query processing accuracy and reduces processing time and error rate compared with conventional methods.
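
The map and reduce steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes queries are compared as bags of keywords using the common quantitative form of the Czekanowski index, 2 Σ min(a_i, b_i) / Σ (a_i + b_i), and it replaces the unspecified "statistical function" of the reduce phase with a simple threshold. The function names, the `threshold` value of 0.8, and the sample queries are all hypothetical.

```python
from collections import Counter


def czekanowski_similarity(a, b):
    """Czekanowski index on keyword counts: 2 * sum(min) / (sum(a) + sum(b))."""
    ca, cb = Counter(a), Counter(b)
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 0.0
    shared = sum(min(ca[k], cb[k]) for k in ca.keys() & cb.keys())
    return 2.0 * shared / total


def map_phase(queries):
    """Split each query into keywords and score it against earlier queries.

    Emits (query_id, keywords, best_earlier_id, best_score) tuples.
    """
    seen, mapped = [], []
    for qid, text in queries:
        keywords = text.lower().split()
        best_id, best_score = None, 0.0
        for prev_id, prev_keywords in seen:
            score = czekanowski_similarity(keywords, prev_keywords)
            if score > best_score:
                best_id, best_score = prev_id, score
        mapped.append((qid, keywords, best_id, best_score))
        seen.append((qid, keywords))
    return mapped


def reduce_phase(mapped, threshold=0.8):
    """Drop queries that repeat an earlier one (assumed threshold rule)."""
    return [qid for qid, _, _, score in mapped if score < threshold]


if __name__ == "__main__":
    sample = [
        ("q1", "big data query processing"),
        ("q2", "query processing big data"),
        ("q3", "spatial index"),
    ]
    print(reduce_phase(map_phase(sample)))
```

In this sketch `q2` contains the same keywords as `q1`, so it is treated as a repeated job and removed, while `q3` shares no keywords with either and is kept. A real deployment would distribute the map step across workers rather than scanning earlier queries sequentially.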



Author information

Authors and Affiliations

Erode Sengunthar Engineering College, Perundurai, Erode, India
S. Tamil Selvan
Faculty of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
P. Balamurugan
Jai Shriram Engineering College, Tiruppur, India
M. Vijayakumar

Corresponding author

Correspondence to S. Tamil Selvan.

About this article

Cite this article

Tamil Selvan, S., Balamurugan, P. & Vijayakumar, M. Prefetched wald adaptive boost classification based Czekanowski similarity MapReduce for user query processing with bigdata. Distrib Parallel Databases 39, 855–872 (2021). https://doi.org/10.1007/s10619-020-07319-6

Accepted: 16 December 2020
Published: 05 January 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s10619-020-07319-6
