Modeling SmallClient indexing framework for big data analytics

Siddiqa, Aisha; Karim, Ahmad; Chang, Victor

doi:10.1007/s11227-017-2052-4

Published: 18 April 2017

Volume 74, pages 5241–5262, (2018)
The Journal of Supercomputing

Abstract

Big data, continually growing through the proliferation of electronic and automated devices, degrades the data retrieval performance of contemporary big data analytics technologies and makes the exploration and adoption of improved procedures inevitable. Indexing facilitates big data analytics: once properly designed, an index lets given data sets be stored, processed, accessed and analyzed more quickly and efficiently. This paper proposes a novel mathematical model that introduces an indexing mechanism and improves data retrieval performance on data sets while supporting the growing volume of big data. The model comprises three modules: block creation, index creation and query execution. The block creation module improves record access performance while avoiding remote access delays. The index creation module allows the maximum possible number of indexes for big data with minimized indexing overhead. The query execution module performs data search and retrieval for user search queries. Evaluation of the proposed mathematical model shows that search performance improves for both small and big data sets, with minimized overhead in data uploading and indexing time. We further verify these results by implementing the SmallClient logic on a four-node physical cluster, which confirms the improved performance of the proposed approach.
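As a rough illustration of how the three modules named in the abstract can fit together, the following Python sketch partitions records into fixed-size blocks, builds an in-memory index from each key to its block positions, and answers exact-match queries through that index. It is not the SmallClient implementation or the paper's mathematical model: the (key, value) record format, the fixed block size, and the functions `create_blocks`, `create_index` and `execute_query` are hypothetical, chosen only to make the block/index/query split concrete.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record format (not from the paper): a (key, value) pair.
Record = Tuple[str, str]


def create_blocks(records: List[Record], block_size: int) -> List[List[Record]]:
    """Block creation (hypothetical): split the record stream into fixed-size blocks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def create_index(blocks: List[List[Record]]) -> Dict[str, List[Tuple[int, int]]]:
    """Index creation (hypothetical): map each key to every (block_id, offset) holding it."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for block_id, block in enumerate(blocks):
        for offset, (key, _value) in enumerate(block):
            index[key].append((block_id, offset))
    return dict(index)


def execute_query(
    key: str,
    index: Dict[str, List[Tuple[int, int]]],
    blocks: List[List[Record]],
) -> List[str]:
    """Query execution (hypothetical): follow index entries instead of scanning all blocks."""
    return [blocks[block_id][offset][1] for block_id, offset in index.get(key, [])]


if __name__ == "__main__":
    records = [("u1", "login"), ("u2", "view"), ("u1", "logout"),
               ("u3", "view"), ("u2", "logout")]
    blocks = create_blocks(records, block_size=2)
    index = create_index(blocks)
    print(execute_query("u1", index, blocks))  # ['login', 'logout']
```

Resolving a query through the index touches only the blocks that hold matching keys rather than every block. In a distributed setting, that is the property an index exploits to reduce retrieval time, although the paper's own block placement and indexing strategy may differ from this in-memory sketch.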

Author information

Authors and Affiliations

Faculty of Computer Science and Information Technology, University of Malaya, 50603, Kuala Lumpur, Malaysia
Aisha Siddiqa
Department of Information Technology, Bahauddin Zakariya University, Multan, 60000, Pakistan
Ahmad Karim
IBSS, Xi’an Jiaotong Liverpool University, Suzhou, 100044, China
Victor Chang

Corresponding author

Correspondence to Aisha Siddiqa.

About this article

Cite this article

Siddiqa, A., Karim, A. & Chang, V. Modeling SmallClient indexing framework for big data analytics. J Supercomput 74, 5241–5262 (2018). https://doi.org/10.1007/s11227-017-2052-4

Published: 18 April 2017
Issue Date: October 2018
DOI: https://doi.org/10.1007/s11227-017-2052-4
