Abstract
The continual growth of big data, driven by electronic and automated devices, degrades the data retrieval performance of contemporary big data analytics technologies and makes the exploration and adoption of improved procedures inevitable. A properly designed index facilitates analytics by allowing data sets to be stored, processed, accessed and analyzed more quickly and efficiently. This paper proposes a novel mathematical model that introduces an indexing mechanism and improves data retrieval performance while supporting the growing volume of big data. The model is composed of three modules: block creation, index creation and query execution. The block creation module improves record access performance while avoiding remote access delays. The index creation module allows the maximum possible number of indexes for big data with minimal indexing overhead. The query execution module performs data search and retrieval for user search queries. The evaluation of the proposed mathematical model shows that search performance for both small and big data sets is improved with minimal overhead in data uploading and indexing time. We further verify the results by implementing the SmallClient logic on a four-node physical cluster, which confirms the improved performance of the proposed approach.
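The three modules named in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the SmallClient implementation or the paper's mathematical model. The function names (`create_blocks`, `create_index`, `execute_query`), the fixed block size, and the in-memory dictionary index are all assumptions chosen for illustration only.

```python
from collections import defaultdict

def create_blocks(records, block_size):
    """Split records into fixed-size blocks (hypothetical stand-in for block creation)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def create_index(blocks, key):
    """Map each value of `key` to the (block_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for block_id, block in enumerate(blocks):
        for offset, record in enumerate(block):
            index[record[key]].append((block_id, offset))
    return index

def execute_query(blocks, index, value):
    """Fetch matching records through the index instead of scanning every block."""
    return [blocks[block_id][offset] for block_id, offset in index.get(value, [])]

# Illustrative data; the field names and values are made up for this sketch.
cities = ["Oslo", "Lima", "Oslo", "Pune", "Lima"]
records = [{"id": i, "city": city} for i, city in enumerate(cities)]

blocks = create_blocks(records, block_size=2)
index = create_index(blocks, key="city")
matches = execute_query(blocks, index, "Oslo")
```

In this sketch a query touches only the blocks that the index points to, which is the general reason an index can reduce retrieval time. The cost is the extra pass over the data in `create_index`, which corresponds to the indexing overhead the abstract mentions.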
Cite this article
Siddiqa, A., Karim, A. & Chang, V. Modeling SmallClient indexing framework for big data analytics. J Supercomput 74, 5241–5262 (2018). https://doi.org/10.1007/s11227-017-2052-4
DOI: https://doi.org/10.1007/s11227-017-2052-4