
Modeling SmallClient indexing framework for big data analytics

  • Published:
The Journal of Supercomputing

Abstract

Big data, growing continually through the intervention of electronic and automated devices, degrades the data retrieval performance of contemporary big data analytics technologies and makes the exploration and adoption of improved procedures inevitable. When properly designed, indexing facilitates big data analytics by allowing given data sets to be stored, processed, accessed, and analyzed quickly and efficiently. This paper proposes a novel mathematical model that introduces an indexing mechanism and improves data retrieval performance on data sets while supporting the growing volume of big data. The model is composed of three modules: block creation, index creation, and query execution. The block creation module improves record access performance while avoiding remote access delays. The index creation module allows the maximum possible number of indexes for big data with minimal indexing overhead. The query execution module performs data search and retrieval operations for user search queries. Evaluation of the proposed mathematical model shows that search performance improves for both small and big data sets, with minimal overhead in data uploading and indexing time. We further verify these results by implementing the SmallClient logic on a four-node physical cluster, which confirms the improved performance of the proposed approach.
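To make the three-module structure above concrete, here is a minimal Python sketch of a generic block-plus-index lookup. It is an illustration under stated assumptions, not the SmallClient implementation or its mathematical model: the class `BlockIndexStore`, its method names, the fixed `block_size`, and the in-memory inverted index mapping `(field, value)` pairs to block IDs are all hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BlockIndexStore:
    """Hypothetical three-stage pipeline (block creation, index creation,
    query execution). An illustrative sketch, not the SmallClient code."""
    block_size: int
    blocks: list = field(default_factory=list)
    # Assumed index layout: (field name, value) -> set of block IDs.
    index: dict = field(default_factory=lambda: defaultdict(set))

    def create_blocks(self, records):
        # Block creation: group consecutive records into fixed-size blocks
        # so a lookup reads a few contiguous blocks, not scattered records.
        for start in range(0, len(records), self.block_size):
            self.blocks.append(records[start:start + self.block_size])

    def create_index(self, key_field):
        # Index creation: one pass over all blocks, recording which blocks
        # contain each value of key_field. Calling this for several fields
        # builds several indexes over the same blocks.
        for block_id, block in enumerate(self.blocks):
            for record in block:
                self.index[(key_field, record[key_field])].add(block_id)

    def query(self, key_field, value):
        # Query execution: consult the index first, then scan only the
        # candidate blocks and return the matching records in block order.
        # dict.get does not insert a default entry for unknown keys.
        candidates = self.index.get((key_field, value), set())
        return [
            record
            for block_id in sorted(candidates)
            for record in self.blocks[block_id]
            if record[key_field] == value
        ]


if __name__ == "__main__":
    records = [{"id": i, "k": v} for i, v in enumerate("abaca")]
    store = BlockIndexStore(block_size=2)
    store.create_blocks(records)
    store.create_index("k")
    print(store.query("k", "c"))
```

The point of the sketch is that a query scans only the blocks the index names rather than every record. Placement of blocks across cluster nodes, which the abstract ties to avoiding remote access delays, is not modeled here.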


Figs. 1–11



Author information

Correspondence to Aisha Siddiqa.


About this article

Cite this article

Siddiqa, A., Karim, A. & Chang, V. Modeling SmallClient indexing framework for big data analytics. J Supercomput 74, 5241–5262 (2018). https://doi.org/10.1007/s11227-017-2052-4

  • DOI: https://doi.org/10.1007/s11227-017-2052-4
