Abstract
Data volumes are growing at an astonishing pace. The increasing use of digital technology by individuals and organizations generates big data. Big data environments generally rely on the MapReduce framework, which handles job execution in Hadoop. Spark is now becoming a popular framework; it runs on top of Hadoop and raises execution speed through its runtime environment. This paper proposes a novel CCCa framework that combines classification, clustering, and cache techniques. The quality of the input data is improved by data cleansing. A similarity-based clustering technique partitions the job data into clusters. The classification phase predicts the behavior of the data: an artificial neural network (ANN) trained with back propagation classifies the big data. A cache substitution technique is recommended to avoid repeated processing of the same job. The proposed framework consumes less memory and computational time and achieves higher accuracy in predicting the behavior of the dataset.
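The caching idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which substitution policy CCCa uses, so least-recently-used (LRU) eviction is assumed here purely for illustration. The class JobResultCache, its fingerprint helper, and the word_count stand-in job are hypothetical names and are not taken from the paper.

```python
import hashlib
import json
from collections import OrderedDict


class JobResultCache:
    """Hypothetical result cache with LRU substitution (assumed policy)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._store = OrderedDict()  # insertion order doubles as recency order
        self.hits = 0
        self.misses = 0

    @staticmethod
    def fingerprint(job_name, params):
        # Canonical JSON so that equal parameter dicts hash identically.
        payload = json.dumps({"job": job_name, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, job_name, params, compute):
        key = self.fingerprint(job_name, params)
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        result = compute(params)  # the job runs only on a cache miss
        self._store[key] = result
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # substitute the least recently used entry
        return result


def word_count(params):
    # Stand-in for an expensive cluster job.
    return len(params["text"].split())


cache = JobResultCache(capacity=2)
first = cache.run("word_count", {"text": "big data on spark"}, word_count)
second = cache.run("word_count", {"text": "big data on spark"}, word_count)
```

The second call is served from the cache, so the job body is not executed again. In a real deployment the key would probably also need to cover the input data version, which this sketch leaves out.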
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Subramanian, S.M., Vijayalakshmi, S., Venkataraman, B., Venkumar, P., Rathikaa Sre, R.M. (2018). CCCa Framework - Classification System in Big Data Environment with Clustering and Cache Concepts. In: Abraham, A., Cherukuri, A., Madureira, A., Muda, A. (eds) Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016). SoCPaR 2016. Advances in Intelligent Systems and Computing, vol 614. Springer, Cham. https://doi.org/10.1007/978-3-319-60618-7_5
DOI: https://doi.org/10.1007/978-3-319-60618-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60617-0
Online ISBN: 978-3-319-60618-7
eBook Packages: Engineering, Engineering (R0)