CCCa Framework - Classification System in Big Data Environment with Clustering and Cache Concepts

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 614)

Abstract

Data volumes are growing at an astonishing pace. The increasing use of digital technology accelerates the growth of data generated by individuals and organizations, producing big data. Big data environments generally use the MapReduce framework, which handles job execution in Hadoop. Spark, which can run on top of Hadoop, is becoming a popular framework that raises execution speed through its runtime environment. This paper proposes a novel CCCa framework that combines classification, clustering and cache techniques. The quality of the input data is improved by a data cleansing step. A similarity-based clustering technique partitions the job data into clusters. The classification phase predicts the behavior of the data: an artificial neural network (ANN) trained with back propagation classifies the big data. A cache substitution technique is recommended to avoid repeated job processing. The proposed framework aims to consume less memory and computational time while achieving a higher level of accuracy in predicting the behavior of the dataset.
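The caching idea in the abstract, avoiding repeated processing of the same job, can be illustrated with a minimal sketch. This is not the paper's cache substitution technique, whose details the abstract does not give. It only assumes that a job can be identified by a fingerprint of its name and input records, and it has no eviction or substitution policy. The names `JobCache`, `fingerprint`, and `mean_of_column` are hypothetical.

```python
import hashlib
import json


class JobCache:
    """Hypothetical result cache keyed by a fingerprint of the job.

    Assumption: two jobs with the same name and the same input
    records produce the same result, so the result can be reused.
    """

    def __init__(self):
        self._store = {}   # fingerprint -> cached result
        self.hits = 0
        self.misses = 0

    @staticmethod
    def fingerprint(job_name, records):
        # Serialize deterministically so equal inputs hash equally.
        payload = json.dumps({"job": job_name, "records": records},
                             sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, job_name, records, compute):
        key = self.fingerprint(job_name, records)
        if key in self._store:
            # Same job seen before: skip the computation.
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(records)
        self._store[key] = result
        return result


def mean_of_column(records):
    # Stand-in for an expensive job; here just an average.
    values = [r["value"] for r in records]
    return sum(values) / len(values)


cache = JobCache()
data = [{"value": 2.0}, {"value": 4.0}]
first = cache.run("mean", data, mean_of_column)   # computed
second = cache.run("mean", data, mean_of_column)  # served from cache
```

In a real deployment the store would need a bounded size and a replacement policy, which is presumably where the paper's substitution technique applies.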

Author information

Corresponding author

Correspondence to Sabitha Malli Subramanian.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Subramanian, S.M., Vijayalakshmi, S., Venkataraman, B., Venkumar, P., Rathikaa Sre, R.M. (2018). CCCa Framework - Classification System in Big Data Environment with Clustering and Cache Concepts. In: Abraham, A., Cherukuri, A., Madureira, A., Muda, A. (eds) Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016). SoCPaR 2016. Advances in Intelligent Systems and Computing, vol 614. Springer, Cham. https://doi.org/10.1007/978-3-319-60618-7_5

  • DOI: https://doi.org/10.1007/978-3-319-60618-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60617-0

  • Online ISBN: 978-3-319-60618-7

  • eBook Packages: Engineering, Engineering (R0)