Fuzzy C-means for english sentiment classification in a distributed system

Applied Intelligence

Abstract

Sentiment classification plays a significant role in everyday life, in politics, in commodity production, and in commerce. Classifying sentiment accurately and quickly remains a challenging task. In this research, we propose a new model for big-data sentiment classification in a parallel network environment. The model applies the Fuzzy C-Means (FCM) method to English sentiment classification and runs on Hadoop Map (M)/Reduce (R) in Cloudera, a parallel network environment. It can classify the sentiments of millions of English documents. Our training data set contains 60,000 English sentences (30,000 positive and 30,000 negative). On a testing data set of 25,000 English reviews (12,500 positive and 12,500 negative), the model achieved 60.2% accuracy.
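The abstract names the method but does not show how it works. As a rough illustration only, the sketch below applies the textbook FCM update rules to a toy two-cluster problem and splits the center update into a map step and a reduce step. It is not the authors' implementation and it does not use Hadoop. The two-dimensional features (hypothetical counts of positive and negative words), the function names, the fuzzifier m = 2, and the mapping of clusters to labels are all assumptions made for this example.

```python
import math
from functools import reduce

def memberships(x, centers, m=2.0):
    """Textbook FCM membership of point x in each cluster (values sum to 1)."""
    dists = [math.dist(x, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

def map_point(x, centers, m=2.0):
    """Map step: for one point, emit (weighted vector, weight) per cluster."""
    u = memberships(x, centers, m)
    return [([(u_i ** m) * v for v in x], u_i ** m) for u_i in u]

def reduce_partials(a, b):
    """Reduce step: add two sets of partial sums cluster by cluster."""
    return [([p + q for p, q in zip(va, vb)], wa + wb)
            for (va, wa), (vb, wb) in zip(a, b)]

def fcm(points, centers, m=2.0, iterations=50, tol=1e-6):
    """Repeat map and reduce until the centers stop moving."""
    for _ in range(iterations):
        partials = [map_point(x, centers, m) for x in points]
        totals = reduce(reduce_partials, partials)
        new_centers = [[v / w for v in vec] for vec, w in totals]
        shift = max(math.dist(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers

def classify(x, centers, labels, m=2.0):
    """Assign the label of the cluster with the highest membership."""
    u = memberships(x, centers, m)
    return labels[u.index(max(u))]

# Hypothetical toy features: (positive-word count, negative-word count).
points = [(3.0, 0.0), (4.0, 1.0), (5.0, 0.0),
          (0.0, 3.0), (1.0, 4.0), (0.0, 5.0)]
initial_centers = [(1.0, 0.0), (0.0, 1.0)]
labels = ["positive", "negative"]  # assumed cluster-to-label mapping

centers = fcm(points, initial_centers)
print(classify((4.0, 0.0), centers, labels))
print(classify((0.0, 4.0), centers, labels))
```

Because each point's contribution to the new centers is an independent partial sum, the map step can in principle run on separate shards of the data while the reduce step only adds the partial sums together. That additive structure is what makes the center update a natural fit for a Map/Reduce-style framework.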

Author information

Corresponding author

Correspondence to Vo Ngoc Phu.

About this article

Cite this article

Phu, V.N., Dat, N.D., Ngoc Tran, V.T. et al. Fuzzy C-means for english sentiment classification in a distributed system. Appl Intell 46, 717–738 (2017). https://doi.org/10.1007/s10489-016-0858-z

  • DOI: https://doi.org/10.1007/s10489-016-0858-z
