Semantic Genetic Programming for Sentiment Analysis

NEO 2015

Part of the book series: Studies in Computational Intelligence (SCI, volume 663)

Abstract

Sentiment analysis is one of the most important tasks in text mining. It has a high impact on governments and private companies, which use it to support major decision-making policies. Even though Genetic Programming (GP) has been widely used to solve real-world problems, it is seldom used to tackle this trendy problem. This contribution begins to close this research gap by proposing a novel GP system, namely Root Genetic Programming, and by extending our previous genetic operators based on projections in the phenotype space. The results show that these systems can tackle this problem and are competitive with other state-of-the-art classifiers; they also give insight into how to approach large-scale problems represented in high-dimensional spaces.

Notes

  1. http://aci.info/2014/07/12/the-data-explosion-in-2014-minute-by-minute-infographic/

  2. Readers interested in how a document collection is processed to obtain a vector representation are referred to the specialized literature [2, 28].

  3. \(x_i\) could be an input vector or a scalar.

  4. The K-nearest neighbor classifier was tested with K varying from 10 to 100; \(K=30\) gave the best result.

References

  1. Arora, S., Mayfield, E., Penstein-Rosé, C., Nyberg, E.: Sentiment classification using automatically extracted subgraph features. In: Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, CAAGET ’10, pp. 131–139. Association for Computational Linguistics, Stroudsburg, PA, USA (2010)

  2. Baeza-Yates, R.A., Ribeiro-Neto, B.A.: Modern Information Retrieval, 2nd edn. Addison-Wesley (2011)

  3. Castelli, M., Silva, S., Vanneschi, L.: A C++ framework for geometric semantic genetic programming. Genet. Program. Evol. Mach. 16(1), 73–81 (2014)

  4. Castelli, M., Trujillo, L., Vanneschi, L., Silva, S., Z-Flores, E., Legrand, P.: Geometric semantic genetic programming with local search. In: Proceedings of the 2015 on Genetic and Evolutionary Computation Conference, GECCO ’15, pp. 999–1006. ACM, New York, NY, USA (2015)

  5. Doucette, J., Lichodzijewski, P., Heywood, M.: Evolving coevolutionary classifiers under large attribute spaces. In: Riolo, R., O’Reilly, U.-M., McConaghy, T. (eds.) Genetic Programming Theory and Practice VII, Genetic and Evolutionary Computation, pp. 37–54. Springer, US (2010). doi:10.1007/978-1-4419-1626-6_3

  6. Escalante, H.J., Garcia-Limon, M.A., Morales-Reyes, A., Graff, M., Montes-y Gomez, M., Morales, E.F., Martinez-Carranza, J.: Term-weighting learning via genetic programming for text classification. Knowl.-Based Syst. (2015)

  7. Espejo, P.G., Ventura, S., Herrera, F.: A survey on the application of genetic programming to classification. IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 40(2), 121–144 (2010)

  8. Giannakopoulos, G., Mavridi, P., Paliouras, G., Papadakis, G., Tserpes, K.: Representation models for text classification: a comparative analysis over three web document types. In: Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics, WIMS ’12, pp. 13:1–13:12. ACM, New York, NY, USA (2012)

  9. Graff, M., Tellez, E.S., Villasenor, E., Miranda-Jiménez, S.: Semantic genetic programming operators based on projections in the phenotype space. Res. Comput. Sci. 94, 73–85 (2015)

  10. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6(2), 65–70 (1979)

  11. Iqbal, M., Browne, W.N., Zhang, M.: Reusing building blocks of extracted knowledge to solve complex, large-scale boolean problems. IEEE Trans. Evol. Comput. 18(4), 465–480 (2014)

  12. Korns, M.F.: Large-scale, time-constrained symbolic regression. In: Riolo, R., Soule, T., Worzel, B. (eds.) Genetic Programming Theory and Practice IV, Genetic and Evolutionary Computation, pp. 299–314. Springer, US (2007). doi:10.1007/978-0-387-49650-4_18

  13. Korns, M.F.: Large-scale, time-constrained symbolic regression-classification. In: Riolo, R., Soule, T., Worzel, B. (eds.) Genetic Programming Theory and Practice V, Genetic and Evolutionary Computation Series, pp. 53–68. Springer, US (2008). doi:10.1007/978-0-387-76308-8_4

  14. Korns, M.F., Nunez, L.: Profiling symbolic regression-classification. In: Genetic Programming Theory and Practice VI, Genetic and Evolutionary Computation, pp. 1–14. Springer, US (2009). doi:10.1007/978-0-387-87623-8_14

  15. Liu, B.: Sentiment Analysis: Mining Opinions, Sentiments, and Emotions, 381 p. Cambridge University Press (2015). ISBN: 1-107-01789-0

  16. Mayfield, E., Penstein-Rosé, C.: Using feature construction to avoid large feature spaces in text classification. In: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, GECCO ’10, pp. 1299–1306. ACM, New York, NY, USA (2010)

  17. McConaghy, T.: Latent variable symbolic regression for high-dimensional inputs. In: Riolo, R., O’Reilly, U.-M., McConaghy, T. (eds.) Genetic Programming Theory and Practice VII, Genetic and Evolutionary Computation, pp. 103–118. Springer, US (2010). doi:10.1007/978-1-4419-1626-6_7

  18. Moraglio, A., Krawiec, K., Johnson, C.G.: Geometric semantic genetic programming. In: Coello Coello, C.A., Cutello, V., Deb, K., Forrest, S., Nicosia, G., Pavone, M. (eds.) Parallel Problem Solving from Nature - PPSN XII, number 7491 in Lecture Notes in Computer Science, pp. 21–31. Springer, Berlin, Heidelberg (2012)

  19. Navarro, G., Raffinot, M.: Flexible Pattern Matching in Strings – Practical on-line search algorithms for texts and biological sequences, 280 p. Cambridge University Press (2002). ISBN 0-521-81307-7

  20. Padró, L., Stanilovsky, E.: FreeLing 3.0: Towards wider multilinguality. In: Proceedings of the Language Resources and Evaluation Conference (LREC 2012). ELRA, Istanbul, Turkey (2012)

  21. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(1–2), 1–135 (2008)

  22. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  23. Peng, T., Zuo, W., He, F.: SVM based adaptive learning method for text classification from positive and unlabeled documents. Knowl. Inf. Syst. 16(3), 281–301 (2008)

  24. Poli, R.: TinyGP. See Genetic and Evolutionary Computation Conference (GECCO-2004) competition (2004). http://cswww.essex.ac.uk/staff/sml/gecco/TinyGP.html

  25. Poli, R., Langdon, W.B., McPhee, N.F.: A Field Guide to Genetic Programming. Lulu Enterprises UK Ltd (2008)

  26. Román, J.V., Morera, J.G., García Cumbreras, M.A., Martínez Cámara, E., Teresa Martín Valdivia, M., Alfonso Ureña López, L.: Overview of TASS 2015. CEUR Workshop Proc. 1397, 13–21 (2015)

  27. Sammut, C., Webb, G.I. (eds.): Statistical natural language processing. Encyclopedia of Machine Learning, p. 916. Springer, US (2010)

  28. Sebastiani, F.: Machine learning in automated text categorization. ACM Comput. Surv. 34(1), 1–47 (2002)

  29. Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mob. Comput. Commun. Rev. 5(1), 3–55 (2001)

  30. Sidorov, G., Miranda-Jiménez, S., Viveros-Jiménez, F., Gelbukh, A., Castro-Sánchez, N., Velásquez, F., Díaz-Rangel, I., Suárez-Guerra, S., Treviño, A., Gordon, J.: Empirical study of machine learning based approach for opinion mining in tweets. In: Proceedings of the 11th Mexican International Conference on Advances in Artificial Intelligence - Volume Part I, MICAI’12, pp. 1–14. Springer, Berlin, Heidelberg (2013)

  31. Silla, C.N. Jr., Pappa, G.L., Freitas, A.A., Kaestner, A.A.: Automatic text summarization with genetic algorithm-based attribute selection. In: Lemaître, C., Reyes, C.A., González, J.A. (eds.) Proceedings 9th Ibero-American Conference on AI Advances in Artificial Intelligence - IBERAMIA 2004. Lecture Notes in Computer Science, vol. 3315, pp. 305–314. Springer, Puebla, Mexico, 22–26 November 2004

  32. Silva, S.: GPLAB: a genetic programming toolbox for MATLAB. http://gplab.sourceforge.net

  33. Uy, N.Q., Anh, P.T., Doan, T.C., Hoai, N.X.: A study on the use of genetic programming for automatic text summarization. In: Dang-Van, H., Sanders, J. (eds.) The Fourth International Conference on Knowledge and Systems Engineering, KSE 2012, pp. 93–98, Danang, Vietnam, 17–19 August 2012

  34. Vanneschi, L., Castelli, M., Manzoni, L., Silva, S.: A new implementation of geometric semantic GP and its application to problems in pharmacokinetics. In: Krawiec, K., Moraglio, A., Hu, T., Ima Etaner-Uyar, A., Hu, B. (eds.) Genetic Programming, number 7831 in Lecture Notes in Computer Science, pp. 205–216. Springer, Berlin, Heidelberg (2013)

  35. Vanneschi, L., Castelli, M., Silva, S.: A survey of semantic methods in genetic programming. Genet. Program. Evol. Mach. 15(2), 195–214 (2014)

  36. White, D.R.: Software review: the ECJ toolkit. Genet. Program. Evol. Mach. 13(1), 65–67 (2012)

  37. Wilcoxon, F.: Individual comparisons by ranking methods. Biom. Bull. 1(6), 80 (1945)

  38. Zhang, Y., Bhattacharyya, S.: Genetic programming in classifying large-scale data: an ensemble method. Inf. Sci. 163(1–3), 85–101 (2004)

  39. Zhou, Z.-H.: Ensemble Methods: Foundations and Algorithms. CRC Press (2012)


Author information

Corresponding author

Correspondence to Mario Graff.

Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Graff, M., Tellez, E.S., Jair Escalante, H., Miranda-Jiménez, S. (2017). Semantic Genetic Programming for Sentiment Analysis. In: Schütze, O., Trujillo, L., Legrand, P., Maldonado, Y. (eds) NEO 2015. Studies in Computational Intelligence, vol 663. Springer, Cham. https://doi.org/10.1007/978-3-319-44003-3_2

  • DOI: https://doi.org/10.1007/978-3-319-44003-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44002-6

  • Online ISBN: 978-3-319-44003-3

  • eBook Packages: Engineering, Engineering (R0)
