A Study on Different Text Representation Methods for the Negative Selection Algorithm

  • Conference paper
Distributed Computing and Artificial Intelligence, 19th International Conference (DCAI 2022)

Abstract

Unstructured data, such as text, usually must be structured before standard machine learning classifiers can be applied. Several representation schemes can be used for this purpose, such as Bag of Words, Linguistic Inquiry and Word Count (LIWC), and Part-of-Speech (POS) Tagging. The Negative Selection Algorithm (NSA) is an immune-inspired algorithm designed to solve binary classification problems, more specifically anomaly detection. This paper investigates how various text representation schemes perform as input to the NSA. Three different datasets and text representation methods are used, and the results are reported in terms of Accuracy and False Positive Rate.
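
To make the pipeline described in the abstract concrete, the following is a minimal sketch of a Bag-of-Words representation feeding a real-valued negative selection classifier, together with the two reported metrics. It is not the authors' implementation: the abstract does not state the distance measure, the detector generation scheme, or any parameter values, so the cosine distance, the uniform random candidates, and the names `bag_of_words`, `train_detectors`, `is_anomaly`, `accuracy_and_fpr`, `radius`, and `max_tries` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def bag_of_words(texts, vocabulary=None):
    """Return term-count vectors over a shared, sorted vocabulary."""
    tokenized = [text.lower().split() for text in texts]
    if vocabulary is None:
        vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([float(counts[word]) for word in vocabulary])
    return vectors, vocabulary


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def train_detectors(self_vectors, n_detectors, radius, rng, max_tries=10000):
    """Censoring phase: keep random candidates that match no self sample.

    Assumption: candidates are drawn uniformly from [0, 1)^dim and a
    candidate "matches" a self sample when their distance is <= radius.
    """
    dim = len(self_vectors[0])
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = [rng.random() for _ in range(dim)]
        if all(cosine_distance(candidate, s) > radius for s in self_vectors):
            detectors.append(candidate)
    return detectors


def is_anomaly(vector, detectors, radius):
    """Monitoring phase: a sample matched by any detector is flagged non-self."""
    return any(cosine_distance(vector, d) <= radius for d in detectors)


def accuracy_and_fpr(y_true, y_pred):
    """Labels: 1 = anomaly (non-self), 0 = normal (self)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr
```

In this sketch the detectors are trained only on vectors from the normal (self) class, and unseen texts are encoded with the vocabulary returned from training by passing it as `vocabulary` to `bag_of_words`. Swapping in another representation, such as LIWC category counts or POS tag frequencies, only changes the function that produces the vectors.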

Supported by FAPESP, CNPq and MackPesquisa.

Author information

Corresponding author

Correspondence to Matheus A. Ferraria.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ferraria, M.A., Ferraria, V.A., de Castro, L.N. (2023). A Study on Different Text Representation Methods for the Negative Selection Algorithm. In: Omatu, S., Mehmood, R., Sitek, P., Cicerone, S., Rodríguez, S. (eds) Distributed Computing and Artificial Intelligence, 19th International Conference. DCAI 2022. Lecture Notes in Networks and Systems, vol 583. Springer, Cham. https://doi.org/10.1007/978-3-031-20859-1_30
