
Discriminatory Expressions to Improve Model Comprehensibility in Short Documents

  • Conference paper
Pattern Recognition and Artificial Intelligence (ICPRAI 2022)

Abstract

Microblogging sites are widely used as analysis avenues due to their peculiarities (promptness, short texts...). Lately, researchers have focused mainly on classification performance rather than interpretability. When the problem requires transparency, it is necessary to build interpretable pipelines; yet even then, the resulting models are often too complex to be considered comprehensible, making it impossible for humans to understand the actual decisions. This paper presents a feature selection mechanism that improves comprehensibility by using fewer but more meaningful features. Results show that our proposal is the best-performing and most stable one in terms of accuracy, generalisation and comprehensibility in the microblogging context.
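The core idea, keeping only the few terms that best separate the classes so the resulting model stays small enough for a human to inspect, can be sketched as follows. This is a hypothetical illustration, not the paper's actual Discriminatory Expressions algorithm: it ranks terms by the absolute difference of their per-class document frequencies (a stand-in discrimination score) and keeps the top `k`. The toy documents, labels, and the cutoff `k = 4` are invented for the example.

```python
# Hypothetical sketch of discriminative feature selection (not the
# authors' exact method): score each term by how unevenly it occurs
# across the two classes, then keep only the top-k terms.
from collections import Counter

# Toy labelled microblog-style documents: (text, label), 1 = positive.
docs = [
    ("great phone love the battery", 1),
    ("battery died fast terrible phone", 0),
    ("love this camera great shots", 1),
    ("terrible camera awful screen", 0),
]

pos, neg = Counter(), Counter()
n_pos = n_neg = 0
for text, label in docs:
    words = set(text.split())  # document frequency: count each term once per doc
    if label == 1:
        pos.update(words)
        n_pos += 1
    else:
        neg.update(words)
        n_neg += 1

vocab = set(pos) | set(neg)
# Score = |df(t | positive) - df(t | negative)|; terms occurring mostly
# in one class are the most discriminative, class-neutral terms score 0.
score = {t: abs(pos[t] / n_pos - neg[t] / n_neg) for t in vocab}

# Keep the k highest-scoring terms (ties broken alphabetically).
k = 4
selected = sorted(vocab, key=lambda t: (-score[t], t))[:k]
print(selected)  # → ['great', 'love', 'terrible', 'awful']
```

A classifier restricted to a handful of such features yields decision rules a human can read directly, which is the comprehensibility trade-off the paper targets.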

This work was financially supported by the Spanish Ministry of Economy and Competitiveness (MINECO), project FFI2016-79748-R, and cofinanced by the European Social Fund (ESF). Manuel Francisco Aparicio was supported by the FPI 2017 predoctoral programme, from the Spanish Ministry of Economy and Competitiveness (MINECO), grant reference BES-2017-081202.



Notes

  1. https://github.com/nutcrackerugr/discriminatory-expressions



Author information


Corresponding authors

Correspondence to Manuel Francisco or Juan Luis Castro.



Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Francisco, M., Castro, J.L. (2022). Discriminatory Expressions to Improve Model Comprehensibility in Short Documents. In: El Yacoubi, M., Granger, E., Yuen, P.C., Pal, U., Vincent, N. (eds) Pattern Recognition and Artificial Intelligence. ICPRAI 2022. Lecture Notes in Computer Science, vol 13363. Springer, Cham. https://doi.org/10.1007/978-3-031-09037-0_26


  • DOI: https://doi.org/10.1007/978-3-031-09037-0_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-09036-3

  • Online ISBN: 978-3-031-09037-0

  • eBook Packages: Computer Science, Computer Science (R0)
