Abstract
Microblogging sites are widely used as analysis avenues due to their peculiarities (promptness, short texts, etc.). Lately, researchers have focused mainly on classification performance rather than interpretability. When the problem requires transparency, it is necessary to build interpretable pipelines; even then, the resulting models are often too complex to be considered comprehensible, making it impossible for humans to understand the actual decisions. This paper presents a feature selection mechanism that is able to improve comprehensibility by using fewer but more meaningful features. Results show that our proposal performs best and is the most stable in terms of accuracy, generalisation and comprehensibility in the microblogging context.
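As generic context for the feature-selection setting the abstract describes (not the authors' discriminatory-expression method, whose details are in the paper itself), a minimal filter-style term selector for short labelled documents can be sketched in plain Python. It scores each term by a Laplace-smoothed log-odds of its document frequency across two classes and keeps the top-k most discriminative terms; all names and the scoring choice are illustrative assumptions:

```python
from collections import Counter
import math

def term_scores(docs, labels):
    """docs: list of token lists; labels: binary labels (0/1).
    Returns a discrimination score per term (higher = more class-specific)."""
    pos, neg = Counter(), Counter()
    for toks, y in zip(docs, labels):
        # Count document frequency, not raw term frequency
        (pos if y == 1 else neg).update(set(toks))
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    scores = {}
    for t in set(pos) | set(neg):
        p = (pos[t] + 1) / (n_pos + 2)  # Laplace-smoothed doc frequency in class 1
        q = (neg[t] + 1) / (n_neg + 2)  # ... and in class 0
        scores[t] = abs(math.log(p / q))  # discriminative in either direction
    return scores

def select_features(docs, labels, k):
    """Keep only the k most discriminative terms."""
    s = term_scores(docs, labels)
    return [t for t, _ in sorted(s.items(), key=lambda kv: -kv[1])[:k]]

docs = [["great", "movie"], ["great", "fun"], ["bad", "movie"], ["bad", "boring"]]
labels = [1, 1, 0, 0]
selected = select_features(docs, labels, 2)
```

In this toy corpus, class-specific terms such as "great" and "bad" outrank terms like "movie" that occur in both classes, which mirrors the abstract's goal of retaining fewer but more meaningful features.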
This work was financially supported by the Spanish Ministry of Economy and Competitiveness (MINECO), project FFI2016-79748-R, and co-financed by the European Social Fund (ESF). Manuel Francisco Aparicio was supported by the FPI 2017 predoctoral programme of the Spanish Ministry of Economy and Competitiveness (MINECO), grant reference BES-2017-081202.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Francisco, M., Castro, J.L. (2022). Discriminatory Expressions to Improve Model Comprehensibility in Short Documents. In: El Yacoubi, M., Granger, E., Yuen, P.C., Pal, U., Vincent, N. (eds) Pattern Recognition and Artificial Intelligence. ICPRAI 2022. Lecture Notes in Computer Science, vol 13363. Springer, Cham. https://doi.org/10.1007/978-3-031-09037-0_26
DOI: https://doi.org/10.1007/978-3-031-09037-0_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09036-3
Online ISBN: 978-3-031-09037-0
eBook Packages: Computer Science (R0)