Abstract
Language can be used to marginalize certain groups, since it may reflect a negative mentality rooted in mental barriers or historical inertia. To prevent such misuse of language, several institutions have carried out campaigns against discriminatory language, criticizing the use of certain terms and phrases. However, there is an important gap in detecting discriminatory text in documents, because language is highly flexible and often contains hidden features or relations. Furthermore, adapting the approaches and methodologies proposed in the literature for text analysis is complex, because these proposals are too rigid to be applied to purposes other than those for which they were originally intended. In this paper, a methodology for developing flexible text analyzers based on ontologies is presented. The main novelty of the methodology is the use of ontologies to implement the rules applied by the text analyzer, which provides great flexibility in the development of text analyzers and exploits the inference capabilities of ontologies. A set of rules for detecting discriminatory language related to gender and to people with disabilities is also presented, showing how the functionality of the text analyzer can be extended to different areas of discriminatory language.




Notes
A Weka-ready version of the data set is available at https://sourceforge.net/p/disclangeditor/.
Concepts and properties of the ontology.
Acknowledgements
This contribution has been supported by the Andalusian Institute of Women, Junta de Andalucía, Spain (Grant No. UNIVER09/2009/23/00).
Appendices
Appendix 1: Relevant class descriptions for discriminatory language detection
Extra-visibility \(\equiv \) DisabledPeople \(\sqcup \) RacePeople \(\sqcup \) ReligionPeople \(\sqcup \) Sex \(\sqcap \) \(\exists \) hasNext Noun
InappropriateTitles \(\equiv \) Dr \(\sqcup \) Mr \(\sqcap \)
\(\exists \) hasNext (\(\exists \) hasNext (Feminine \(\sqcap \) ProperNoun)) \(\sqcup \)
\(\exists \) hasNext (\(\exists \) hasNext (\(\exists \) hasNext (Feminine \(\sqcap \) ProperNoun))) \(\sqcup \)
\(\exists \) hasPrevious (\(\exists \) hasPrevious (Feminine \(\sqcap \) ProperNoun)) \(\sqcup \)
\(\exists \) hasPrevious (\(\exists \) hasPrevious (\(\exists \) hasPrevious (Feminine \(\sqcap \) ProperNoun))) \(\sqcup \)
Mrs \(\sqcup \) Ms \(\sqcap \)
\(\exists \) hasNext (\(\exists \) hasNext (Masculine \(\sqcap \) ProperNoun)) \(\sqcup \)
\(\exists \) hasNext (\(\exists \) hasNext (\(\exists \) hasNext (Masculine \(\sqcap \) ProperNoun))) \(\sqcup \)
\(\exists \) hasPrevious (\(\exists \) hasPrevious (Masculine \(\sqcap \) ProperNoun)) \(\sqcup \)
\(\exists \) hasPrevious (\(\exists \) hasPrevious (\(\exists \) hasPrevious (Masculine \(\sqcap \) ProperNoun)))
ManAsVerb \(\equiv \) Man \(\sqcup \) Manning \(\sqcap \) \(\exists \) hasNext The
ManPrecededByForInOf \(\equiv \) Man \(\sqcap \) \(\exists \) hasNext (For \(\sqcup \) In \(\sqcup \) Of)
ManPrecededByForInOf \(\sqsubseteq \) ManAlternative
MenWomenOrder \(\equiv \) He \(\sqcap \) \(\exists \) hasNext (\(\exists \) hasNext She) \(\sqcup \) Him \(\sqcap \) \(\exists \) hasNext (\(\exists \) hasNext Her) \(\sqcup \) His \(\sqcap \) \(\exists \) hasNext (\(\exists \) hasNext Hers) \(\sqcup \) Men \(\sqcap \) \(\exists \) hasNext (\(\exists \) hasNext Women) \(\sqcup \) Sir \(\sqcap \) \(\exists \) hasNext (\(\exists \) hasNext Madam)
NeutralMasculinePronoun \(\equiv \) Masculine \(\sqcap \) Pronoun \(\sqcap \) \(\exists \) isPrecededBy ProperNoun
NeutralMasculinePronoun \(\equiv \) Masculine \(\sqcap \) Pronoun \(\sqcap \) \(\exists \) hasNext (\(\exists \) hasNext (Feminine \(\sqcap \) Pronoun))
SexistDescription \(\equiv \) Adjective \(\sqcap \) \(\exists \) hasNext Women \(\sqcap \) \(\exists \) hasPrevious (And \(\sqcap \) \(\exists \) hasPrevious (Men \(\sqcap \) \(\exists \) hasPrevious Adjective))
Stereotyping \(\equiv \) Sufferer \(\sqcup \) Victim \(\sqcap \) \(\exists \) isFollowedBy Illness \(\sqcup \) \(\exists \) isPrecededBy Illness
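The class descriptions above are ordinary OWL equivalent-class axioms, so they can be loaded into any description-logic reasoner. As a minimal sketch, assuming the owlready2 Python library and illustrative ontology and class names (this is not the authors' released code), the ManAsVerb rule could be encoded and checked as follows, reading its disjunction as grouped over Man and Manning:

```python
# Minimal sketch: encoding the ManAsVerb class description with owlready2.
# The ontology IRI and all names are illustrative assumptions.
from owlready2 import Thing, ObjectProperty, get_ontology, sync_reasoner_pellet

onto = get_ontology("http://example.org/discrimination.owl")

with onto:
    class Word(Thing):                     # one individual per token
        pass

    class hasNext(ObjectProperty):         # links a token to the following one
        domain = [Word]
        range = [Word]

    class Man(Word): pass                  # lexical classes, populated by the
    class Manning(Word): pass              # tokenizer/POS-tagging front end
    class The(Word): pass

    # ManAsVerb ≡ (Man ⊔ Manning) ⊓ ∃ hasNext The
    class ManAsVerb(Word):
        equivalent_to = [(Man | Manning) & hasNext.some(The)]

# Assert the token sequence "man the desk" ...
w1, w2 = Man("w1"), The("w2")
w1.hasNext = [w2]

# ... and let the reasoner classify w1 under ManAsVerb.
sync_reasoner_pellet()                     # Pellet ships with owlready2 (needs Java)
print(list(ManAsVerb.instances()))         # expected: [discrimination.w1]
```

Because each rule is a class axiom rather than program logic, adapting the analyzer to a different style guide amounts to editing the ontology, which is precisely the flexibility claimed in the abstract.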
Appendix 2: Rules for detecting discriminatory language
Rule | Description | Examples | Detected |
---|---|---|---|
2.1. Extra-visibility | It is quite unnecessary to mention a person’s sex, race, ethnic background, religion or disability | Male nurse; female engineer; Muslim student; Black police officer | \(\checkmark \) |
3.1.1. Invisibility | Women are often invisible in language due to the use of the masculine pronouns ‘he’, ‘him’ and ‘his’ to refer to both men and women, and the use of ‘man’ as a noun, verb or adjective | Mankind; man made | \(\checkmark \) |
3.1.2. Inferiority | Unnecessary mention of gender suggests that in certain roles women are inferior to men. The use of ‘feminine’ suffixes such as ‘-ette’, ‘-ess’, ‘-ienne’ and ‘-trix’ is unnecessary | Female engineer; woman academic; actress | \(\checkmark \) |
3.2.1. Use alternatives for ‘man’ | | Mankind; the best man for the job; the man in the street; man of letters; men of science; manpower; manmade | \(\checkmark \) |
3.2.2. Avoid the use of ‘man’ as a verb | | We need someone to man the desk; manning the office; She will man the phones | \(\checkmark \) |
3.2.4. Find alternatives to ‘he’ and ‘his’ | The pronouns ‘he’, ‘his’ and ‘him’ are frequently used as generic pronouns. As this use is both ambiguous and excludes women, try to find alternatives | The student may exercise his right to appeal | \(\checkmark \) |
3.2.7. Use alternatives for sex-specific occupation terms | Avoid the impression that these positions are male-exclusive. Avoid using occupational titles containing the ‘feminine’ suffixes -ess, -ette, -trix and -ienne | Chairman; headmaster; headmistress; policeman; businessman; layman; groundsman; actress; executrix; authoress; comedienne | \(\checkmark \) |
3.2.8. Use appropriate titles and other modes of address | The inappropriate use of names, titles, salutations and endearments creates the impression that women merit less respect or less serious consideration than men do. Ensure that people’s qualifications are accurately reflected in their titles, and that women’s and men’s academic titles are used in a parallel fashion | Albert Einstein and Mrs Mead; Ms Clark and John Howard; Judy Smith and Dr Nguyen | Partially |
3.2.9. Use of Ms, Mrs, Miss, Mr | The use of ‘Ms’ is recommended for all women where the parallel ‘Mr’ is applicable, and ‘Ms’ should be used when a woman’s title of preference is unknown | | \(\checkmark \) |
3.2.10. Avoid patronising expressions | Use the words ‘man’/‘woman’, ‘girl’/‘boy’ and ‘gentleman’/‘lady’ in a parallel manner | The girls in the office; Ladies; My girl will take care of that immediately | \(\checkmark \) |
3.2.12. Avoid sexist descriptions | Avoid the use of stereotyped generalisations about men’s and women’s characters and patterns of behaviour | Strong men and domineering women; assertive men and aggressive women; angry men and hysterical women | \(\checkmark \) |
4.1.1. Derogatory labelling | Derogatory labels are still used and should be avoided. Acceptable alternatives include ‘person with Down’s Syndrome’ and ‘person with an intellectual disability’ | Cripple; mongoloid; deaf and dumb; retarded | \(\checkmark \) |
4.1.2. Depersonalising or impersonal reference | People with a disability are often referred to collectively as ‘the disabled’, ‘the handicapped’, ‘the mentally retarded’, ‘the blind’, ‘the deaf’, or as ‘paraplegics’, ‘spastics’, ‘epileptics’, etc. | The disabled; the handicapped; disabled people; the physically handicapped; a paraplegic; paraplegics; an epileptic; the deaf | \(\checkmark \) |
4.1.3. Stereotyping | Never use the terms ‘victim’ or ‘sufferer’ to refer to a person who has or has had an illness, disease or disability | Victim of AIDS; AIDS sufferer; polio victim | \(\checkmark \) |
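The rules in the table operate over token chains, so each sentence must first be converted into word individuals typed with lexical classes such as Noun, ProperNoun or Pronoun and linked by hasNext. The following is a hypothetical front-end sketch using NLTK’s default tokenizer and Penn Treebank tagger; the tag-to-class mapping and the function name are illustrative assumptions, not the paper’s implementation:

```python
# Hypothetical front end: turn a sentence into (token, lexical class) pairs
# plus hasNext links, ready to be asserted as ontology individuals.
import nltk  # requires the 'punkt' and 'averaged_perceptron_tagger' data

# Assumed mapping from Penn Treebank tags to the lexical classes used by
# the Appendix 1 class descriptions (extend as needed).
PTB_TO_CLASS = {
    "NN": "Noun", "NNS": "Noun",
    "NNP": "ProperNoun", "NNPS": "ProperNoun",
    "PRP": "Pronoun", "PRP$": "Pronoun",
    "JJ": "Adjective",
}

def sentence_to_assertions(sentence):
    """Return typed tokens and the hasNext links between them."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    typed = [(tok, PTB_TO_CLASS.get(tag, "Word")) for tok, tag in tagged]
    links = [(i, i + 1) for i in range(len(typed) - 1)]  # hasNext chain
    return typed, links

typed, links = sentence_to_assertions("The student may exercise his right to appeal")
# typed[4] -> ('his', 'Pronoun'): a candidate for rules such as
# NeutralMasculinePronoun once asserted in the ontology.
```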
About this article
Cite this article
Salguero, A., Espinilla, M. A flexible text analyzer based on ontologies: an application for detecting discriminatory language. Lang Resources & Evaluation 52, 185–215 (2018). https://doi.org/10.1007/s10579-017-9387-6