Quantifying the genericness of trademarks using natural language processing: an introduction with suggested metrics

  • Original Research
  • Published in Artificial Intelligence and Law

Abstract

If a trademark (“mark”) becomes a generic term, it may be cancelled under trademark law, a process known as genericide. Typically, in genericide cases, consumer surveys are brought into evidence to establish a mark’s semantic status as generic or distinctive. Drawbacks of surveys include cost, delay, small sample size, lack of reproducibility, and observer bias. Today, however, much discourse involving marks is online. As a potential complement to consumer surveys, therefore, we explore an artificial intelligence approach based chiefly on word embeddings: mathematical models of meaning, grounded in distributional semantics, that can be trained on texts selected for jurisdictional and temporal relevance. After identifying two main factors in mark genericness, we first offer a simple screening metric based on the ngram frequency of uncapitalized variants of a mark. We then add two word embedding metrics: one addressing contextual similarity of uncapitalized variants, and one comparing the neighborhood density of marks and known generic terms in a category. For clarity and validation, we illustrate our metrics with examples of genericized, somewhat generic, and distinctive marks such as, respectively, DUMPSTER, DOBRO, and ROLEX.
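As a rough illustration of the frequency-based screening idea (a sketch only, not the authors' exact formula), the share of uncapitalized occurrences of a mark among all occurrences of its orthographic variants can serve as a first-pass genericness signal. The counts below are invented for illustration, not real corpus data.

```python
def lowercase_share(counts):
    """Fraction of occurrences of a mark written without an initial
    capital (e.g. 'rolex' vs 'Rolex').

    counts: dict mapping orthographic form -> ngram frequency.
    A share near 1.0 suggests predominantly generic usage;
    a share near 0.0 suggests the mark retains distinctiveness.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    lower = sum(n for form, n in counts.items() if form[:1].islower())
    return lower / total

# Invented illustrative counts (not real corpus frequencies):
print(lowercase_share({"dumpster": 9_400, "Dumpster": 600}))   # mostly generic usage
print(lowercase_share({"rolex": 120, "Rolex": 11_880}))        # mostly distinctive usage
```

In practice the counts would come from a corpus chosen for jurisdictional and temporal relevance, such as an ngram dataset restricted to the relevant years and region.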

Figs. 1–7 appear in the full article.

Data availability

All data used in the paper are publicly available.

Code availability

Code for reproducing the results can be obtained by contacting the corresponding author.

Notes

  1. Throughout the paper, the usual convention of capitalization is used to refer to a mark generally (e.g., ASPIRIN). However, when a mark is used as an object of analysis it is shown in bold type to emphasize the orthographic form referred to (e.g., Aspirin or aspirin). Terms used in analysis that are not trademarks are rendered in italics (e.g., dog).

  2. We believe, however, that the methods outlined here are relevant to other jurisdictions around the world and, indeed, to many other aspects of monitoring trademark landscapes.

  3. Orthography more generally refers to all the conventions for writing a language, including, for example, hyphenation, punctuation, and spelling. These are all relevant to trademark genericness, but our focus in this paper is on the presence or absence of an initial capital letter, as in Rolex versus rolex. We refer to the lower-case form as an “orthographic variant” or the “regularized form”. These general terms leave the door open to analysis of other cases, such as loss of an internal capital letter, as in iPhone and iphone.

  4. It is important to note that we use regularize and regularization in this orthographic sense and not in the sense used in natural language processing or other statistical methods.

  5. DUMPSTER was cancelled in 2015.

  6. In fact, the PEAVEY score may be due to a rare homonym, an alternative spelling of “peavy” (a timber-handling tool), which may account for the wide separation of Peavey and peavey in Fig. 5. In a real-world analysis this would have to be disambiguated in the corpora used.

  7. In the word embedding model used in this paper, for example, “Friday” and “friday” have a relatively low similarity at 0.43.

  8. The field of explainable artificial intelligence (XAI) will no doubt be relevant to many legal proceedings in coming years. A useful starting point for those interested is Ribeiro et al. (2016).
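The contextual-similarity comparisons in the notes above (for example, the 0.43 similarity between “Friday” and “friday” in note 7) rest on the cosine similarity between embedding vectors. A minimal sketch, using toy four-dimensional vectors that merely stand in for embeddings trained on a jurisdictionally relevant corpus:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors; values
    near 1.0 mean the two forms occur in very similar contexts,
    values near 0.0 mean their contexts diverge."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors for illustration only; real vectors would come from a
# model such as word2vec trained on relevant, time-sliced text.
vec_cap = np.array([0.9, 0.1, 0.0, 0.2])   # capitalized form, e.g. "Rolex"
vec_low = np.array([0.1, 0.8, 0.5, 0.0])   # regularized form, e.g. "rolex"

sim = cosine_similarity(vec_cap, vec_low)
print(round(sim, 2))
```

A low similarity between a mark and its regularized form suggests the two spellings are used in different contexts (distinctive versus generic usage), while a high similarity suggests the capitalization distinction carries little semantic weight.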

References

  • Abood A, Feltenberger D (2018) Automated patent landscaping. Artif Intell Law 26:103–125. https://doi.org/10.1007/s10506-018-9222-4

  • Bayer Co. v. United Drug Co., 272 F. 505 (S.D.N.Y. 1921)

  • Chalkidis I, Kampas D (2019) Deep learning in law: early adaptation and legal word embeddings trained on large corpora. Artif Intell Law 27:171–198. https://doi.org/10.1007/s10506-018-9238-9

  • Devlin J, Chang MW, Lee K, Toutanova K (2018). BERT: pre-training of deep bidirectional transformers for language understanding. http://arxiv.org/abs/1810.04805

  • Elliott v. Google, Inc., 860 F.3d 1151 (9th Cir. 2017)

  • Fechter GH, Slavin E (2011) Practical tips on avoiding genericide. International Trademark Association (INTA) Bulletin 66(20)

  • Firth JR (1957) A synopsis of linguistic theory 1930–1955. In Studies in Linguistic Analysis, pp 1–32. Oxford: Philological Society. Reprinted in F.R. Palmer (ed) Selected Papers of J.R. Firth 1952–1959, Longman, London

  • Fu R, Guo J, Qin B, Che W, Wang H, Liu T (2014) Learning semantic hierarchies via word embeddings. In: Proceedings of the 52nd annual meeting of the association for computational linguistics, Baltimore, Maryland

  • Geffet M, Dagan I (2005) The distributional inclusion hypotheses and lexical entailment. In: Proceedings of the 43rd annual meeting of the association for computational linguistics (ACL’05)

  • He R, McAuley J (2016) Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In: Proceedings of the 25th international conference on world wide web (WWW ’16)

  • Landes W, Posner R (1987) Trademark law: an economic perspective. J Law Econ 30(2):265–309

  • Linford J (2015) A linguistic justification for protecting generic trademarks. Yale JL Tech 17:110–145

  • List of generic and genericized trademarks (2020) Wikipedia. https://en.wikipedia.org/wiki/List_of_generic_and_genericized_trademarks. Accessed 12 June 2020

  • Michel JB, Shen YK, Aiden AP, Veres A, Gray MK, Pickett JP, Aiden EL (2011) Quantitative analysis of culture using millions of digitized books. Science 331(6014):176. https://doi.org/10.1126/science.1199644

  • Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. http://arxiv.org/abs/1301.3781

  • Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems (NIPS 2013)

  • Pannitto L, Salicchi L, Lenci A (2018) Refining the distributional inclusion hypothesis for unsupervised hypernym identification. Ital J Comput Linguist 4(2):45–56

  • Pechenick EA, Danforth CM, Dodds PS (2015) Characterizing the google books corpus: strong limits to inferences of socio-cultural and linguistic evolution. PLoS ONE 10(10):e0137041. https://doi.org/10.1371/journal.pone.0137041

  • Řehůřek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks

  • Ribeiro MT, Singh S, Guestrin C (2016) Why should I trust you? Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1135–1144

  • Shwartz V, Goldberg Y, Dagan I (2016) Improving hypernymy detection with an integrated path-based and distributional method. In: Proceedings of the 54th annual meeting of the association for computational linguistics (Volume 1: Long Papers), Berlin, Germany

  • Stuhlbarg Int’l Sales Co. v. John D. Brush & Co., 240 F.3d 832 (9th Cir. 2001)

  • Walsh MG (2013) Protecting your brand against the heartbreak of genericide. Bus Horiz 56(2):159–166

  • Weeds J, Clarke D, Reffin J, Weir D, Keller B (2014) Learning to Distinguish Hypernyms and Co-Hyponyms. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers, Dublin, Ireland

  • Weeds J, Weir D, McCarthy D (2004) Characterising measures of lexical distributional similarity. In: Proceedings of the 20th international conference on computational linguistics (COLING 2004)

  • Younes N, Reips UD (2019) Guideline for improving the reliability of google ngram studies: evidence from religious terms. PLoS ONE 14(3):e0213554. https://doi.org/10.1371/journal.pone.0213554

Funding

There are no funding sources to declare.

Author information

Authors and Affiliations

Authors

Contributions

The research is the sole original work of the two listed authors, Dr CS and Dr LDV.

Corresponding author

Correspondence to Cameron Shackell.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shackell, C., De Vine, L. Quantifying the genericness of trademarks using natural language processing: an introduction with suggested metrics. Artif Intell Law 30, 199–220 (2022). https://doi.org/10.1007/s10506-021-09291-7
