research-article

Effective Identification of Distinctive Wordmarks

Authors:

Sujatha Das Gollapalli,

Kim Jung-Jae

WWW '20: Companion Proceedings of the Web Conference 2020

Pages 471 - 477

https://doi.org/10.1145/3366424.3386200

Published: 20 April 2020

Abstract

A wordmark, or logotype string, is stylized text that companies and businesses use to identify and brand their products. Examples of wordmarks include “Pie face” under the product category “baked goods and pastries” and “airbus smarter fleet” under the product category “data processing and aircraft maintenance software”. Before a wordmark string is accepted as “protected intellectual property” for a specific business, trademark examiners manually check it for distinctiveness and uniqueness. We address the problem of automatically identifying acceptable English wordmarks based on their textual content. Unlike most text mining tasks, our problem involves classifying a small set of words (often fewer than five) in the context of a specific product description. We handle this sparsity challenge by designing a range of features that characterize wordmarks using the syntactic and linguistic properties of words, and by incorporating co-occurrence and similarity information from external resources such as WordNet, Wikipedia, and word embeddings. We investigate machine learning models for this novel task and study their classification effectiveness on a large dataset of about 71K wordmarks.
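The abstract names three kinds of signals for a wordmark judged against its product description: surface and linguistic properties of the words, lexical resources such as WordNet, and co-occurrence or similarity information from Wikipedia and word embeddings. The following is a minimal Python sketch of how features of this kind could be computed for one (wordmark, product description) pair. It assumes hypothetical toy resources: `LEXICON` stands in for a WordNet lookup, `EMBEDDINGS` for pretrained word vectors, and hand-supplied counts for Wikipedia co-occurrence statistics. The feature names, the whitespace tokenizer, and the positive PMI formulation are illustrative choices, not the authors' actual feature set.

```python
import math
from typing import Dict, List, Optional

# HYPOTHETICAL toy resources standing in for those named in the abstract:
# LEXICON replaces a WordNet lookup, EMBEDDINGS replaces pretrained word
# vectors, and the counts passed to ppmi() replace co-occurrence statistics
# that would be mined from Wikipedia.
LEXICON = {
    "pie", "face", "smarter", "fleet", "baked", "goods", "pastries",
    "data", "processing", "aircraft", "maintenance", "software",
}
EMBEDDINGS: Dict[str, List[float]] = {
    "pie": [0.9, 0.1, 0.0],
    "face": [0.2, 0.8, 0.1],
    "baked": [0.8, 0.2, 0.1],
    "goods": [0.6, 0.3, 0.2],
    "pastries": [0.9, 0.0, 0.1],
}


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization (deliberately simplistic)."""
    return text.lower().split()


def mean_vector(tokens: List[str]) -> Optional[List[float]]:
    """Average the embeddings of tokens that have one; None if none do."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ppmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Positive PMI (log base 2) computed from raw counts.

    PMI(x, y) = log2(count_xy * total / (count_x * count_y));
    zero counts and negative values are clipped to 0.
    """
    if count_xy == 0 or count_x == 0 or count_y == 0:
        return 0.0
    return max(0.0, math.log2(count_xy * total / (count_x * count_y)))


def wordmark_features(wordmark: str, product: str) -> Dict[str, float]:
    """Illustrative features for a (wordmark, product description) pair."""
    wm_tokens = tokenize(wordmark)
    prod_tokens = tokenize(product)
    if not wm_tokens:
        raise ValueError("empty wordmark")
    wm_vec = mean_vector(wm_tokens)
    prod_vec = mean_vector(prod_tokens)
    num_chars = sum(len(t) for t in wm_tokens)
    return {
        # Surface properties of the short wordmark string.
        "num_tokens": float(len(wm_tokens)),
        "num_chars": float(num_chars),
        "mean_token_len": num_chars / len(wm_tokens),
        # Share of tokens found in the lexicon (coined words score low).
        "frac_in_lexicon": sum(t in LEXICON for t in wm_tokens) / len(wm_tokens),
        # Share of distinct wordmark tokens repeated in the product
        # description: a crude, assumed proxy for descriptiveness.
        "overlap_with_product": len(set(wm_tokens) & set(prod_tokens)) / len(set(wm_tokens)),
        # Semantic closeness of wordmark and product in embedding space.
        "embedding_similarity": (
            cosine(wm_vec, prod_vec)
            if wm_vec is not None and prod_vec is not None
            else 0.0
        ),
    }


if __name__ == "__main__":
    print(wordmark_features("Pie face", "baked goods and pastries"))
    # Made-up counts for how often "pie" and "pastries" co-occur.
    print(ppmi(count_xy=10, count_x=100, count_y=50, total=10_000))
```

Feature dictionaries like these could be turned into fixed-length vectors and passed to any standard classifier; the sketch only covers feature extraction, not the specific models the paper compares.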

Information

Published In

WWW '20: Companion Proceedings of the Web Conference 2020

April 2020

854 pages

ISBN:9781450370240

DOI:10.1145/3366424

Editors:
Amal El Fallah Seghrouchni, Sorbonne University, France
Gita Sukthankar, University of Central Florida, United States
Tie-Yan Liu, Microsoft Research Asia, China
Maarten van Steen, University of Twente, Netherlands

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '20

Sponsor:

SIGWEB

WWW '20: The Web Conference 2020

April 20 - 24, 2020

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Total Citations: 0
Total Downloads: 104
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 0

Reflects downloads up to 02 Mar 2025
