DOI: 10.1145/3366424.3386200
research-article

Effective Identification of Distinctive Wordmarks

Published: 20 April 2020

Abstract

A wordmark, or logotype string, is stylized text that companies and businesses use to identify and brand their products. “Pie face” under the product category “baked goods and pastries” and “airbus smarter fleet” under the product category “data processing and aircraft maintenance software” are examples of wordmarks. Patent officers manually examine wordmark strings for distinctiveness and uniqueness before accepting them as “protected intellectual property” for specific businesses. We address the problem of automatically identifying acceptable English wordmarks based on their textual content. Unlike most text mining tasks, our problem involves classifying a small set of words (often fewer than five) in the context of a specific product description. We handle this sparsity challenge by designing a range of features that characterize wordmarks using the syntactic and linguistic properties of words, and by incorporating co-occurrence and similarity information from external resources such as WordNet, Wikipedia, and word embeddings. We investigate machine learning models for this novel task and study their classification effectiveness on a large dataset of about 71K wordmarks.
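The abstract names its feature families (syntactic and linguistic properties of words, co-occurrence statistics, and similarity information from external resources) without spelling them out. As a rough illustration only, the Python sketch below computes a few hypothetical features of that kind for one wordmark in the context of its product description; it is not the authors' feature set. The toy embedding table, co-occurrence counts, stopword list, and every function and feature name (`wordmark_features`, `product_similarity`, `max_pmi`, and so on) are invented assumptions. A real pipeline would presumably load pretrained word embeddings and corpus counts (for example, mined from Wikipedia) instead.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy resources, made up for illustration. A real system would
# load pretrained word embeddings and co-occurrence counts mined from a large
# corpus (e.g. Wikipedia) instead of these hand-written values.
EMBEDDINGS: Dict[str, List[float]] = {
    "pie": [0.9, 0.1, 0.0],
    "face": [0.1, 0.8, 0.1],
    "baked": [0.8, 0.2, 0.0],
    "goods": [0.6, 0.3, 0.1],
    "pastries": [0.9, 0.0, 0.1],
}
WORD_COUNTS: Dict[str, int] = {"pie": 500, "face": 2000}
PAIR_COUNTS: Dict[Tuple[str, str], int] = {("face", "pie"): 5}  # keys sorted
TOTAL_WINDOWS = 1_000_000  # hypothetical number of co-occurrence windows
STOPWORDS = {"and", "the", "of", "for"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, keep alphabetic non-stopword tokens."""
    return [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]


def mean_vector(tokens: List[str]) -> Optional[List[float]]:
    """Average the embeddings of known tokens; None if no token is known."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pmi(w1: str, w2: str) -> Optional[float]:
    """PMI = log(P(w1, w2) / (P(w1) P(w2))); None for unseen words or pairs."""
    pair = PAIR_COUNTS.get(tuple(sorted((w1, w2))), 0)
    c1, c2 = WORD_COUNTS.get(w1, 0), WORD_COUNTS.get(w2, 0)
    if pair == 0 or c1 == 0 or c2 == 0:
        return None
    return math.log(pair * TOTAL_WINDOWS / (c1 * c2))


def wordmark_features(wordmark: str, product: str) -> Dict[str, float]:
    """Hypothetical features for one wordmark in its product context."""
    tokens = tokenize(wordmark)
    n = len(tokens)
    feats: Dict[str, float] = {
        "num_tokens": float(n),
        "mean_token_len": sum(len(t) for t in tokens) / n if n else 0.0,
        "oov_ratio": sum(t not in EMBEDDINGS for t in tokens) / n if n else 1.0,
    }
    # Embedding similarity between the wordmark and its product description.
    wm_vec, prod_vec = mean_vector(tokens), mean_vector(tokenize(product))
    feats["product_similarity"] = cosine(wm_vec, prod_vec) if wm_vec and prod_vec else 0.0
    # Strongest co-occurrence among pairs of wordmark tokens (0.0 if none known).
    scores = [pmi(a, b) for i, a in enumerate(tokens) for b in tokens[i + 1:]]
    known = [s for s in scores if s is not None]
    feats["max_pmi"] = max(known) if known else 0.0
    return feats


if __name__ == "__main__":
    print(wordmark_features("Pie Face", "baked goods and pastries"))
```

One plausible intuition, assumed here rather than stated in the abstract, is that a wordmark very similar to its product description is descriptive and therefore less distinctive, while an unusual word combination may signal a more distinctive mark. Feature dictionaries like these could then be vectorized and passed to any standard classifier.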

        Information

        Published In

        ACM Conferences
        WWW '20: Companion Proceedings of the Web Conference 2020
        April 2020
        854 pages
        ISBN:9781450370240
        DOI:10.1145/3366424
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. co-occurrence statistics
        2. short-text classification
        3. word embeddings

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        WWW '20
        WWW '20: The Web Conference 2020
        April 20 - 24, 2020
        Taipei, Taiwan

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

        Bibliometrics & Citations

        Article Metrics

        • Total Citations: 0
        • Total Downloads: 104
        • Downloads (Last 12 months): 7
        • Downloads (Last 6 weeks): 0
        Reflects downloads up to 02 Mar 2025
