DOI: 10.1145/3366424.3382086

Measuring and Controlling Text Generation by Semantic Search

Published: 20 April 2020

Abstract

Our motivation in this work is to measure patent text generation by semantic search, in particular by textual similarity in the high-dimensional embedding space of neural network models. The objective is to control patent text generation through such search. Conceptually, this is an attempt to integrate two subfields of NLP: text generation and semantic search. In the previous milestone of the PatentTransformer project, a prototype based on GPT-2 was capable of generating fluent patent titles, abstracts, independent claims, and dependent claims. Beneath the surface form, however, the quality of the generated patent text was less explored, and controlling text generation remains a hard problem in NLP. We address these issues in this work and experiment with different approaches. On the measurement side, we treat the quality measurement issue from the perspective of textual similarity; the approaches we propose include two embedding spaces, span-based textual similarity, and a language model for patent claim spans. On the control side, we propose a knob-turning approach that steers text generation by measuring whether generated text falls within a target range of textual similarity. In this way, we can search for a Goldilocks zone in which the generated patent text is neither too close to nor too far from prior patents. We hypothesize that patent novelty may exist in such a zone.
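To make the measurement and control ideas above concrete, the following is a minimal sketch, not the authors' implementation: it assumes the Universal Sentence Encoder from TensorFlow Hub as one embedding space and filters generated spans into a similarity band relative to prior-art spans. The function names, the choice of encoder, and the band thresholds are illustrative assumptions, not values from the paper.

# Minimal sketch of measuring and controlling generation by semantic search:
# embed generated spans and prior-art spans, score each generation by its
# nearest-prior-art cosine similarity, and keep only generations whose score
# falls inside a target "Goldilocks" band.
import numpy as np
import tensorflow_hub as hub

# Assumption: Universal Sentence Encoder as the embedding space; the paper
# proposes two embedding spaces, but one suffices to illustrate the mechanism.
encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def embed(texts):
    """Return L2-normalized sentence embeddings for a list of strings."""
    vectors = np.asarray(encoder(texts))
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def max_prior_similarity(generated, prior_art):
    """Cosine similarity of each generated span to its nearest prior-art span."""
    g, p = embed(generated), embed(prior_art)
    return (g @ p.T).max(axis=1)

def goldilocks_filter(generated, prior_art, low=0.6, high=0.85):
    """Keep generations that are neither too close to nor too far from prior patents.
    The (low, high) band is a hypothetical knob, not a value from the paper."""
    sims = max_prior_similarity(generated, prior_art)
    return [(text, float(s)) for text, s in zip(generated, sims) if low <= s <= high]

With such a measurement in place, the "knob" is the (low, high) band itself: widening or shifting it trades off closeness to prior patents against the hypothesized novelty zone.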



          Published In

          WWW '20: Companion Proceedings of the Web Conference 2020
          April 2020
          854 pages
          ISBN:9781450370240
          DOI:10.1145/3366424


          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

1. GPT-2
          2. natural language generation
          3. natural language processing
          4. patent
          5. semantic search
          6. textual similarity

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

WWW '20: The Web Conference 2020
          April 20 - 24, 2020
          Taipei, Taiwan

          Acceptance Rates

          Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
