DOI: 10.1145/3282373.3282414
short-paper

Segmentation-based Unsupervised Phrase Detection

Published: 19 November 2018

ABSTRACT

In this paper, we propose a new approach to unsupervised phrase detection based on sentence segmentation. Unlike the existing approach, which examines only word-based statistics, the proposed method detects phrases by considering the most likely segmentation of each sentence. We develop a Bayesian model that jointly estimates phrase boundaries and the grammatical role of each phrase, and that can be trained in an unsupervised manner using Gibbs sampling. Experimental results show that phrase detection with the proposed model recognizes about 30 times more phrases than the popular existing method at the same precision, owing to its successful detection of infrequent phrases.
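The core idea, treating every gap between adjacent words as a boundary variable and resampling it with Gibbs sampling according to how probable the resulting phrases are, can be illustrated with a deliberately simplified sketch. It is not the authors' model: it drops the grammatical-role variables entirely and assumes a Dirichlet-process unigram prior over phrases, whose base distribution is a geometric length prior times a uniform choice of each word (the boundary-sampling scheme familiar from Bayesian word segmentation). The function names `gibbs_segment`, `phrases_of`, and `base_prob` and the hyperparameters `alpha` and `p_stop` are hypothetical.

```python
import random
from collections import Counter

# Assumption: a simplified Dirichlet-process unigram phrase model with no
# grammatical-role variables; this is NOT the paper's joint model.


def base_prob(phrase, vocab_size, p_stop):
    """Base distribution P0: geometric prior on length times a uniform draw per word."""
    n = len(phrase)
    return p_stop * (1.0 - p_stop) ** (n - 1) * (1.0 / vocab_size) ** n


def phrases_of(sentence, bounds):
    """Split tokens into phrases; bounds[i] is True if a phrase ends after token i."""
    phrases, start = [], 0
    for i, is_boundary in enumerate(bounds):
        if is_boundary:
            phrases.append(tuple(sentence[start:i + 1]))
            start = i + 1
    phrases.append(tuple(sentence[start:]))
    return phrases


def gibbs_segment(corpus, alpha=1.0, p_stop=0.5, iterations=50, seed=0):
    """Collapsed Gibbs sampling of phrase boundaries for non-empty token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for sent in corpus for w in sent})
    # Random initial segmentation: one boolean per gap between adjacent tokens.
    bounds = [[rng.random() < 0.5 for _ in range(len(s) - 1)] for s in corpus]
    counts = Counter(p for s, b in zip(corpus, bounds) for p in phrases_of(s, b))
    total = sum(counts.values())

    def p0(w):
        return base_prob(w, vocab_size, p_stop)

    for _ in range(iterations):
        for s, b in zip(corpus, bounds):
            for i in range(len(b)):
                # First token of the phrase containing token i.
                start = i
                while start > 0 and not b[start - 1]:
                    start -= 1
                # Last token of the phrase containing token i + 1.
                end = i + 1
                while end < len(b) and not b[end]:
                    end += 1
                left = tuple(s[start:i + 1])
                right = tuple(s[i + 1:end + 1])
                merged = tuple(s[start:end + 1])

                # Remove the phrases touched by this boundary from the counts.
                if b[i]:
                    counts[left] -= 1
                    counts[right] -= 1
                    total -= 2
                else:
                    counts[merged] -= 1
                    total -= 1

                # DP predictive probabilities of "one phrase" vs "two phrases".
                p_merge = (counts[merged] + alpha * p0(merged)) / (total + alpha)
                p_split = ((counts[left] + alpha * p0(left)) / (total + alpha)
                           * (counts[right] + int(left == right) + alpha * p0(right))
                           / (total + 1 + alpha))

                # Resample the boundary and add the chosen phrases back.
                b[i] = rng.random() < p_split / (p_split + p_merge)
                if b[i]:
                    counts[left] += 1
                    counts[right] += 1
                    total += 2
                else:
                    counts[merged] += 1
                    total += 1
    return bounds, counts


if __name__ == "__main__":
    corpus = [line.split() for line in [
        "we flew to new york last year",
        "new york is a large city",
        "she moved to new york",
        "a large city never sleeps",
    ]]
    bounds, _ = gibbs_segment(corpus, iterations=200)
    for s, b in zip(corpus, bounds):
        # Multiword phrases in the final sample are the detected phrases.
        print([" ".join(p) for p in phrases_of(s, b) if len(p) > 1])
```

Each boundary decision here depends on the counts of the whole phrases it would create or merge rather than on a single word-pair score; the sketch only shows that mechanism and makes no claim to reproduce the gains on infrequent phrases reported in the abstract.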

      • Published in

        ACM Other conferences
        iiWAS2018: Proceedings of the 20th International Conference on Information Integration and Web-based Applications & Services
        November 2018
        419 pages

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • short-paper
        • Research
        • Refereed limited