ABSTRACT
In this paper, we propose a new approach to unsupervised phrase detection based on sentence segmentation. Unlike existing approaches, which examine only word-based statistics, the proposed method detects phrases by considering the most likely segmentation of each sentence. We develop a Bayesian model that simultaneously estimates phrase boundaries and the grammatical role of each phrase, and that can be trained in an unsupervised manner using Gibbs sampling. Experimental results show that phrase detection with the proposed model recognizes about 30 times more phrases than a popular existing method at the same precision, owing to its successful detection of infrequent phrases.
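To make the idea concrete, the following is a minimal, illustrative sketch of Gibbs sampling over phrase boundaries, one of the techniques the abstract names. It is not the paper's actual model: the grammatical roles of phrases are omitted, and the phrase model here is a simple Dirichlet-process-style cache with a geometric length prior (all hyperparameter choices are assumptions). Each Gibbs step resamples one binary boundary variable between adjacent words, deciding whether the two neighboring spans merge into one phrase or stay split.

```python
import random
from collections import Counter

class PhraseSegmenter:
    """Toy collapsed Gibbs sampler for unsupervised phrase segmentation.

    A phrase's probability is (count + alpha * G0) / (total + alpha),
    where the base distribution G0 is geometric in length and uniform
    over the vocabulary. Illustrative sketch only; the paper's model
    additionally infers a grammatical role for each phrase.
    """

    def __init__(self, sentences, alpha=1.0, p_stop=0.5, seed=0):
        self.sentences = [s.split() for s in sentences]
        self.alpha = alpha
        self.p_stop = p_stop  # geometric prior: P(phrase ends here)
        self.vocab = {w for s in self.sentences for w in s}
        self.rng = random.Random(seed)
        # boundaries[j][i] == True: phrase break after word i of sentence j
        self.boundaries = [[self.rng.random() < 0.5 for _ in range(len(s) - 1)]
                           for s in self.sentences]
        self.counts, self.total = Counter(), 0
        for j in range(len(self.sentences)):
            for ph in self.segments(j):
                self._add(ph)

    def segments(self, j):
        """Current segmentation of sentence j as a list of phrase tuples."""
        words, out, start = self.sentences[j], [], 0
        for i, brk in enumerate(self.boundaries[j] + [True]):
            if brk:
                out.append(tuple(words[start:i + 1]))
                start = i + 1
        return out

    def g0(self, phrase):
        n = len(phrase)
        return (self.p_stop * (1 - self.p_stop) ** (n - 1)
                * (1.0 / len(self.vocab)) ** n)

    def prob(self, phrase):
        return ((self.counts[phrase] + self.alpha * self.g0(phrase))
                / (self.total + self.alpha))

    def _add(self, ph):
        self.counts[ph] += 1
        self.total += 1

    def _remove(self, ph):
        self.counts[ph] -= 1
        self.total -= 1

    def resample_boundary(self, j, i):
        """Gibbs step: resample the break after word i of sentence j."""
        words, b = self.sentences[j], self.boundaries[j]
        # locate the span of words affected by this boundary
        lo = i
        while lo > 0 and not b[lo - 1]:
            lo -= 1
        hi = i + 1
        while hi < len(words) - 1 and not b[hi]:
            hi += 1
        left = tuple(words[lo:i + 1])
        right = tuple(words[i + 1:hi + 1])
        merged = left + right
        # remove the phrase(s) currently covering the span from the counts
        if b[i]:
            self._remove(left)
            self._remove(right)
        else:
            self._remove(merged)
        # posterior odds of split vs merge under the cache model
        p_merge = self.prob(merged)
        p_split = self.prob(left)
        self._add(left)                 # left is in the cache when scoring right
        p_split *= self.prob(right)
        self._remove(left)
        b[i] = self.rng.random() < p_split / (p_split + p_merge)
        if b[i]:
            self._add(left)
            self._add(right)
        else:
            self._add(merged)
```

A full training sweep just calls `resample_boundary` for every boundary position in the corpus; after enough sweeps, multiword spans that recur are increasingly segmented as single phrases, including infrequent ones that pure co-occurrence thresholds would miss.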