Syntactic and Semantic Disambiguation of Numeral Strings Using an N-Gram Method

Min, Kyongho; Wilson, William H.; Moon, Yoo-Jin

doi:10.1007/11589990_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3809)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

This paper describes the interpretation of numerals and of strings that include numerals, where a string consists of a number together with words or symbols that indicate its category, such as SPEED or LENGTH. Interpretation takes place at three levels: lexical, syntactic, and semantic. The system employs three interpretation processes: a word trigram constructor with a tokeniser, a rule-based processor of number strings, and n-gram-based disambiguation of meanings. We extracted numeral strings from 378 online newspaper articles and found that, on average, they made up about 2.2% of the words in the articles. We chose 287 of these articles to provide unseen test data (3251 numeral strings) and used the remaining 91 articles to provide 886 numeral strings, from which we manually extracted n-gram constraints for disambiguating the meanings of numeral strings. We implemented six disambiguation methods based on category frequency statistics collected from the sample data and on the number of word trigram constraints for each category. When applied to the test data, the six methods achieved precision ratios ranging from 85.6% to 87.9%.
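
As a rough illustration of the pipeline named in the abstract (tokeniser, word trigram construction, and trigram-based disambiguation with a frequency fallback), the following minimal sketch may help. It is not the authors' implementation: the function names, the constraint table, the QUANTITY category, and the frequency counts are all hypothetical, and the rule-based processor of number strings is omitted.

```python
from collections import Counter

def tokenise(text):
    """Split on whitespace and strip surrounding punctuation (a simplification)."""
    tokens = (raw.strip(".,;:!?()\"'") for raw in text.split())
    return [t for t in tokens if t]

def is_numeral(token):
    """Treat any token containing a digit as a numeral string."""
    return any(ch.isdigit() for ch in token)

def word_trigrams(tokens):
    """Yield (left word, numeral, right word) for each numeral token."""
    padded = [None] + tokens + [None]  # None marks a sentence edge
    for i in range(1, len(padded) - 1):
        if is_numeral(padded[i]):
            yield (padded[i - 1], padded[i], padded[i + 1])

# Hypothetical trigram constraints: context words that license a category.
CONSTRAINTS = {
    "SPEED": {"left": set(), "right": {"km/h", "mph", "knots"}},
    "LENGTH": {"left": set(), "right": {"km", "metres", "miles"}},
}

# Hypothetical category frequencies, standing in for sample-data statistics.
CATEGORY_FREQ = Counter({"QUANTITY": 5, "LENGTH": 2, "SPEED": 1})

def disambiguate(trigram):
    """Pick a category from matching constraints, else the most frequent one."""
    left, _, right = trigram
    left = left.lower() if left else None
    right = right.lower() if right else None
    matches = [
        cat for cat, ctx in CONSTRAINTS.items()
        if left in ctx["left"] or right in ctx["right"]
    ]
    if not matches:
        return CATEGORY_FREQ.most_common(1)[0][0]
    # If several categories match, prefer the more frequent one.
    return max(matches, key=lambda cat: CATEGORY_FREQ[cat])

def interpret(text):
    """Return (numeral, category) pairs for every numeral string in the text."""
    return [(tri[1], disambiguate(tri)) for tri in word_trigrams(tokenise(text))]

print(interpret("The car reached 120 km/h after 3 km."))
```

In this sketch the word immediately before or after the numeral decides the category, and the frequency table is consulted only when no constraint matches or when several do. The paper's six methods presumably combine these two signals in different ways, but the abstract does not say how.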

Author information

Authors and Affiliations

School of Computer and Information Sciences, AUT, Auckland, New Zealand
Kyongho Min
School of Computer Science and Engineering, UNSW, Sydney, Australia
William H. Wilson
Department of Management Information Systems, HUFS, YongIn, Kyonggi, Korea
Yoo-Jin Moon

Editor information

Editors and Affiliations

Guangxi Normal University, College of CS and IT, Guilin, China, and University of Technology, Faculty of Engineering and Information Technology, Sydney, Australia
Shichao Zhang
Department of Electrical and Computer Systems Engineering, Monash University, 3800, Melbourne, Victoria, Australia
Ray Jarvis

Cite this paper

Min, K., Wilson, W.H., Moon, Y.-J. (2005). Syntactic and Semantic Disambiguation of Numeral Strings Using an N-Gram Method. In: Zhang, S., Jarvis, R. (eds) AI 2005: Advances in Artificial Intelligence. AI 2005. Lecture Notes in Computer Science, vol 3809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11589990_11

DOI: https://doi.org/10.1007/11589990_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30462-3
Online ISBN: 978-3-540-31652-7
eBook Packages: Computer Science, Computer Science (R0)
