Comparison of Numeral Strings Interpretation: Rule-Based and Feature-Based N-Gram Methods

Min, Kyongho; Wilson, William H.

doi:10.1007/11941439_152

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4304)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

This paper compares the performance of two approaches to numeral string interpretation: manually generated rule-based interpretation of numerals and strings including numerals [8], and automatically generated feature-based interpretation. The system employs three interpretation processes: word trigram construction with a tokeniser, rule-based processing of number strings, and n-gram based classification. We extracted numeral strings from 378 online newspaper articles and found that, on average, they made up about 2.2% of the words in the articles. For feature-based interpretation, we tested on 11 datasets, randomly selecting sample data from which to extract tabular feature-based constraints. The rule-based approach achieved 86.8% precision and 77.1% recall; the feature-based interpretation achieved 83.1% precision and 74.5% recall.
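
As a rough illustration of two ideas named in the abstract, the sketch below builds a word trigram around each numeral string and computes precision and recall for a set of predicted interpretations. It is not the authors' system: the tokeniser, the digit-based test for a numeral string, the `<s>`/`</s>` padding symbols, and the category labels are all assumptions made for this example, and the abstract does not state how the paper defines precision and recall, so the standard definitions are assumed.

```python
import re

# Assumption: any token containing a digit is treated as a numeral string.
NUMERAL = re.compile(r"\d")

def tokenise(text):
    """Whitespace tokeniser that strips surrounding punctuation only,
    so internal characters survive (e.g. "2.2%" stays one token)."""
    tokens = (t.strip(".,;:()\"'") for t in text.split())
    return [t for t in tokens if t]

def numeral_trigrams(tokens):
    """Return a (left word, numeral string, right word) triple for each
    numeral string, padding the sentence edges with <s> and </s>."""
    padded = ["<s>"] + tokens + ["</s>"]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
        if NUMERAL.search(padded[i])
    ]

def precision_recall(predicted, gold):
    """predicted and gold map an item id to a category label.
    predicted may omit items the system declined to interpret."""
    correct = sum(1 for key, label in predicted.items() if gold.get(key) == label)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

if __name__ == "__main__":
    sentence = "The price rose 2.2% on 20 August, to $5."
    for trigram in numeral_trigrams(tokenise(sentence)):
        print(trigram)

    # Hypothetical labels, purely for illustration.
    gold = {"a": "DATE", "b": "MONEY", "c": "QUANT", "d": "DATE"}
    predicted = {"a": "DATE", "b": "MONEY", "c": "AGE"}
    print(precision_recall(predicted, gold))
```

Precision is lower than 1 here because one prediction carries the wrong label, and recall is lower still because one gold item received no prediction at all, which is the same kind of gap seen between the precision and recall figures reported above.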

References

1. Asahara, M., Matsumoto, Y.: Japanese Named Entity Extraction with Redundant Morphological Analysis. In: Proceedings of HLT-NAACL 2003, pp. 8–15 (2003)
2. Black, W., Rinaldi, F., Mowatt, D.: FACILE: Description of the NE system used for MUC-7. In: Proceedings of MUC-7 (1998)
3. Chieu, L., Ng, T.: Named Entity Recognition: A Maximum Entropy Approach Using Global Information. In: Proceedings of the 19th COLING, pp. 190–196 (2002)
4. CoNLL-2003 Language-Independent Named Entity Recognition (2003), http://www.cnts.uia.ac.be/conll2003/ner/2
5. Dale, R.: A Framework for Complex Tokenisation and its Application to Newspaper Text. In: Proceedings of the second Australian Document Computing Symposium (1997)
6. Earley, J.: An Efficient Context-Free Parsing Algorithm. CACM 13(2), 94–102 (1970)
7. Maynard, D., Tablan, V., Ursu, C., Cunningham, H., Wilks, Y.: Named Entity Recognition from Diverse Text Types. In: Proceedings of Recent Advances in NLP (2001)
8. Min, K., Wilson, W.H., Moon, Y.: Syntactic and Semantic Disambiguation of Numeral Strings Using an N-gram Method. In: Zhang, S., Jarvis, R. (eds.) AI 2005. LNCS (LNAI), vol. 3809, pp. 82–91. Springer, Heidelberg (2005)
9. Nelson, G., Wallis, S., Aarts, B.: Exploring Natural Language - working with the British Component of the International Corpus of English. John Benjamins, The Netherlands (2002)
10. Polanyi, L., van den Berg, M.: Logical Structure and Discourse Anaphora Resolution. In: Proceedings of ACL 1999 Workshop on The Relation of Discourse/Dialogue Structure and Reference, pp. 110–117 (1999)
11. Reiter, E., Sripada, S.: Learning the Meaning and Usage of Time Phrases from a parallel Text-Data Corpus. In: Proceedings of HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-Linguistic Data, vol. 11, pp. 78–85 (2003)
12. Siegel, M., Bender, E.M.: Efficient Deep Processing of Japanese. In: Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization (2002)
13. Torii, M., Kamboj, S., Vijay-Shanker, K.: An investigation of Various Information Sources for Classifying Biological Names. In: Proceedings of ACL 2003 Workshop on Natural Language Processing in Biomedicine, pp. 113–120 (2003)
14. Wang, H., Yu, S.: The Semantic Knowledge-base of Contemporary Chinese and its Application in WSD. In: Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, pp. 112–118 (2003)
15. Zhou, G., Su, J.: Named Entity Recognition using an HMM-based Chunk Tagger. In: Proceedings of ACL 2002, pp. 473–480 (2002)

Author information

Authors and Affiliations

School of Computer and Information Sciences, Auckland University of Technology, New Zealand
Kyongho Min
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
William H. Wilson

Editor information

Editors and Affiliations

DisPRR, National ICT Australia Ltd, QLD, Australia
Abdul Sattar
School of Computing, University of Tasmania, Sandy Bay, 7005, Tasmania, Australia
Byeong-ho Kang

About this paper

Cite this paper

Min, K., Wilson, W.H. (2006). Comparison of Numeral Strings Interpretation: Rule-Based and Feature-Based N-Gram Methods. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_152

DOI: https://doi.org/10.1007/11941439_152
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49787-5
Online ISBN: 978-3-540-49788-2
eBook Packages: Computer Science, Computer Science (R0)
