Abstract
This paper describes a performance comparison for two approaches to numeral string interpretation: manually generated rule-based interpretation of numerals and strings including numerals [8] vs automatically generated feature-based interpretation. The system employs three interpretation processes: word trigram construction with a tokeniser, rule-based processing of number strings, and n-gram based classification. We extracted numeral strings from 378 online newspaper articles, finding that, on average, they comprised about 2.2% of the words in the articles. For feature-based interpretation, we tested on 11 datasets, with random selection of sample data to extract tabular feature-based constraints. The rule-based approach resulted in 86.8% precision and 77.1% recall ratio. The feature-based interpretation resulted in 83.1% precision and 74.5% recall ratio.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Asahara, M., Matsumoto, Y.: Japanese Named Entity Extraction with Redundant Morphological Analysis. In: Proceedings of HLT-NAACL 2003, pp. 8–15 (2003)
Black, W., Rinaldi, F., Mowatt, D.: FACILE: Description of the NE system used for MUC-7. In: Proceedings of MUC-7 (1998)
Chieu, L., Ng, T.: Named Entity Recognition: A Maximum Entropy Approach Using Global Information. In: Proceedings of the 19th COLING, pp. 190–196 (2002)
CoNLL-2003 Language-Independent Named Entity Recognition (2003), http://www.cnts.uia.ac.be/conll2003/ner/2
Dale, R.: A Framework for Complex Tokenisation and its Application to Newspaper Text. In: Proceedings of the second Australian Document Computing Symposium (1997)
Earley, J.: An Efficient Context-Free Parsing Algorithm. CACM 13(2), 94–102 (1970)
Maynard, D., Tablan, V., Ursu, C., Cunningham, H., Wilks, Y.: Named Entity Recognition from Diverse Text Types. In: Proceedings of Recent Advances in NLP (2001)
Min, K., Wilson, W.H., Moon, Y.: Syntactic and Semantic Disambiguation of Numeral Strings Using an N-gram Method. In: Zhang, S., Jarvis, R. (eds.) AI 2005. LNCS (LNAI), vol. 3809, pp. 82–91. Springer, Heidelberg (2005)
Nelson, G., Wallis, S., Aarts, B.: Exploring Natural Language - working with the British Component of the International Corpus of English. John Benjamins, The Netherlands (2002)
Polanyi, L., van den Berg, M.: Logical Structure and Discourse Anaphora Resolution. In: Proceedings of ACL 1999 Workshop on The Relation of Discourse/Dialogue Structure and Reference, pp. 110–117 (1999)
Reiter, E., Sripada, S.: Learning the Meaning and Usage of Time Phrases from a parallel Text-Data Corpus. In: Proceedings of HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-Linguistic Data, vol. 11, pp. 78–85 (2003)
Siegel, M., Bender, E.M.: Efficient Deep Processing of Japanese. In: Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization (2002)
Torii, M., Kamboj, S., Vijay-Shanker, K.: An investigation of Various Information Sources for Classifying Biological Names. In: Proceedings of ACL 2003 Workshop on Natural Language Processing in Biomedicine, pp. 113–120 (2003)
Wang, H., Yu, S.: The Semantic Knowledge-base of Contemporary Chinese and its Application in WSD. In: Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, pp. 112–118 (2003)
Zhou, G., Su, J.: Named Entity Recognition using an HMM-based Chunk Tagger. In: Proceedings of ACL 2002, pp. 473–480 (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Min, K., Wilson, W.H. (2006). Comparison of Numeral Strings Interpretation: Rule-Based and Feature-Based N-Gram Methods. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science(), vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_152
Download citation
DOI: https://doi.org/10.1007/11941439_152
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49787-5
Online ISBN: 978-3-540-49788-2
eBook Packages: Computer ScienceComputer Science (R0)