Abstract
We present an experimental classifier-combination framework for part-of-speech (POS) tagging in which four POS taggers are combined to improve sentence-similarity results obtained with the Hierarchical Document Signature (HDS). Abstracting the available information into structures that humans can readily access is important. People think and talk hierarchically: any one sentence presents limited information, and that information is always linked to further information. HDS is therefore a useful way to represent sentences when measuring their similarity. POS tagging plays an important role in HDS, but the available POS taggers are imperfect and tend to mis-tag words that are not properly cased or that do not match the corpus on which the taggers were trained. We therefore use different weighted voting strategies to overcome some of these drawbacks. We compare the individual taggers with the combined taggers under the different voting strategies. The results show that the combined taggers perform better than the individual ones.
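The weighted voting idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the voting strategies, the tagger weights, or how ties are resolved, so everything below is an assumption made for illustration: the function name `weighted_vote`, the placeholder tagger names `tagger_a` to `tagger_d`, the weight values, and the alphabetical tie-break are all hypothetical and are not the authors' method.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_vote(
    tagger_outputs: Dict[str, Sequence[str]],
    weights: Dict[str, float],
) -> List[str]:
    """Combine per-token POS tags from several taggers by weighted voting.

    tagger_outputs maps a tagger name to its tag sequence for one sentence.
    weights maps the same names to a non-negative vote weight (hypothetical
    values here; in practice they might come from held-out accuracy).
    """
    lengths = {len(tags) for tags in tagger_outputs.values()}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same number of tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        # Sum the weights of all taggers that proposed each candidate tag.
        scores = defaultdict(float)
        for name, tags in tagger_outputs.items():
            scores[tags[i]] += weights[name]
        # The tag with the largest total weight wins. Iterating over the
        # sorted tags makes ties resolve alphabetically (an assumption).
        combined.append(max(sorted(scores), key=lambda tag: scores[tag]))
    return combined


# Hypothetical outputs of four taggers for "the apple store opened",
# where the lower-cased proper noun "apple" is tagged inconsistently.
outputs = {
    "tagger_a": ["DT", "NN", "NN", "VBD"],
    "tagger_b": ["DT", "NNP", "NN", "VBD"],
    "tagger_c": ["DT", "NNP", "NN", "VBN"],
    "tagger_d": ["DT", "NN", "NN", "VBD"],
}
# Illustrative weights only; not taken from the paper.
weights = {"tagger_a": 0.5, "tagger_b": 0.9, "tagger_c": 0.8, "tagger_d": 0.6}

combined = weighted_vote(outputs, weights)
print(combined)
```

With these illustrative weights, the two more heavily weighted taggers outvote the other two on the second token, while the fourth token is decided by a three-to-one weighted majority. Setting all weights equal reduces the scheme to simple majority voting.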
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liao, J., Mendis, B.S.U., Manna, S. (2010). Improving Hierarchical Document Signature Performance by Classifier Combination. In: Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (eds) Neural Information Processing. Theory and Algorithms. ICONIP 2010. Lecture Notes in Computer Science, vol 6443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17537-4_84
DOI: https://doi.org/10.1007/978-3-642-17537-4_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17536-7
Online ISBN: 978-3-642-17537-4
eBook Packages: Computer Science (R0)