Abstract
This paper presents a part-of-speech tagging method based on a min-max modular neural-network model. The method has three main steps. First, a large-scale tagging problem is decomposed into a number of smaller and simpler subproblems according to the class relations among the data in a given training corpus. Second, all of the subproblems are learned in parallel by smaller network modules. Finally, following two simple module combination laws, all of the trained network modules are integrated into a modular parallel tagging system that produces solutions to the original tagging problem. The proposed method has several advantages over existing tagging systems based on multilayer perceptrons: (1) training times can be drastically reduced and the desired learning accuracy can be easily achieved; (2) the method can scale up to larger tagging problems; and (3) the tagging system responds quickly and facilitates hardware implementation. To demonstrate the effectiveness of the proposed method, we perform simulations on two different language corpora: a Thai corpus and a Chinese corpus, which have 29,028 and 45,595 ambiguous words, respectively. We also compare our method with several existing tagging models, including hidden Markov models, multilayer perceptrons, and neuro-taggers. The results show that both the learning accuracy and the generalization performance of the proposed tagging model are better than those of statistical models and multilayer perceptrons, and are comparable to those of the most successful tagging models.
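The three steps can be illustrated with a minimal sketch for a single two-class subproblem. The abstract does not spell out the decomposition or the two combination laws, so the sketch assumes the usual min-max formulation: the positive and negative training sets are each split into subsets, one module is trained for every (positive subset, negative subset) pair, the modules sharing a positive subset are combined with MIN, and the resulting outputs are combined with MAX. The nearest-centroid discriminant stands in for a trained network module, and all function names (`train_module`, `split`, `train_min_max`) are hypothetical, not taken from the paper.

```python
def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_module(pos_subset, neg_subset):
    """Stand-in for one small network module (an assumption, not the
    paper's module): a linear discriminant that outputs 1.0 when the
    input is closer to the positive centroid than to the negative one."""
    cp, cn = centroid(pos_subset), centroid(neg_subset)
    return lambda x: 1.0 if sq_dist(x, cp) < sq_dist(x, cn) else 0.0


def split(points, size):
    """Partition a training set into consecutive subsets of at most `size`."""
    return [points[i:i + size] for i in range(0, len(points), size)]


def train_min_max(pos, neg, subset_size=1):
    """Decompose, train every module independently, then combine."""
    pos_parts, neg_parts = split(pos, subset_size), split(neg, subset_size)
    # modules[i][j] separates positive subset i from negative subset j.
    # Each module depends only on its own two subsets, so the modules
    # could be trained in parallel.
    modules = [[train_module(p, n) for n in neg_parts] for p in pos_parts]

    def classify(x):
        # MIN over modules sharing a positive subset, then MAX over rows.
        return max(min(m(x) for m in row) for row in modules)

    return classify


# XOR-like toy data: not linearly separable as a whole, but every
# pairwise subproblem is.
pos = [(0.0, 1.0), (1.0, 0.0)]
neg = [(0.0, 0.0), (1.0, 1.0)]
classify = train_min_max(pos, neg)
outputs = {pt: classify(pt) for pt in pos + neg}
```

In this toy case each of the four modules solves a linearly separable subproblem, yet the combined system separates data that no single linear module could. A multi-class tagging problem would first be broken into two-class problems of this kind, which is again an assumption about the decomposition rather than a detail stated in the abstract.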
Cite this article
Lu, BL., Ma, Q., Ichikawa, M. et al. Efficient Part-of-Speech Tagging with a Min-Max Modular Neural-Network Model. Applied Intelligence 19, 65–81 (2003). https://doi.org/10.1023/A:1023868723792