Abstract
Word segmentation is an essential step in building natural language applications such as machine translation, text summarization, and cross-lingual information retrieval. For certain Asian languages in which word boundaries are not explicitly marked, recognizing words can be very challenging. One serious problem is resolving word ambiguity. In this paper, we investigate the use of Linear Support Vector Machines (LSVM) for word boundary disambiguation. We show empirically, for Vietnamese, that LSVM achieves better results than the trigram language model approach.
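To make the idea of treating word boundary disambiguation as linear classification concrete, here is a minimal sketch in Python. It is not the authors' implementation. The placeholder context tokens (`L1`, `R2`, ...), the feature scheme in `featurize`, the hyperparameters, and the plain subgradient training loop are all assumptions chosen for illustration. Every example concerns the same ambiguous syllable pair, so only the neighbouring tokens are used as features.

```python
# Toy linear SVM for boundary disambiguation (illustrative assumptions only).
# Label +1: the ambiguous syllable pair forms one word in this context.
# Label -1: a word boundary falls between the two syllables.

def featurize(left, right):
    """Sparse binary features from the hypothetical neighbouring tokens."""
    return {f"left={left}": 1.0, f"right={right}": 1.0}

def score(w, feats):
    """Linear decision value w . x for a sparse feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def train_linear_svm(examples, epochs=50, lam=0.01, lr=0.1):
    """Subgradient descent on the L2-regularised hinge loss."""
    w = {}
    for _ in range(epochs):
        for feats, y in examples:
            margin = y * score(w, feats)
            # Gradient step for the regulariser: shrink every weight.
            for k in w:
                w[k] *= 1.0 - lr * lam
            # Hinge-loss step: only examples inside the margin contribute.
            if margin < 1.0:
                for k, v in feats.items():
                    w[k] = w.get(k, 0.0) + lr * y * v
    return w

def predict(w, feats):
    """Return +1 (join) or -1 (split)."""
    return 1 if score(w, feats) >= 0 else -1

# Placeholder training contexts for a single ambiguous pair.
train = [
    (featurize("L1", "R1"), 1),
    (featurize("L1", "R3"), 1),
    (featurize("L2", "R2"), -1),
    (featurize("L3", "R2"), -1),
]

w = train_linear_svm(train)
predictions = [predict(w, feats) for feats, _ in train]
```

In a realistic setting the features would be drawn from an annotated corpus, and a trigram language model baseline would instead choose the segmentation with the higher sequence probability.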
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nguyen, D., Zhang, D. (2010). Text Disambiguation Using Support Vector Machine: An Initial Study. In: Zhang, BT., Orgun, M.A. (eds) PRICAI 2010: Trends in Artificial Intelligence. PRICAI 2010. Lecture Notes in Computer Science, vol 6230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15246-7_60
DOI: https://doi.org/10.1007/978-3-642-15246-7_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15245-0
Online ISBN: 978-3-642-15246-7
eBook Packages: Computer Science, Computer Science (R0)