
Text Disambiguation Using Support Vector Machine: An Initial Study

  • Conference paper
PRICAI 2010: Trends in Artificial Intelligence (PRICAI 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6230)



Abstract

Word segmentation is an essential step in building natural language applications such as machine translation, text summarization, and cross-lingual information retrieval. For certain Asian languages, in which word boundaries are not clearly defined, word recognition can become very challenging, and one of the most serious problems is word ambiguity. In this paper, we investigate the use of Linear Support Vector Machines (LSVM) for word boundary disambiguation. We show empirically, for Vietnamese, that LSVM obtains better results than the Trigram Language Model approach.
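The abstract frames word boundary disambiguation as a decision an LSVM makes, but it does not state the features, solver, or training data used. The sketch below is one way to set the task up, under stated assumptions: each gap between two adjacent syllables is a binary example (+1 = word boundary, -1 = same word), described by hypothetical context features (`L1`, `R1`, `L2`, `R2`, and the syllable bigram `L1R1`), and a hinge-loss linear SVM is trained with dual coordinate descent, a swapped-in solver that is not necessarily the one used in the paper. The function names, `C`, `epochs`, and the one-sentence toy corpus ("học sinh | học | sinh học", "students study biology") are all illustrative assumptions, not the authors' setup.

```python
from collections import defaultdict

# Label convention (an assumption of this sketch): +1 = word boundary, -1 = same word.
BOUNDARY, INSIDE = 1, -1


def gap_features(syllables, i):
    """Hypothetical binary context features for the gap between syllables[i] and syllables[i + 1]."""
    padded = ["<s>"] + list(syllables) + ["</s>"]
    left2, left1 = padded[i], padded[i + 1]
    right1, right2 = padded[i + 2], padded[i + 3]
    return [
        "bias",
        "L1=" + left1,
        "R1=" + right1,
        "L2=" + left2,
        "R2=" + right2,
        "L1R1=" + left1 + "_" + right1,
    ]


def examples_from_words(words):
    """Turn a gold segmentation (one string per word) into syllables, gap features, and labels."""
    syllables, labels = [], []
    for word in words:
        for k, syllable in enumerate(word.split()):
            if syllables:  # every syllable after the first opens a gap
                labels.append(BOUNDARY if k == 0 else INSIDE)
            syllables.append(syllable)
    feats = [gap_features(syllables, i) for i in range(len(syllables) - 1)]
    return syllables, feats, labels


def score(w, feats):
    """Linear decision value w . x for a sparse binary feature list."""
    return sum(w.get(f, 0.0) for f in feats)


def predict(w, feats):
    return BOUNDARY if score(w, feats) > 0 else INSIDE


def train_linear_svm(examples, C=10.0, epochs=100):
    """Hinge-loss linear SVM via dual coordinate descent; the bias is folded in as a feature."""
    w = defaultdict(float)
    alpha = [0.0] * len(examples)
    for _ in range(epochs):
        for i, (feats, y) in enumerate(examples):
            q_ii = float(len(feats))  # x_i . x_i, since the binary features are distinct
            grad = y * score(w, feats) - 1.0
            new_alpha = min(max(alpha[i] - grad / q_ii, 0.0), C)
            step = (new_alpha - alpha[i]) * y
            for f in feats:
                w[f] += step  # maintain w = sum_j alpha_j * y_j * x_j
            alpha[i] = new_alpha
    return dict(w)


def segment(syllables, w):
    """Join syllables into words using the classifier's boundary decisions."""
    if not syllables:
        return []
    words = [syllables[0]]
    for i in range(len(syllables) - 1):
        if predict(w, gap_features(syllables, i)) == BOUNDARY:
            words.append(syllables[i + 1])  # boundary: start a new word
        else:
            words[-1] += " " + syllables[i + 1]  # same word: attach the syllable
    return words


if __name__ == "__main__":
    # Toy gold data (illustrative only): "học sinh | học | sinh học".
    syllables, feats, labels = examples_from_words(["học sinh", "học", "sinh học"])
    w = train_linear_svm(list(zip(feats, labels)))
    print(segment(syllables, w))  # ['học sinh', 'học', 'sinh học']
```

In this toy sentence the bigram "học sinh" occurs once inside a word and once across a boundary (and likewise "sinh học"), so the bigram feature alone cannot decide either gap; the wider `L2`/`R2` context features are what separate the cases. A real system would of course be trained on a segmented corpus and evaluated on held-out text.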




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, D., Zhang, D. (2010). Text Disambiguation Using Support Vector Machine: An Initial Study. In: Zhang, BT., Orgun, M.A. (eds) PRICAI 2010: Trends in Artificial Intelligence. PRICAI 2010. Lecture Notes in Computer Science, vol 6230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15246-7_60


  • DOI: https://doi.org/10.1007/978-3-642-15246-7_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15245-0

  • Online ISBN: 978-3-642-15246-7

  • eBook Packages: Computer Science, Computer Science (R0)
