VN-KIM IE: Automatic Extraction of Vietnamese Named-Entities on the Web

New Generation Computing

Abstract

A key promise of the semantic web is the ability to understand and process the contents of web pages automatically. Realizing the semantic web involves two main tasks: (1) representation and management of a large amount of data and metadata for web contents; (2) information extraction and annotation on web pages. On the one hand, recognition of named-entities (NEs) is regarded as a basic and important problem to be solved before the deeper semantics of a web page can be extracted. On the other hand, semantic web information extraction is a language-dependent problem, which requires particular natural language processing techniques. This paper introduces VN-KIM IE, the information extraction module of the semantic web system VN-KIM that we have developed. The function of VN-KIM IE is to automatically recognize named-entities in Vietnamese web pages by identifying their classes and, where they exist, their addresses in the knowledge base of discourse. That information is then annotated on those web pages, providing a basis for NE-based searching on them, in contrast to the current keyword-based searching. The design, implementation, and performance of VN-KIM IE are presented and discussed.
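To make the idea of annotating an entity with its class and knowledge-base address concrete, here is a minimal sketch using plain dictionary lookup. It is not the VN-KIM IE method, which this abstract does not detail; the toy knowledge base, the class names, the `kb:` address scheme, and the `annotate` function are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical toy knowledge base: surface form -> (class, address).
# Entries, class names, and the "kb:" address scheme are assumptions.
# An address of None models an entity whose class is known but which
# has no entry ("address") in the knowledge base.
KNOWLEDGE_BASE: Dict[str, Tuple[str, Optional[str]]] = {
    "Hà Nội": ("City", "kb:City.HaNoi"),
    "Việt Nam": ("Country", "kb:Country.VietNam"),
    "Công ty ABC": ("Company", None),
}


@dataclass
class Annotation:
    start: int              # offset of the first character of the mention
    end: int                # offset one past the last character
    text: str               # the matched surface form
    ne_class: str           # class of the named entity
    address: Optional[str]  # knowledge-base address, if one exists


def annotate(text: str,
             kb: Dict[str, Tuple[str, Optional[str]]] = KNOWLEDGE_BASE
             ) -> List[Annotation]:
    """Scan left to right, preferring the longest known name at each position."""
    names = sorted(kb, key=len, reverse=True)  # longest names tried first
    annotations: List[Annotation] = []
    pos = 0
    while pos < len(text):
        for name in names:
            if text.startswith(name, pos):
                ne_class, address = kb[name]
                annotations.append(
                    Annotation(pos, pos + len(name), name, ne_class, address)
                )
                pos += len(name)  # skip past the matched mention
                break
        else:
            pos += 1  # no known name starts here; move on one character
    return annotations


if __name__ == "__main__":
    for a in annotate("Hà Nội là thủ đô của Việt Nam."):
        print(a)
```

A lookup like this only finds names already listed in the knowledge base. Recognizing unseen entities and resolving ambiguous names require the language-specific processing that the abstract refers to.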

Author information

Corresponding author

Correspondence to Truc-Vien T. Nguyen.

About this article

Cite this article

Nguyen, TV.T., Cao, T.H. VN-KIM IE: Automatic Extraction of Vietnamese Named-Entities on the Web. New Gener. Comput. 25, 277–292 (2007). https://doi.org/10.1007/s00354-007-0018-4

  • DOI: https://doi.org/10.1007/s00354-007-0018-4
