Effective Chinese Organization Name Linking to a List-Like Knowledge Base

  • Conference paper
The Semantic Web and Web Science (CSWS 2014)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 480)

Abstract

Entity linking is widely used in entity retrieval and semantic search. It links mentions in unstructured documents to their corresponding entries in a knowledge base (KB). Frequently used KBs (e.g., Wikipedia) usually contain abundant information about each entity, such as properties, name variations, and text descriptions, which helps to find candidates and disambiguate links. In this paper, we link organization names in Chinese documents to a list-like KB. Unlike typical KBs, the records in our KB are simply full Chinese organization names. The many name variations and abbreviations that appear in documents cannot be matched directly to any organization name in the KB, and they introduce ambiguity, which makes the linking task difficult. First, we enrich the KB with abbreviations: using information from Hudong Baike and other sources, we design a pattern-based full-name annotation method that helps generate abbreviations for all names in the KB. To resolve the ambiguity, we propose a two-stage link generation approach that exploits the co-occurrence of abbreviations and full names in the same document or document cluster, where the full names linked in the first stage constrain the linking of abbreviations in the second stage. We apply our approach to a corpus of police inquiry documents. The experimental results show that our approach is effective and significantly outperforms the one-stage approach in terms of precision and recall.
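The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `link_two_stage`, the abbreviation table, and the sample organization names are all hypothetical, and plain substring matching stands in for whatever mention detection the paper actually uses. Stage 1 links full names that occur verbatim in the document; stage 2 links an abbreviation only when the stage-1 results (or a single KB candidate) leave exactly one possible full name.

```python
from typing import Dict, List, Set


def link_two_stage(
    document: str,
    kb_full_names: List[str],
    abbreviations: Dict[str, Set[str]],
) -> Dict[str, str]:
    """Return a mapping from each linked mention to a KB full name.

    `abbreviations` maps an abbreviation to every KB full name it may
    stand for (an assumed, pre-built enrichment of the list-like KB).
    """
    # Stage 1: link full names that appear verbatim in the document.
    linked_full = {name for name in kb_full_names if name in document}
    links = {name: name for name in linked_full}

    # Stage 2: link abbreviations, constrained by the stage-1 results.
    for abbr, candidates in abbreviations.items():
        if abbr not in document:
            continue
        constrained = candidates & linked_full
        if len(constrained) == 1:
            # A co-occurring full name disambiguates the abbreviation.
            links[abbr] = next(iter(constrained))
        elif len(candidates) == 1:
            # Only one KB entry can produce this abbreviation.
            links[abbr] = next(iter(candidates))
        # Otherwise the mention stays unlinked (still ambiguous).
    return links


# Illustrative data only: one abbreviation shared by two universities.
KB = ["河南大学", "河北大学"]
ABBR = {"河大": {"河南大学", "河北大学"}}

with_context = link_two_stage("河南大学位于开封。河大的学生很多。", KB, ABBR)
without_context = link_two_stage("河大的学生很多。", KB, ABBR)
```

In this sketch the abbreviation is resolved in the first document because its full name co-occurs there, and it is left unlinked in the second document, where nothing disambiguates it. Extending the constraint from a single document to a document cluster would only change which text is used for stage 1.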


Notes

  1. http://www.baike.com/

  2. http://pinyin.sogou.com/dict/

  3. http://ictclas.nlpir.org/

  4. https://github.com/ansjsun/ansj_seg

  5. https://github.com/xpqiu/fnlp/

Acknowledgements

This work is funded by the 3rd Research Institute of the Ministry of Public Security through project No. C13601. We thank Tong Ruan for guidance on the project and Chen Wang for her proofreading.

Author information

Corresponding author

Correspondence to Haofen Wang.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xue, C., Wang, H., Jin, B., Wang, M., Gao, D. (2014). Effective Chinese Organization Name Linking to a List-Like Knowledge Base. In: Zhao, D., Du, J., Wang, H., Wang, P., Ji, D., Pan, J. (eds) The Semantic Web and Web Science. CSWS 2014. Communications in Computer and Information Science, vol 480. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45495-4_9

  • DOI: https://doi.org/10.1007/978-3-662-45495-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45494-7

  • Online ISBN: 978-3-662-45495-4

  • eBook Packages: Computer Science, Computer Science (R0)
