
Weakly-Supervised Named Entity Extraction Using Word Representations

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10179)

Abstract

Named entity extraction is a key subtask of Information Extraction (IE) and an important component of many Natural Language Processing (NLP) and Information Retrieval (IR) tasks. This paper proposes a weakly-supervised named entity extraction method that learns word representations from a web-scale corpus. The method has two main highlights: (1) the word representations can be trained on either web documents or query logs; (2) the search for correct named entities is guided by a small set of seed entities and requires no further domain knowledge or human labor, so named entities of any domain can be acquired. Extensive experiments against state-of-the-art approaches were conducted to verify the effectiveness and efficiency of the method.
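The seed-guided idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes word vectors have already been trained (for example with a word2vec-style model) and simply ranks every non-seed word by its average cosine similarity to the seed entities. The function names `cosine` and `expand_seeds`, the `top_k` and `min_sim` parameters, and the three-dimensional toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_seeds(
    embeddings: Dict[str, Vector],
    seeds: List[str],
    top_k: int = 3,
    min_sim: float = 0.5,
) -> List[Tuple[str, float]]:
    """Hypothetical seed expansion: rank non-seed words by their mean
    cosine similarity to the seeds that have a vector, keep those scoring
    at least min_sim, and return the top_k (ties broken alphabetically)."""
    known = [s for s in seeds if s in embeddings]
    if not known:
        return []
    scored: List[Tuple[str, float]] = []
    for word, vec in embeddings.items():
        if word in seeds:
            continue  # seeds themselves are not new extractions
        score = sum(cosine(vec, embeddings[s]) for s in known) / len(known)
        if score >= min_sim:
            scored.append((word, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical toy vectors standing in for trained word representations.
embeddings: Dict[str, Vector] = {
    "beijing": [0.9, 0.1, 0.0],
    "shanghai": [0.85, 0.15, 0.05],
    "shenzhen": [0.8, 0.2, 0.1],
    "guangzhou": [0.82, 0.18, 0.0],
    "apple": [0.1, 0.9, 0.2],
    "banana": [0.05, 0.95, 0.1],
}
seeds = ["beijing", "shanghai"]
result = expand_seeds(embeddings, seeds, top_k=2)
print(result)  # two city names; the fruit words fall below min_sim
```

Averaging the similarity over all seeds favors candidates that are close to the seed set as a whole rather than to one seed. A practical system would also have to handle multi-word entities and filter out neighbors that are not named entities, both of which this sketch ignores.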

Author information

Corresponding author

Correspondence to Junfei Liu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Deng, K., Wang, D., Liu, J. (2017). Weakly-Supervised Named Entity Extraction Using Word Representations. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_15

  • DOI: https://doi.org/10.1007/978-3-319-55705-2_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55704-5

  • Online ISBN: 978-3-319-55705-2

  • eBook Packages: Computer Science, Computer Science (R0)