DBpedia Entity Type Detection Using Entity Embeddings and N-Gram Models

  • Conference paper
Knowledge Engineering and Semantic Web (KESW 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 786)

Abstract

This paper presents and evaluates a method for detecting DBpedia entity types (classes). The method can be used to assess DBpedia’s quality and to supply missing types for untyped resources. It compares entity embeddings with traditional n-gram models coupled with clustering and classification. We evaluate the results for 358 typical DBpedia classes. Our results show that entity embeddings outperform n-gram models for type detection and can contribute to the improvement of DBpedia’s quality, maintenance, and evolution. This is a step toward improving the quality of Linked Open Data in general.
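As a rough illustration of the n-gram side of the comparison, the following minimal sketch builds bags of word n-grams from short entity descriptions and assigns a type by cosine similarity to a per-type centroid. It is not the authors' pipeline: the nearest-centroid classifier, the function names, and the toy descriptions are all assumptions made for this example, and only the Python standard library is used.

```python
from collections import Counter
from math import sqrt


def word_ngrams(text, n_max=2):
    """Return a bag (Counter) of word n-grams of length 1..n_max."""
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train_centroids(examples):
    """Sum the n-gram bags of all training texts that share a type."""
    centroids = {}
    for text, entity_type in examples:
        centroids.setdefault(entity_type, Counter()).update(word_ngrams(text))
    return centroids


def predict_type(text, centroids):
    """Pick the type whose centroid is most similar to the text's n-gram bag."""
    bag = word_ngrams(text)
    return max(centroids, key=lambda t: cosine(bag, centroids[t]))


# Invented toy descriptions (hypothetical); not taken from DBpedia or the paper.
TRAIN = [
    ("american politician who served as president", "Person"),
    ("british author and novelist born in london", "Person"),
    ("capital city located on the river", "Place"),
    ("coastal town and port in the north", "Place"),
]

centroids = train_centroids(TRAIN)
print(predict_type("french politician and author", centroids))
print(predict_type("small town located in the valley", centroids))
```

An embedding-based variant would keep the same structure but replace `word_ngrams` with a lookup of a dense vector per entity; how the paper actually builds and classifies those vectors is not stated in the abstract.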

Notes

  1. https://github.com/idio/wiki2vec

  2. https://dumps.wikimedia.org/enwiki/

  3. http://www.nltk.org/

  4. https://lvdmaaten.github.io/tsne/

  5. http://dbpedia.org/page/Barack_Obama

  6. http://weka.sourceforge.net/doc.dev/weka/core/tokenizers/NGramTokenizer.html

  7. http://dbpedia.org/sparql

  8. http://www.site.uottawa.ca/~diana/resources/kesw17/

  9. http://scikit-learn.org/stable/

Acknowledgements

We thank the Natural Sciences and Engineering Research Council of Canada (NSERC) for the financial support.

Author information

Correspondence to Hanqing Zhou.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhou, H., Zouaq, A., Inkpen, D. (2017). DBpedia Entity Type Detection Using Entity Embeddings and N-Gram Models. In: Różewski, P., Lange, C. (eds) Knowledge Engineering and Semantic Web. KESW 2017. Communications in Computer and Information Science, vol 786. Springer, Cham. https://doi.org/10.1007/978-3-319-69548-8_21

  • DOI: https://doi.org/10.1007/978-3-319-69548-8_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69547-1

  • Online ISBN: 978-3-319-69548-8

  • eBook Packages: Computer Science (R0)
