Relation-Level Vector Representation for Relation Extraction and Classification on Specialized Data

  • Conference paper
Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence (IEA/AIE 2022)

Abstract

Over the last decade, word embedding models, which learn continuous vector representations of words, have become established and have been integrated into many Natural Language Processing (NLP) applications. These models have subsequently been extended to learn representations of other textual objects, such as word senses and definitions, fragments of documents, or even whole texts. In this paper, we focus on building continuous vector representations of relations. We propose a model in which relation embeddings are derived from (multi-)word embeddings. These representations are trained on a business corpus covering several specialized domains, such as health, justice, urbanism, and elections. Their quality is evaluated on the task of identifying and classifying lexical-semantic relations in text using binary classifiers (machine learning techniques). Given a relation embedding as input, the task is to predict the relation type that it represents. The results are good and surpass those of a recent state-of-the-art system dedicated to building relation representations.
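The abstract does not spell out how relation vectors are composed or which classifier is used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a multi-word term is embedded as the average of its word vectors, that a relation vector is the concatenation of the head vector, the tail vector, and their offset, and that each relation type gets its own binary logistic-regression classifier. The toy vectors, term pairs, and all function names are hypothetical; a real pipeline would load trained embeddings and would more likely rely on a machine learning library than on this standard-library stand-in.

```python
import math

# Toy 2-dimensional word embeddings (hypothetical values, for illustration only).
# In practice these would come from a trained word embedding model.
WORD_VECTORS = {
    "dog": [1.0, 0.0],
    "cat": [0.9, 0.1],
    "animal": [0.0, 1.0],
    "wheel": [0.0, -1.0],
    "steering": [0.2, -0.8],
    "car": [-1.0, 0.0],
}


def term_vector(term):
    """Embed a (multi-)word term as the average of its word vectors (assumption)."""
    vectors = [WORD_VECTORS[word] for word in term.split()]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def relation_vector(head, tail):
    """Build a relation embedding as [head; tail; tail - head] (assumption)."""
    h, t = term_vector(head), term_vector(tail)
    return h + t + [tj - hj for hj, tj in zip(h, t)]


def train_binary_classifier(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression classifier with batch gradient descent."""
    weights, bias = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(features, labels):
            score = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-score)) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(model, x):
    """Return True if the classifier assigns its relation type to x."""
    weights, bias = model
    return bias + sum(w * xi for w, xi in zip(weights, x)) >= 0.0


# Labelled term pairs (hypothetical examples of two relation types).
PAIRS = [
    ("dog", "animal", "is_a"),
    ("cat", "animal", "is_a"),
    ("wheel", "car", "part_of"),
    ("steering wheel", "car", "part_of"),
]
X = [relation_vector(head, tail) for head, tail, _ in PAIRS]

# One binary classifier per relation type (one-vs-rest).
MODELS = {
    rel: train_binary_classifier(X, [1.0 if r == rel else 0.0 for _, _, r in PAIRS])
    for rel in ("is_a", "part_of")
}
```

Calling `predict(MODELS["is_a"], relation_vector("dog", "animal"))` asks the `is_a` classifier whether the pair expresses that relation. Using one binary classifier per type mirrors the evaluation setup described in the abstract, while the composition function remains a placeholder that can be swapped for any other way of deriving relation vectors from term vectors.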

Notes

  1. http://www.jeuxdemots.org/diko.php

  2. https://radimrehurek.com/gensim/models/word2vec.html

  3. https://scikit-learn.org/stable/

Author information

Corresponding author

Correspondence to Camille Gosset.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Gosset, C., Billami, M.B., Lafourcade, M., Bortolaso, C., Derras, M. (2022). Relation-Level Vector Representation for Relation Extraction and Classification on Specialized Data. In: Fujita, H., Fournier-Viger, P., Ali, M., Wang, Y. (eds) Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence. IEA/AIE 2022. Lecture Notes in Computer Science, vol 13343. Springer, Cham. https://doi.org/10.1007/978-3-031-08530-7_27

  • DOI: https://doi.org/10.1007/978-3-031-08530-7_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08529-1

  • Online ISBN: 978-3-031-08530-7

  • eBook Packages: Computer Science (R0)