
Comparing explicit and predictive distributional semantic models endowed with syntactic contexts

  • Original Paper
  • Published in: Language Resources and Evaluation

Abstract

In this article, we introduce an explicit count-based strategy to build word space models with syntactic contexts (dependencies). A filtering method is defined to reduce the size of the explicit word-context vectors. This traditional strategy is compared with a neural embedding (predictive) model also based on syntactic dependencies, using the same parsed corpus for both models. In addition, the dependency-based methods are compared with bag-of-words strategies, both count-based and predictive. The results show that our traditional count-based model with syntactic dependencies outperforms the other strategies, including dependency-based embeddings, but only for tasks focused on discovering similarity between words with the same function (i.e. near-synonyms).
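The count-based strategy with syntactic contexts can be illustrated with a small sketch: each word is represented by a sparse vector of (relation, head) contexts when it appears as a dependent, and inverse (relation, dependent) contexts when it appears as a head, with similarity measured by cosine. This is a minimal illustrative example over invented dependency triples, not the authors' implementation (which additionally applies a filtering step to reduce the vectors):

```python
from collections import Counter, defaultdict
import math

# Toy dependency triples: (head, relation, dependent).
# In the paper's setting these would come from a parsed corpus;
# the triples below are invented for illustration.
triples = [
    ("drink", "dobj", "coffee"),
    ("drink", "dobj", "tea"),
    ("drink", "nsubj", "man"),
    ("sip", "dobj", "coffee"),
    ("sip", "dobj", "tea"),
    ("spill", "dobj", "coffee"),
]

# A dependent's context is (relation, head);
# a head's context is the inverse (relation^-1, dependent).
vectors = defaultdict(Counter)
for head, rel, dep in triples:
    vectors[dep][(rel, head)] += 1
    vectors[head][(rel + "^-1", dep)] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    shared = set(u) & set(v)
    num = sum(u[c] * v[c] for c in shared)
    den = math.sqrt(sum(x * x for x in u.values())) * \
          math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

# Words filling the same syntactic slots come out similar:
# "coffee" and "tea" share the contexts (dobj, drink) and (dobj, sip).
print(round(cosine(vectors["coffee"], vectors["tea"]), 3))  # → 0.816
```

Because contexts carry the dependency relation, this kind of model groups words with the same syntactic function (near-synonyms such as coffee/tea), rather than merely topically related words, which is consistent with the result stated in the abstract.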


Fig. 1


Notes

  1. https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/

  2. code.google.com/p/word2vec/

  3. We use bow to refer to linear bag-of-words contexts, which must be distinguished from continuous bag-of-words (CBOW). Unlike linear bag-of-words, CBOW uses a continuous distributed representation of the context. It is a learning strategy that tries to predict a word given its context, instead of predicting the context given a word, as in the skip-gram model.

  4. The number of target words differs between the count-based and predictive models because of the various heuristics and thresholds (hyperparameters) used to generate each of them.

  5. https://www.openthesaurus.de/

  6. http://fegalaz.usc.es/~gamallo/resources/count-models.tar.gz
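The contrast drawn in note 3 between skip-gram and CBOW can be made concrete by listing the (input, target) training pairs each objective extracts from a sentence. This is a hypothetical sketch of linear (window-based) pair extraction, not the dependency contexts used in the paper:

```python
def skipgram_pairs(tokens, window=2):
    # Skip-gram: predict each context word from the centre word,
    # so every (centre, context) pair is a separate training example.
    pairs = []
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((w, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    # CBOW: predict the centre word from its whole context at once,
    # so each position yields a single (context-tuple, centre) example.
    pairs = []
    for i, w in enumerate(tokens):
        ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        pairs.append((tuple(ctx), w))
    return pairs

sent = ["the", "man", "drinks", "coffee"]
print(skipgram_pairs(sent, 1))
# → [('the', 'man'), ('man', 'the'), ('man', 'drinks'),
#    ('drinks', 'man'), ('drinks', 'coffee'), ('coffee', 'drinks')]
print(cbow_pairs(sent, 1))
# → [(('man',), 'the'), (('the', 'drinks'), 'man'),
#    (('man', 'coffee'), 'drinks'), (('drinks',), 'coffee')]
```

The two strategies see the same window co-occurrences but pose opposite prediction problems, which is why they can yield different vector spaces even on identical input.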


Acknowledgments

This research has been partially funded by the Spanish Ministry of Economy and Competitiveness through project FFI2014-51978-C2-1-R. We are very grateful to Omer Levy and Yoav Goldberg for sending us the parsed corpus used to build their embeddings, and to the reviewers for their useful comments and suggestions.

Author information


Corresponding author

Correspondence to Pablo Gamallo.


About this article


Cite this article

Gamallo, P. Comparing explicit and predictive distributional semantic models endowed with syntactic contexts. Lang Resources & Evaluation 51, 727–743 (2017). https://doi.org/10.1007/s10579-016-9357-4

