Leveraging Pattern Associations for Word Embedding Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10177)

Abstract

Word embedding methods have proven powerful at capturing word associations and have facilitated numerous applications by effectively bridging lexical gaps. Word semantics are encoded as vectors and modeled on the basis of n-gram language models; as a result, only word co-occurrences within a shallow sliding window are taken into account. However, this language-modeling assumption ignores valuable associations between words at long distances, beyond n-gram coverage. In this paper, we argue that it is beneficial to jointly model both the surrounding context and flexible associative patterns, so that the model covers long-distance and intensive associations. We propose a novel approach that combines associated patterns with word embedding methods via a joint training objective. We apply our model to query expansion in a document retrieval task. Experimental results show that the proposed method performs significantly better than state-of-the-art baseline models.
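The idea of a joint training objective over window co-occurrences and long-distance pattern pairs can be sketched as follows. This is a minimal toy illustration only, since the preview shows just the abstract: the corpus, the pattern pairs, the `pattern_weight` parameter, and the simple SGNS-style update with one negative sample are all assumptions, not the authors' actual formulation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_joint(context_pairs, pattern_pairs, vocab, dim=8, lr=0.1,
                epochs=200, pattern_weight=0.5, seed=0):
    """Toy joint objective: skip-gram-style updates on sliding-window
    co-occurrence pairs, plus weighted updates on long-distance pattern
    pairs, each with one randomly sampled negative pair (SGNS-style)."""
    rng = random.Random(seed)
    vec = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    words = list(vocab)
    # Pattern pairs enter the same objective, down-weighted by pattern_weight.
    data = [(p, 1.0) for p in context_pairs] + \
           [(p, pattern_weight) for p in pattern_pairs]
    for _ in range(epochs):
        for (w, c), weight in data:
            neg = rng.choice(words)  # one negative sample per positive pair
            for target, label in ((c, 1.0), (neg, 0.0)):
                score = sum(a * b for a, b in zip(vec[w], vec[target]))
                g = lr * weight * (label - sigmoid(score))
                for i in range(dim):
                    vw, vt = vec[w][i], vec[target][i]
                    vec[w][i] += g * vt
                    vec[target][i] += g * vw
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(a * a for a in v)))

def expand_query(vectors, word, k=2):
    """Query expansion as in the paper's application: return the k
    nearest neighbors of a query term in the embedding space."""
    cands = [(cosine(vectors[word], v), w)
             for w, v in vectors.items() if w != word]
    return [w for _, w in sorted(cands, reverse=True)[:k]]

# Illustrative data: window pairs capture adjacency, pattern pairs
# capture a long-distance association ("query" ... "retrieval").
vocab = ["query", "search", "retrieval", "document", "banana"]
context_pairs = [("query", "search"), ("search", "query"),
                 ("document", "retrieval"), ("retrieval", "document")]
pattern_pairs = [("query", "retrieval"), ("query", "document")]
vectors = train_joint(context_pairs, pattern_pairs, vocab)
print(expand_query(vectors, "query"))
```

In this sketch the pattern pairs pull "query" toward "retrieval" even though the two never co-occur in a window, which is the effect the abstract claims for long-distance associations; the unrelated term "banana" is only ever pushed away as a negative sample.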



Acknowledgments

The work was supported by the National Natural Science Foundation of China (Grant No. 61132009) and the National Basic Research Program of China (973 Program, Grant No. 2013CB329303).

Author information

Corresponding author

Correspondence to Yang Gao.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Liu, Q., Huang, H., Gao, Y., Wei, X., Geng, R. (2017). Leveraging Pattern Associations for Word Embedding Models. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds.) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol. 10177. Springer, Cham. https://doi.org/10.1007/978-3-319-55753-3_27

  • DOI: https://doi.org/10.1007/978-3-319-55753-3_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55752-6

  • Online ISBN: 978-3-319-55753-3
