
Statistical word sense aware topic models

Soft Computing (Focus)

Abstract

Latent Dirichlet allocation (LDA) has proven effective in modeling the semantic relations between surface words. This semantic information in a document collection is useful for estimating a document's topic distribution. In general, a surface word may contribute significantly to several topics in a document collection. LDA measures the contribution of a surface word to each topic but treats the word as identical across all documents. However, a surface word may carry different signatures in different contexts; that is, a polysemous word can be used with different senses in different contexts. Intuitively, disambiguating word senses should enhance the discriminative capability of topic models. In this work, we propose a joint model that induces document topics and word senses simultaneously. Instead of relying on predefined word sense resources, we capture word sense information via a latent variable and induce it directly from the corpora in a fully unsupervised manner. Experimental results show that the proposed joint model significantly outperforms the baselines in document clustering and also improves word sense induction compared with a standalone non-parametric model.
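For reference, the abstract's point that LDA treats a surface word as identical across documents can be seen in the standard LDA generative process, sketched below in pure Python. This is a minimal illustration of plain LDA, not the authors' joint topic and sense model, whose details are not part of this preview. The function names and the default hyperparameters (`alpha=0.5`, `beta=0.1`) are illustrative assumptions.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw one sample from Dir(concentration) by normalizing Gamma(a_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [x / total for x in draws]


def sample_categorical(probs, rng):
    """Return index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_lda_corpus(num_docs, doc_len, num_topics, vocab_size,
                        alpha=0.5, beta=0.1, seed=0):
    """Sample a synthetic corpus from the standard LDA generative process.

    Hypothetical helper for illustration; returns (phi, thetas, docs) where
    docs[d] is a list of (topic_id, word_id) pairs.
    """
    rng = random.Random(seed)
    # One word distribution per topic, shared by every document in the corpus.
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    docs, thetas = [], []
    for _ in range(num_docs):
        # Per-document topic mixture theta_d ~ Dir(alpha).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        tokens = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)   # topic assignment for this token
            w = sample_categorical(phi[z], rng)  # word id depends only on z, not on the document
            tokens.append((z, w))
        docs.append(tokens)
        thetas.append(theta)
    return phi, thetas, docs
```

Because `phi[z]` is indexed only by topic, a given word id is generated from the same topic-word distributions in every document; nothing distinguishes one sense of a polysemous word from another. The abstract's proposal is to add a latent sense variable on top of this, which the sketch deliberately does not attempt to reproduce.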


Figures: Fig. 1 through Fig. 10 (images not available in this preview).


Notes

  1. Note that in this paper, we use Dir to denote the Dirichlet distribution and DP to denote the Dirichlet process.

  2. \(p(z|w)\) can be calculated as \( p(z|w) \propto p(w|z) \sum_{d} p(z|d)\,p(d) \), where \(p(w|z)\) and \(p(z|d)\) are model parameters that can be estimated, and \(p(d)\) is estimated as the ratio of the length of document \(d\) to the total length of the document collection.
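Note 2 amounts to weighting \(p(w|z)\) by the marginal topic weight \(\sum_d p(z|d)\,p(d)\) and normalizing over topics. The sketch below assumes the parameters are already estimated and stored as plain lists: `phi[z][w]` for \(p(w|z)\) and `theta[d][z]` for \(p(z|d)\). The function name and data layout are hypothetical, chosen only for illustration.

```python
def topic_posterior_given_word(w, phi, theta, doc_lengths):
    """Return [p(z|w) for each topic z], following note 2.

    Assumed layout (hypothetical): phi[z][w] = p(w|z), theta[d][z] = p(z|d),
    and doc_lengths[d] = number of tokens in document d.
    """
    total_len = sum(doc_lengths)
    # p(d): proportion of document d's length to the whole collection's length.
    p_d = [n / total_len for n in doc_lengths]
    scores = []
    for z in range(len(phi)):
        # Marginal topic weight: sum over documents of p(z|d) p(d).
        p_z = sum(theta[d][z] * p_d[d] for d in range(len(doc_lengths)))
        # Unnormalized p(z|w) is proportional to p(w|z) times that weight.
        scores.append(phi[z][w] * p_z)
    norm = sum(scores)
    return [s / norm for s in scores]
```

For example, with two topics, `phi = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]`, `theta = [[0.8, 0.2], [0.3, 0.7]]` and `doc_lengths = [30, 10]`, the marginal topic weights are 0.675 and 0.325, so word 0 is assigned mostly to topic 0 and word 2 mostly to topic 1.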


Acknowledgments

We thank the reviewers for their insightful comments. This work is supported by the National Natural Science Foundation of China (NSFC: 61272233).

Author information


Corresponding author

Correspondence to Yunqing Xia.

Additional information

Communicated by L. Xie.


About this article


Cite this article

Tang, G., Xia, Y., Sun, J. et al. Statistical word sense aware topic models. Soft Comput 19, 13–27 (2015). https://doi.org/10.1007/s00500-014-1372-z


  • DOI: https://doi.org/10.1007/s00500-014-1372-z
