Interval Semi-supervised LDA: Classifying Needles in a Haystack

Bodrunova, Svetlana; Koltsov, Sergei; Koltsova, Olessia; Nikolenko, Sergey; Shimorina, Anastasia

doi:10.1007/978-3-642-45114-0_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8265)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

An important text mining problem is to find, in a large collection of texts, documents related to specific topics and then discern further structure among the found texts. This problem is especially important for social sciences, where the purpose is to find the most representative documents for subsequent qualitative interpretation. To solve this problem, we propose an interval semi-supervised LDA approach, in which certain predefined sets of keywords (that define the topics researchers are interested in) are restricted to specific intervals of topic assignments. We present a case study on a Russian LiveJournal dataset aimed at ethnicity discourse analysis.
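
The interval restriction described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard collapsed Gibbs sampler for LDA in which a keyword's topic is simply sampled only from its permitted interval, while all other words range over every topic. The function name `interval_lda_gibbs`, the `keyword_intervals` mapping, and the hyperparameter defaults are hypothetical choices made for illustration.

```python
import random


def interval_lda_gibbs(docs, vocab_size, num_topics, keyword_intervals,
                       alpha=0.1, beta=0.01, iters=30, seed=0):
    """Collapsed Gibbs sampling for LDA with interval-restricted keywords.

    docs: list of documents, each a list of integer word ids.
    keyword_intervals: hypothetical mapping word id -> (lo, hi), an
        inclusive range of topic indices that the keyword may be assigned to.
    """
    rng = random.Random(seed)

    def allowed(word):
        # Keywords are confined to their interval; other words may use any topic.
        lo, hi = keyword_intervals.get(word, (0, num_topics - 1))
        return list(range(lo, hi + 1))

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    z = []

    # Random initialisation that already respects the intervals.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.choice(allowed(w))
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Usual LDA conditional, evaluated on allowed topics only.
                topics = allowed(w)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]

                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return z, doc_topic, topic_word


# Toy corpus: word 0 is a keyword tied to topics 0-1, word 1 to topic 2 only.
docs = [[0, 2, 3, 2], [1, 3, 4, 4], [0, 1, 2, 4]]
keyword_intervals = {0: (0, 1), 1: (2, 2)}
z, doc_topic, topic_word = interval_lda_gibbs(
    docs, vocab_size=5, num_topics=4, keyword_intervals=keyword_intervals)
```

In this sketch, giving a keyword set an interval wider than a single topic leaves the sampler free to split the matching documents into several subtopics, which is one way to read the abstract's goal of discerning further structure among the found texts.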

Author information

Authors and Affiliations

Laboratory for Internet Studies (LINIS), National Research University Higher School of Economics, ul. Soyuza Pechatnikov, d. 16, 190008, St. Petersburg, Russia
Svetlana Bodrunova, Sergei Koltsov, Olessia Koltsova, Sergey Nikolenko & Anastasia Shimorina

Editor information

Editors and Affiliations

Universidad Autónoma del Estado de Hidalgo, Ciudad Universitaria, Carretera Pachuca–Tulancingo km 4.5, Hidalgo, Mexico
Félix Castro
Centro de Investigación en Computación, Instituto Politécnico Nacional, Av. Juan de Dios Bátiz s/n, Col. Nueva Industrial Vallejo, 07738, Mexico City, Mexico
Alexander Gelbukh
Tecnológico de Monterrey, Campus Estado de México, Carretera Lago de Guadalupe Km 3.5, Atizapán de Zaragoza, CP 52926, Estado de México, Mexico
Miguel González

About this paper

Cite this paper

Bodrunova, S., Koltsov, S., Koltsova, O., Nikolenko, S., Shimorina, A. (2013). Interval Semi-supervised LDA: Classifying Needles in a Haystack. In: Castro, F., Gelbukh, A., González, M. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2013. Lecture Notes in Computer Science, vol 8265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45114-0_21

DOI: https://doi.org/10.1007/978-3-642-45114-0_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45113-3
Online ISBN: 978-3-642-45114-0
eBook Packages: Computer Science, Computer Science (R0)
