‘Linguistics-Lite’ Topic Extraction from Multilingual Social Media Data

  • Conference paper
Social Computing, Behavioral-Cultural Modeling, and Prediction (SBP 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9021)

Abstract

To achieve accurate situation assessments and information dominance, the commander needs accurate and rapid insight into the socio-cognitive landscape of his or her communities of interest. This requires insight into the key actors and groups and their issues and concerns, as well as early indicators of change. Social media (which by its nature is noisy and multilingual) is increasing the amount and type of data available for early assessment of rapidly emerging and changing situations such as disasters or crises. In this paper, we present a way of extracting topics from this kind of data in a principled and scalable fashion – regardless of the mix of languages, subject matter, or provenance of data (e.g. Twitter, VKontakte). Using a non-trivial validation task, we demonstrate that the technique is highly accurate (around 92%). We then show the results of applying the technique to a sample of around 100,000 Twitter posts generally relating to the early-2014 conflict in Ukraine, and explain how these results – or comparable results from applying the technique to other datasets – would enable a busy analyst to gain a top-down understanding of a large set of data quickly and to decide where to focus more detailed attention.

Author information

Corresponding author

Correspondence to Peter A. Chew.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Chew, P.A. (2015). ‘Linguistics-Lite’ Topic Extraction from Multilingual Social Media Data. In: Agarwal, N., Xu, K., Osgood, N. (eds) Social Computing, Behavioral-Cultural Modeling, and Prediction. SBP 2015. Lecture Notes in Computer Science, vol 9021. Springer, Cham. https://doi.org/10.1007/978-3-319-16268-3_30

  • DOI: https://doi.org/10.1007/978-3-319-16268-3_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16267-6

  • Online ISBN: 978-3-319-16268-3

  • eBook Packages: Computer Science, Computer Science (R0)
