High Value Media Monitoring With Machine Learning

Using Machine Learning to Drive Cost Effectiveness in an Established Business

  • Research Project
  • Published in: KI - Künstliche Intelligenz

Abstract

The Gorkana Group provides high-quality media monitoring services to its clients. This paper describes an ongoing project aimed at increasing the amount of automation in Gorkana Group’s workflow through the application of machine learning and language processing technologies. Gorkana Group’s clients need a very high level of confidence that, if an article is relevant to one of their briefs, they will be shown that article. Delivering this high-quality media monitoring service, however, means that humans are required to read through very large quantities of data, only a small portion of which is typically deemed relevant. The challenge addressed by the work reported in this paper is how to achieve such high-quality media monitoring efficiently in the face of huge increases in the amount of data that needs to be monitored. We show that, while machine learning can be applied successfully to this real-world business problem, the constraints of the task give rise to a number of interesting challenges.
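
The recall constraint described above (clients must see essentially every article relevant to their brief, while only a small fraction of the monitored stream is relevant) is the crux of the automation problem. As a rough illustration only, and not code from the paper, the sketch below shows one common way such a constraint can be handled: score articles with a classifier, pick the lowest confidence threshold on a validation set that still meets the recall target, and measure how much of the stream could then be filtered out before human review. The function name, scores, and labels are all hypothetical.

```python
# Hypothetical sketch (not from the paper): choose a score threshold that
# keeps recall at or above a client-mandated target, then see how much
# material could be filtered out before human reading.

def threshold_for_recall(scores, labels, target_recall=0.99):
    """Return the highest threshold whose recall on (scores, labels) is
    still >= target_recall. Labels: 1 = relevant, 0 = irrelevant."""
    relevant_total = sum(labels)
    best = 0.0  # fall back to "send everything to humans"
    for t in sorted(set(scores), reverse=True):
        retrieved = [y for s, y in zip(scores, labels) if s >= t]
        recall = sum(retrieved) / relevant_total
        if recall >= target_recall:
            best = t  # highest threshold that meets the recall target
            break
    return best

# Toy validation data: most articles are irrelevant (heavily skewed classes).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05, 0.04, 0.02]
labels = [1,    1,    0,    1,    0,    0,    0,    0,    0,    0]

t = threshold_for_recall(scores, labels, target_recall=1.0)
kept = [s for s in scores if s >= t]
print(f"threshold={t:.2f}; humans read {len(kept)}/{len(scores)} articles")
```

Even in this toy example the tension is visible: a perfect-recall threshold still leaves a sizeable share of the stream for human reading, which reflects the trade-off described in the abstract.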

Notes

  1. Smoothing redistributes the total probability mass over all features so that some mass is deducted from the seen features and assigned to unseen features, thus avoiding zero probabilities [1, 4, 14]; a minimal example is sketched after these notes.

  2. http://www.cs.waikato.ac.nz/ml/weka/.

  3. Note that the true population probabilities of types in natural language text are a hypothetical and ill-defined concept. Relative frequencies can be measured in large corpora, but these are only estimates of the hypothesised true probabilities.
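
To make Note 1 concrete, the following is a minimal sketch of add-one (Laplace) smoothing over a unigram, bag-of-words model, one of the techniques surveyed in [1, 4, 14]; the vocabulary and counts are invented for illustration. Every type in the vocabulary receives some probability mass, so types unseen in training are never assigned probability zero.

```python
from collections import Counter

def smoothed_probs(observed_tokens, vocabulary, alpha=1.0):
    """Add-alpha smoothed unigram probabilities (Laplace when alpha = 1).

    Every type in `vocabulary` gets (count + alpha) / (total + alpha * |V|),
    so some mass is shifted from the seen types to the unseen ones."""
    counts = Counter(observed_tokens)
    denom = len(observed_tokens) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / denom for w in vocabulary}

vocab = {"merger", "profit", "celebrity", "transfer"}
probs = smoothed_probs(["merger", "merger", "profit"], vocab)
print(probs["celebrity"])   # unseen type, yet probability > 0
print(sum(probs.values()))  # sums to 1 over the vocabulary (up to rounding)
```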

References

  1. Chen SF, Goodman J (1999) An empirical study of smoothing techniques for language modeling. Comput Speech Lang 13(4):359–393

  2. Clarke D, Lane P, Hender P (2011) Developing robust models for favourability analysis. In: Proceedings of the 2nd workshop on computational approaches to subjectivity and sentiment analysis (WASSA 2011), Portland, Oregon, June 2011. Association for Computational Linguistics, Stroudsburg, pp 44–52

  3. Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305

  4. Gale WA, Church KW (1994) What’s wrong with adding one? In: Oostdijk N, de Haan P (eds) Corpus based research in language. Honour of Jan Aarts. Rodopi, Amsterdam, pp 189–200

  5. Gale WA, Sampson G (1995) Good-Turing frequency estimation without tears. J Quant Linguist 2(3):217–237

  6. Good IJ, Turing AM (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40:237–264

  7. Green PD, Lane PCR, Rainer AW, Scholz S (2010) Selecting measures in origin analysis. In: Proceedings of the thirtieth SGAI international conference on artificial intelligence

  8. Kubat M, Holte RC, Matwin S (1998) Machine learning for the detection of oil spills in satellite radar images. Mach Learn 30(2):195–215

  9. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York

  10. Mladenić D (1998) Feature subset selection in text-learning. In: Machine learning: ECML-98, pp 95–100

  11. Rogati M, Yang Y (2002) High-performing feature selection for text classification. In: Proceedings of the eleventh international conference on information and knowledge management. ACM, New York, pp 659–661

  12. Tang L, Liu H (2005) Bias analysis in text classification for highly skewed data. In: Proceedings of the fifth IEEE international conference on data mining (ICDM ’05), Washington, DC, USA. IEEE Comput. Soc., Los Alamitos, pp 781–784

  13. Tufte E (2004) Sparkline theory and practice. http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR&topic_id=1 May 2004

  14. Zhai C, Lafferty J (2004) A study of smoothing methods for language models applied to information retrieval. ACM Trans Inf Syst 22(2):179–214

Author information

Corresponding author

Correspondence to Matti Lyra.

About this article

Cite this article

Lyra, M., Clarke, D., Morgan, H. et al. High Value Media Monitoring With Machine Learning. Künstl Intell 27, 255–265 (2013). https://doi.org/10.1007/s13218-013-0255-2
