High Value Media Monitoring With Machine Learning

Using Machine Learning to Drive Cost Effectiveness in an Established Business

  • Research Project
  • Published in: KI - Künstliche Intelligenz

Abstract

The Gorkana Group provides high-quality media monitoring services to its clients. This paper describes an ongoing project aimed at increasing the amount of automation in Gorkana Group’s workflow through the application of machine learning and language processing technologies. Gorkana Group’s clients need a very high level of confidence that, if an article is relevant to one of their briefs, they will be shown that article. Delivering this high-quality media monitoring service, however, means that humans are required to read through very large quantities of data, only a small portion of which is typically deemed relevant. The challenge addressed by the work reported in this paper is how to achieve such high-quality media monitoring efficiently in the face of huge increases in the amount of data that needs to be monitored. We show that, while machine learning can be applied successfully to this real-world business problem, the constraints of the task give rise to a number of interesting challenges.
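
The recall constraint described above (clients must see essentially every article relevant to their brief, while only a small fraction of the monitored stream is relevant) is the crux of the automation problem. As a rough illustration only, and not code from the paper, the sketch below shows one common way such a constraint can be handled: score articles with a classifier, pick the lowest confidence threshold on a validation set that still meets the recall target, and measure how much of the stream could then be filtered out before human review. The function name, scores, and labels are all hypothetical.

```python
# Hypothetical sketch (not from the paper): choose a score threshold that
# keeps recall at or above a client-mandated target, then see how much
# material could be filtered out before human reading.

def threshold_for_recall(scores, labels, target_recall=0.99):
    """Return the highest threshold whose recall on (scores, labels) is
    still >= target_recall. Labels: 1 = relevant, 0 = irrelevant."""
    relevant_total = sum(labels)
    best = 0.0  # fall back to "send everything to humans"
    for t in sorted(set(scores), reverse=True):
        retrieved = [y for s, y in zip(scores, labels) if s >= t]
        recall = sum(retrieved) / relevant_total
        if recall >= target_recall:
            best = t  # highest threshold that meets the recall target
            break
    return best

# Toy validation data: most articles are irrelevant (heavily skewed classes).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05, 0.04, 0.02]
labels = [1,    1,    0,    1,    0,    0,    0,    0,    0,    0]

t = threshold_for_recall(scores, labels, target_recall=1.0)
kept = [s for s in scores if s >= t]
print(f"threshold={t:.2f}; humans read {len(kept)}/{len(scores)} articles")
```

Even in this toy example the tension is visible: a perfect-recall threshold still leaves a sizeable share of the stream for human reading, which reflects the trade-off described in the abstract.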

Notes

  1. Smoothing redistributes the total probability mass over all features so that some mass is deducted from the seen features and assigned to unseen features, thus avoiding zero probabilities [1, 4, 14]; a minimal example is sketched after these notes.

  2. http://www.cs.waikato.ac.nz/ml/weka/.

  3. Note that the true population probabilities of types in natural language text are a hypothetical and ill-defined concept. Relative frequencies can be measured in large corpora, but these are only estimates of the hypothesised true probabilities.
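
To make Note 1 concrete, the following is a minimal sketch of add-one (Laplace) smoothing over a unigram, bag-of-words model, one of the techniques surveyed in [1, 4, 14]; the vocabulary and counts are invented for illustration. Every type in the vocabulary receives some probability mass, so types unseen in training are never assigned probability zero.

```python
from collections import Counter

def smoothed_probs(observed_tokens, vocabulary, alpha=1.0):
    """Add-alpha smoothed unigram probabilities (Laplace when alpha = 1).

    Every type in `vocabulary` gets (count + alpha) / (total + alpha * |V|),
    so some mass is shifted from the seen types to the unseen ones."""
    counts = Counter(observed_tokens)
    denom = len(observed_tokens) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / denom for w in vocabulary}

vocab = {"merger", "profit", "celebrity", "transfer"}
probs = smoothed_probs(["merger", "merger", "profit"], vocab)
print(probs["celebrity"])   # unseen type, yet probability > 0
print(sum(probs.values()))  # sums to 1 over the vocabulary (up to rounding)
```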

References

  1. Chen SF, Goodman J (1999) An empirical study of smoothing techniques for language modeling. Comput Speech Lang 13(4):359–393

  2. Clarke D, Lane P, Hender P (2011) Developing robust models for favourability analysis. In: Proceedings of the 2nd workshop on computational approaches to subjectivity and sentiment analysis (WASSA 2011), Portland, Oregon, June 2011. Association for Computational Linguistics, Stroudsburg, pp 44–52

  3. Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305

  4. Gale WA, Church KW (1994) What’s wrong with adding one? In: Oostdijk N, de Haan P (eds) Corpus based research in language. Honour of Jan Aarts. Rodopi, Amsterdam, pp 189–200

  5. Gale WA, Sampson G (1995) Good-Turing frequency estimation without tears. J Quant Linguist 2(3):217–237

  6. Good IJ, Turing AM (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40:237–264

  7. Green PD, Lane PCR, Rainer AW, Scholz S (2010) Selecting measures in origin analysis. In: Proceedings of the thirtieth SGAI international conference on artificial intelligence

  8. Kubat M, Holte RC, Matwin S (1998) Machine learning for the detection of oil spills in satellite radar images. Mach Learn 30(2):195–215

  9. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York

  10. Mladenić D (1998) Feature subset selection in text-learning. In: Machine learning: ECML-98, pp 95–100

  11. Rogati M, Yang Y (2002) High-performing feature selection for text classification. In: Proceedings of the eleventh international conference on information and knowledge management. ACM, New York, pp 659–661

  12. Tang L, Liu H (2005) Bias analysis in text classification for highly skewed data. In: Proceedings of the fifth IEEE international conference on data mining (ICDM ’05), Washington, DC, USA. IEEE Comput. Soc., Los Alamitos, pp 781–784

  13. Tufte E (2004) Sparkline theory and practice. http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR&topic_id=1 May 2004

  14. Zhai C, Lafferty J (2004) A study of smoothing methods for language models applied to information retrieval. ACM Trans Inf Syst 22(2):179–214

Author information

Corresponding author

Correspondence to Matti Lyra.

About this article

Cite this article

Lyra, M., Clarke, D., Morgan, H. et al. High Value Media Monitoring With Machine Learning. Künstl Intell 27, 255–265 (2013). https://doi.org/10.1007/s13218-013-0255-2
