Challenges in Applying Machine Learning to Media Monitoring

Lyra, Matti; Clarke, Daoud; Morgan, Hamish; Reffin, Jeremy; Weir, David

doi:10.1007/978-1-4471-4739-8_32

Conference paper
First Online: 01 January 2012

Abstract

The Gorkana Group provides high-quality media monitoring services to its clients. This paper describes an ongoing project aimed at increasing the amount of automation in Gorkana Group’s workflow through the application of machine learning and language processing technologies. Gorkana Group’s clients need a very high level of confidence that, if a published article is relevant to one of their briefs, they will be shown it. Delivering this level of service, however, means that humans must read through very large quantities of data, only a small portion of which is typically deemed relevant. The challenge addressed by the work reported here is how to achieve such high-quality media monitoring efficiently in the face of huge increases in the amount of data that needs to be monitored. This paper discusses some of the findings that have emerged during the early stages of the project. We show that, while machine learning can be applied successfully to this real-world business problem, the distinctive constraints of the task give rise to a number of interesting challenges.

Author information

Authors and Affiliations

Department of Informatics, University of Sussex, Brighton, UK
Matti Lyra, Hamish Morgan, Jeremy Reffin & David Weir
Gorkana Group, 28–42 Banner Street, London, UK
Daoud Clarke

Corresponding author

Correspondence to Matti Lyra.

Editor information

Editors and Affiliations

School of Computing, University of Portsmouth, Whitepost Lane The Lilacs, Portsmouth, PO1 3AH, Hampshire, United Kingdom
Max Bramer
School of Computing, Engineering & Mathe, University of Brighton, Lewes Road, Brighton, BN2 4GJ, West Sussex, United Kingdom
Miltos Petridis

About this paper

Cite this paper

Lyra, M., Clarke, D., Morgan, H., Reffin, J., Weir, D. (2012). Challenges in Applying Machine Learning to Media Monitoring. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_32

DOI: https://doi.org/10.1007/978-1-4471-4739-8_32
Published: 09 October 2012
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4738-1
Online ISBN: 978-1-4471-4739-8
eBook Packages: Computer Science (R0)