Abstract
The Gorkana Group provides high-quality media monitoring services to its clients. This paper describes an ongoing project aimed at increasing the amount of automation in Gorkana Group’s workflow through the application of machine learning and language processing technologies. Gorkana Group’s clients need a very high level of confidence that, if an article relevant to one of their briefs has been published, they will be shown it. However, delivering this high-quality media monitoring service means that humans have to read through very large quantities of data, only a small portion of which is typically deemed relevant. The challenge addressed by the work reported in this paper is how to achieve such high-quality media monitoring efficiently in the face of huge increases in the amount of data that needs to be monitored. This paper discusses some of the findings that have emerged during the early stages of the project. We show that, while machine learning can be applied successfully to this real-world business problem, the distinctive constraints of the task give rise to a number of interesting challenges.
© 2012 Springer-Verlag London
About this paper
Cite this paper
Lyra, M., Clarke, D., Morgan, H., Reffin, J., Weir, D. (2012). Challenges in Applying Machine Learning to Media Monitoring. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_32
DOI: https://doi.org/10.1007/978-1-4471-4739-8_32
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4738-1
Online ISBN: 978-1-4471-4739-8
eBook Packages: Computer Science (R0)