Automatic Discovery of Patterns in Media Content

Cristianini, Nello

doi:10.1007/978-3-642-21458-5_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6661)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

The strong trend towards the automation of many aspects of scientific enquiry and scholarship has also started to affect the social sciences and even the humanities. Several recent articles have demonstrated the application of pattern analysis techniques to the discovery of non-trivial relations in various datasets that are relevant to the social and human sciences, and some have even heralded the advent of “Computational Social Sciences” and “Culturomics”. In this review article I survey the results obtained over the past 5 years at the Intelligent Systems Laboratory in Bristol in the area of automating the analysis of news media content. This endeavor, which we approach by combining pattern recognition, data mining and language technologies, is traditionally a part of the social sciences and is normally performed by human researchers on small sets of data. The analysis of news content is of crucial importance because of the central role that the global news system plays in shaping public opinion, markets and culture. It is now possible to freely access a large part of global news online, and to devise automated methods for large-scale, constant monitoring of patterns in content. The results presented in this survey show how the automatic analysis of millions of documents in dozens of different languages can detect non-trivial macro-patterns that could not be observed at a smaller scale, and how the social sciences can benefit from closer interaction with the pattern analysis, artificial intelligence and text mining research communities.

Author information

Authors and Affiliations

Intelligent Systems Laboratory, University of Bristol, UK
Nello Cristianini

Editor information

Editors and Affiliations

Department of Mathematics, Università degli Studi di Palermo, Via Archirafi 34, 90123, Palermo, Italy
Raffaele Giancarlo
Department of Computer Science, University of ’Piemonte Orientale’, Viale T. Michel 11, 15121, Alessandria, Italy
Giovanni Manzini

About this paper

Cite this paper

Cristianini, N. (2011). Automatic Discovery of Patterns in Media Content. In: Giancarlo, R., Manzini, G. (eds) Combinatorial Pattern Matching. CPM 2011. Lecture Notes in Computer Science, vol 6661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21458-5_2

DOI: https://doi.org/10.1007/978-3-642-21458-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21457-8
Online ISBN: 978-3-642-21458-5
eBook Packages: Computer Science, Computer Science (R0)
