Automatic Discovery of Patterns in Media Content

  • Conference paper
Combinatorial Pattern Matching (CPM 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6661)

Abstract

The strong trend towards the automation of many aspects of scientific enquiry and scholarship has also started to affect the social sciences and even the humanities. Several recent articles have demonstrated the application of pattern analysis techniques to the discovery of non-trivial relations in various datasets that are relevant to the social and human sciences, and some have even heralded the advent of “Computational Social Sciences” and “Culturomics”. In this review article I survey the results obtained over the past 5 years at the Intelligent Systems Laboratory in Bristol in the area of automating the analysis of news media content. This endeavor, which we approach by combining pattern recognition, data mining and language technologies, is traditionally a part of the social sciences and is normally performed by human researchers on small sets of data. The analysis of news content is of crucial importance because of the central role that the global news system plays in shaping public opinion, markets and culture. Today it is possible to access a large part of global news freely online, and to devise automated methods for large-scale, constant monitoring of patterns in content. The results presented in this survey show how the automatic analysis of millions of documents in dozens of different languages can detect non-trivial macro-patterns that could not be observed at a smaller scale, and how the social sciences can benefit from closer interaction with the pattern analysis, artificial intelligence and text mining research communities.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cristianini, N. (2011). Automatic Discovery of Patterns in Media Content. In: Giancarlo, R., Manzini, G. (eds) Combinatorial Pattern Matching. CPM 2011. Lecture Notes in Computer Science, vol 6661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21458-5_2

  • DOI: https://doi.org/10.1007/978-3-642-21458-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21457-8

  • Online ISBN: 978-3-642-21458-5

  • eBook Packages: Computer Science (R0)
