DOI: 10.1145/2740908.2742009
research-article

The Computable News project: Research in the Newsroom

Published: 18 May 2015

ABSTRACT

We report on a four-year academic research project to build a natural language processing platform in support of a large media company. The Computable News platform processes news stories, producing a layer of structured data that can be used to build rich applications. We describe the underlying platform and the research tasks that we explored while building it. The platform supports a wide range of prototype applications designed to serve different newsroom functions. We hope that this qualitative review provides some insight into the challenges involved in this type of project.

References

  1. T. Dawborn and J. R. Curran. docrep: A lightweight and efficient document representation framework. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 762–771, Dublin, Ireland, August 2014. Dublin City University and Association for Computational Linguistics.
  2. B. Hachey, J. Nothman, and W. Radford. Cheap and easy entity evaluation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 464–469, Baltimore, Maryland, June 2014.
  3. B. Hachey, W. Radford, J. Nothman, M. Honnibal, and J. R. Curran. Evaluating entity linking with Wikipedia. Artificial Intelligence, 194:130–150, January 2013.
  4. J. Nothman. Grounding event references in news. PhD thesis, School of Information Technologies, University of Sydney, Sydney, Australia, 2014.
  5. J. Nothman, T. Dawborn, and J. R. Curran. Command-line utilities for managing and exploring annotated corpora. In Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies, Dublin, Ireland, August 2014.
  6. J. Nothman, M. Honnibal, B. Hachey, and J. R. Curran. Event linking: grounding event reference in a news archive. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 228–232, Jeju, Korea, July 2012.
  7. T. O'Keefe. Extracting and Attributing Quotes in Text and Assessing them as Opinions. PhD thesis, School of Information Technologies, University of Sydney, Sydney, Australia, 2014.
  8. T. O'Keefe, J. R. Curran, P. Ashwell, and I. Koprinska. An annotated corpus of quoted opinions in news articles. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 516–520, Sofia, Bulgaria, August 2013. Association for Computational Linguistics.
  9. T. O'Keefe, S. Pareti, J. R. Curran, I. Koprinska, and M. Honnibal. A sequence labelling approach to quote attribution. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 790–799, Jeju, Korea, July 2012.
  10. G. Pink, W. Radford, W. Cannings, A. Naoum, J. Nothman, D. Tse, and J. R. Curran. SYDNEY-CMCRC at TAC 2013. In Proceedings of the Text Analysis Conference, Gaithersburg, MD, USA, November 2013. National Institute of Standards and Technology.
  11. W. Radford. Linking Named Entities to Wikipedia. PhD thesis, School of Information Technologies, University of Sydney, Sydney, Australia, 2015.
  12. W. Radford, W. Cannings, A. Naoum, J. Nothman, G. Pink, D. Tse, and J. R. Curran. (Almost) Total Recall – SYDNEY-CMCRC at TAC 2012. In Proceedings of the Text Analysis Conference, Gaithersburg, MD, USA, November 2012. National Institute of Standards and Technology.
  13. W. Radford and J. R. Curran. Joint apposition extraction with syntactic and semantic constraints. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 671–677, Sofia, Bulgaria, August 2013. Association for Computational Linguistics.
  14. W. Radford, B. Hachey, M. Honnibal, J. Nothman, and J. R. Curran. Naive but effective NIL clustering baselines – CMCRC at TAC 2011. In Proceedings of the Text Analysis Conference, Gaithersburg, MD, USA, November 2011. National Institute of Standards and Technology.
  15. W. Radford, B. Hachey, J. Nothman, M. Honnibal, and J. R. Curran. Document-level entity linking: CMCRC at TAC 2010. In Proceedings of the Text Analysis Conference, Gaithersburg, MD, USA, November 2010. National Institute of Standards and Technology.

    • Published in
      WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web
      May 2015
      1602 pages
      ISBN: 9781450334730
      DOI: 10.1145/2740908

      Copyright © 2015. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%