DOI: 10.1145/2740908.2742009
research-article

The Computable News project: Research in the Newsroom

Published: 18 May 2015

Abstract

We report on a four-year academic research project to build a natural language processing platform in support of a large media company. The Computable News platform processes news stories, producing a layer of structured data that can be used to build rich applications. We describe the underlying platform and the research tasks that we explored while building it. The platform supports a wide range of prototype applications designed to support different newsroom functions. We hope that this qualitative review provides some insight into the challenges involved in this type of project.


Cited By

  • (2024) Extraction and attribution of public figures statements for journalism in Indonesia using deep learning. Knowledge-Based Systems, 289:C. DOI: 10.1016/j.knosys.2024.111558. Online publication date: 8 Apr 2024.
  • (2022) PFSA-ID: an annotated Indonesian corpus and baseline model of public figures statements attributions. Global Knowledge, Memory and Communication, 73:6/7, pages 853-870. DOI: 10.1108/GKMC-04-2022-0091. Online publication date: 8 Nov 2022.
  • (2020) Understanding quotation extraction and attribution: towards automatic extraction of public figure’s statements for journalism in Indonesia. Global Knowledge, Memory and Communication, ahead of print. DOI: 10.1108/GKMC-07-2020-0098. Online publication date: 2 Dec 2020.
  • (2016) An Interface for Assisted Curation of Knowledge Bases from Unstructured Text. Proceedings of the 2016 49th Hawaii International Conference on System Sciences (HICSS), pages 4386-4393. DOI: 10.1109/HICSS.2016.545. Online publication date: 5 Jan 2016.

    Published In

    ACM Other conferences
    WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web
    May 2015
    1602 pages
    ISBN:9781450334730
    DOI:10.1145/2740908

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. event linking
    2. named entity linking
    3. online news applications
    4. quotation extraction and attribution

    Funding Sources

    • Australian Research Council Discovery
    • Capital Markets Cooperative Research Centre

    Conference

    WWW '15
    Sponsor:
    • IW3C2

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 05 Mar 2025
