Indirect Supervised Learning of Content Selection Logic

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3123)

Abstract

I investigate the automatic acquisition of Content Selection (CS) rules, a desirable goal because the CS problem is quite domain dependent. My learning approach uses a loosely aligned Text-Data corpus, a kind of resource increasingly popular in learning for NLG because such corpora are readily available and do not require expensive hand labelling. However, they provide only indirect information about whether each semantic datum was selected or not. Indirect Supervised Learning is my proposed solution to this problem, a solution shared by other NLG problems that involve learning from loosely aligned Text-Data corpora. It has two steps. In the first step, the loosely aligned Text-Data corpus is transformed into a data set with classification labels. In the second step, supervised learning machinery acquires the CS rules from this data set. I evaluate the approach by comparing the output of my system with the information selected by human authors in unseen texts.
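
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the verbatim string matching used to derive labels, the per-attribute frequency threshold standing in for the "supervised learning machinery", and all names and toy records below are assumptions made purely for illustration.

```python
from collections import defaultdict


def label_data(records, texts):
    """Step 1: turn a loosely aligned Text-Data corpus into labelled examples.

    Assumption: a datum is labelled 'selected' when its value occurs
    verbatim (case-insensitively) in the paired text. Substring matching
    is crude and can produce false positives on real data.
    """
    labelled = []
    for record, text in zip(records, texts):
        lowered = text.lower()
        for attribute, value in record.items():
            labelled.append((attribute, str(value).lower() in lowered))
    return labelled


def learn_rules(labelled, threshold=0.5):
    """Step 2: learn one rule per attribute from the labelled data set.

    Assumption: a frequency threshold stands in for a real supervised
    learner. An attribute is selected when it was labelled 'selected'
    in more than `threshold` of its occurrences.
    """
    counts = defaultdict(lambda: [0, 0])  # attribute -> [selected, total]
    for attribute, selected in labelled:
        counts[attribute][0] += int(selected)
        counts[attribute][1] += 1
    return {attr: sel / total > threshold for attr, (sel, total) in counts.items()}


def evaluate(rules, labelled):
    """Compare rule output with labels derived from unseen texts.

    Returns (precision, recall) over the 'selected' class.
    """
    tp = fp = fn = 0
    for attribute, selected in labelled:
        predicted = rules.get(attribute, False)
        if predicted and selected:
            tp += 1
        elif predicted and not selected:
            fp += 1
        elif selected:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy corpus: structured records paired with human-written texts.
train_records = [
    {"name": "Ada Lovelace", "birth_year": 1815, "shoe_size": 37},
    {"name": "Alan Turing", "birth_year": 1912, "shoe_size": 42},
]
train_texts = [
    "Ada Lovelace, born in 1815, wrote about the Analytical Engine.",
    "Alan Turing was born in 1912 in London.",
]
test_records = [{"name": "Grace Hopper", "birth_year": 1906, "shoe_size": 38}]
test_texts = ["Grace Hopper developed the first compiler."]

rules = learn_rules(label_data(train_records, train_texts))
precision, recall = evaluate(rules, label_data(test_records, test_texts))
```

In this sketch the learned rules are per-attribute yes/no decisions. Step 2 only consumes the labelled data set, so a different learner could replace the threshold without changing step 1.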

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Duboue, P.A. (2004). Indirect Supervised Learning of Content Selection Logic. In: Belz, A., Evans, R., Piwek, P. (eds) Natural Language Generation. INLG 2004. Lecture Notes in Computer Science, vol 3123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27823-8_5

  • DOI: https://doi.org/10.1007/978-3-540-27823-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22340-5

  • Online ISBN: 978-3-540-27823-8

  • eBook Packages: Springer Book Archive
