Skip to main content

Applications of Reinforcement Learning to Structured Prediction

  • Conference paper
Recent Advances in Reinforcement Learning (EWRL 2008)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5323))

Included in the following conference series:

  • 1098 Accesses

Abstract

Supervised learning is about learning functions given a set of input and corresponding output examples. A recent trend in this field is to consider structured outputs such as sequences, trees or graphs. When predicting such structured data, learning models have to select solutions within very large discrete spaces. The combinatorial nature of this problem has recently led to learning models integrating a search component.

In this paper, we show that Structured Prediction (SP) can be seen as a sequential decision problem. We introduce SP-MDP: a Markov Decision Process based formulation of Structured Prediction. Learning the optimal policy in SP-MDP is shown to be equivalent as solving the SP problem. This allows us to apply classical Reinforcement Learning (RL) algorithms to SP. We present experiments on two tasks. The first, sequence labeling, has been extensively studied and allows us to compare the RL approach with traditional SP methods. The second, tree transformation, is a challenging SP task with numerous large-scale real-world applications. We show successful results with general RL algorithms on this task on which traditional SP models fail.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Collins, M.: Discriminative training methods for hidden markov models: Theory and experiments with perceptron algorithms. In: EMNLP (2002)

    Google Scholar 

  2. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: ICML 2001 (2001)

    Google Scholar 

  3. Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.: Support vector machine learning for interdependent and structured output spaces. In: ICML 2004 (2004)

    Google Scholar 

  4. Taskar, B., Guestrin, C., Koller, D.: Max-margin markov networks. In: NIPS (2003)

    Google Scholar 

  5. Collins, M., Roark, B.: Incremental parsing with the perceptron algorithm. In: ACL 2004, Barcelona, Spain, pp. 111–118 (July, 2004)

    Google Scholar 

  6. Daumé III, H., Marcu, D.: Learning as search optimization: Approximate large margin methods for structured prediction. In: ICML, Bonn, Germany (2005)

    Google Scholar 

  7. Daumé III, H., Langford, J., Marcu, D.: Search-based structured prediction (2006)

    Google Scholar 

  8. Sutton, R., Barto, A.: Reinforcement learning: an introduction. MIT Press, Cambridge (1998)

    Google Scholar 

  9. Berger, A., Della Pietra, S., Della Pietra, V.: A maximum entropy approach to natural language processing. In: Computational Linguistics (1996)

    Google Scholar 

  10. Maes, F., Denoyer, L., Gallinari, P.: Sequence labelling with reinforcement learning and ranking algorithms. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) ECML 2007. LNCS, vol. 4701, pp. 648–657. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  11. Ramshaw, L., Marcus, M.: Text chunking using transformation-based learning. In: Yarovsky, D., Church, K. (eds.) Proceedings of the Third Workshop on Very Large Corpora, Somerset, New Jersey, ACL, pp. 82–94 (1995)

    Google Scholar 

  12. Kassel, R.H.: A comparison of approaches to on-line handwritten character recognition. Ph.D thesis, Cambridge, MA, USA (1995)

    Google Scholar 

  13. Maes, F., Denoyer, L., Gallinari, P.: Xml structure mapping application to the pascal/inex 2006 xml document mining track. In: INEX, Dagstuhl, Germany (2007)

    Google Scholar 

  14. Maes, F., Denoyer, L., Gallinari, P.: Apprentissage de conversions de documents semi-structures a partir d’exemples. In: CORIA, Tregastel, France (2008)

    Google Scholar 

  15. Denoyer, L., Gallinari, P.: Report on the xml mining track at inex 2005 and inex 2006. SIGIR Forum, 79–90 (2007)

    Google Scholar 

  16. Denoyer, L., Gallinari, P.: The wikipedia xml corpus. SIGIR Forum (2006)

    Google Scholar 

  17. Chidlovskii, B., Fuselier, J.: A probabilistic learning method for xml annotation of documents. In: IJCAI (2005)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maes, F., Denoyer, L., Gallinari, P. (2008). Applications of Reinforcement Learning to Structured Prediction. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds) Recent Advances in Reinforcement Learning. EWRL 2008. Lecture Notes in Computer Science(), vol 5323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89722-4_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-89722-4_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89721-7

  • Online ISBN: 978-3-540-89722-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics