A learning-to-rank method for information updating task

Applied Intelligence

Abstract

This paper addresses the information updating task: determining the most appropriate location in an existing document at which to place a new piece of related information. We formalize the task as a learning-to-rank problem and, for training, propose a heuristic method that automatically assigns labels to training examples by exploiting the structural information of documents. With this formulation, state-of-the-art learning-to-rank algorithms can be applied to the task. To address the lack of semantic information, we incorporate semantic features derived from word clusters, which further improves performance. We apply the proposed method to updating Wikipedia biographical articles and legal documents. Experimental results on both the Wikipedia biographical data set and the legal data set show that the proposed learning-to-rank method with cluster-based features outperforms previously reported methods for the information updating task.
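The formulation in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the graded labeling rule (2 for the gold paragraph, 1 for other paragraphs in the same section, 0 elsewhere), the two features, the hand-set weights, and the tiny word-to-cluster map are all assumptions made for illustration. In practice the weights would be learned by a learning-to-rank algorithm rather than fixed by hand.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]   # (section index, paragraph index)
Document = List[List[str]]   # sections, each a list of paragraph strings


def tokens(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (an assumed simplification)."""
    return text.lower().split()


def features(sentence: str, paragraph: str, clusters: Dict[str, str]) -> List[float]:
    """Return [word overlap, cluster overlap] between sentence and paragraph."""
    s_words, p_words = set(tokens(sentence)), set(tokens(paragraph))
    word_overlap = len(s_words & p_words) / max(len(s_words), 1)
    # Map words to cluster ids; words without a cluster are ignored.
    s_cl = {clusters[w] for w in s_words if w in clusters}
    p_cl = {clusters[w] for w in p_words if w in clusters}
    cluster_overlap = len(s_cl & p_cl) / max(len(s_cl), 1)
    return [word_overlap, cluster_overlap]


def assign_labels(doc: Document, gold: Location) -> Dict[Location, int]:
    """Assumed structural heuristic: grade each candidate location by how
    close it is to the gold insertion point in the document hierarchy."""
    labels: Dict[Location, int] = {}
    for si, section in enumerate(doc):
        for pi in range(len(section)):
            if (si, pi) == gold:
                labels[(si, pi)] = 2   # the correct paragraph
            elif si == gold[0]:
                labels[(si, pi)] = 1   # same section, different paragraph
            else:
                labels[(si, pi)] = 0   # different section
    return labels


def rank_locations(sentence: str, doc: Document, weights: List[float],
                   clusters: Dict[str, str]) -> List[Tuple[Location, float]]:
    """Score every paragraph with a linear model and sort best-first."""
    scored = []
    for si, section in enumerate(doc):
        for pi, paragraph in enumerate(section):
            f = features(sentence, paragraph, clusters)
            score = sum(w * x for w, x in zip(weights, f))
            scored.append(((si, pi), score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical cluster map; real clusters would come from a clustering
    # algorithm run over a large unlabeled corpus.
    clusters = {"born": "c1", "birth": "c1", "award": "c2", "prize": "c2"}
    doc = [["He was born in Hanoi in 1950 .", "His family moved south ."],
           ["She won a national award in 1990 ."]]
    new_sentence = "He received the Nobel prize"
    print(rank_locations(new_sentence, doc, [1.0, 2.0], clusters))
    print(assign_labels(doc, gold=(1, 0)))
```

In this toy document the new sentence shares no surface word with the award paragraph, so the word-overlap feature alone prefers the first paragraph; the cluster feature links "prize" to "award" and moves the award paragraph to the top, which is the kind of gap the cluster-based features are meant to close.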

Notes

  1. http://stats.wikimedia.org/EN/TablesWikipediaEN.htm.

  2. The data and code used in the experiments are available for download at www.jaist.ac.jp/~s1020010/update_task.tar.gz.

Acknowledgements

This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Young Scientific Research 22700139.

Author information

Corresponding author

Correspondence to Minh Quang Nhat Pham.

About this article

Cite this article

Pham, M.Q.N., Nguyen, M.L., Ngo, B.X. et al. A learning-to-rank method for information updating task. Appl Intell 37, 499–510 (2012). https://doi.org/10.1007/s10489-012-0343-2

  • DOI: https://doi.org/10.1007/s10489-012-0343-2
