Abstract
This paper addresses the information updating task: determining the most appropriate location in an existing document at which to insert a new piece of related information. We formalize the task as a learning-to-rank problem and propose a heuristic method that automatically assigns labels to training examples by exploiting the structural information of documents. With this formulation, state-of-the-art learning-to-rank algorithms can be applied to the task. To compensate for the lack of semantic information, we incorporate semantic features derived from word clusters, which further improves updating performance. We apply the proposed method to updating Wikipedia biographical articles and legal documents. Experimental results on both the Wikipedia biographical data set and the legal data set show that the proposed learning-to-rank method with cluster-based features outperforms previously reported methods for the information updating task.
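The formulation summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not specify the label scheme, the features, or the clusters, so everything concrete below is an assumption made for illustration: the graded labels (2 for the correct paragraph, 1 for other paragraphs in the same section, 0 elsewhere), the two overlap features, and the hand-made `WORD_CLUSTERS` table. The resulting (label, feature vector) rows are the kind of input a standard learning-to-rank algorithm would consume.

```python
from collections import Counter

# Hypothetical word-to-cluster table. In practice it would come from a word
# clustering run over a large corpus; these entries are made up.
WORD_CLUSTERS = {
    "born": "c1", "birth": "c1",
    "film": "c2", "movie": "c2", "starred": "c2",
    "award": "c3", "prize": "c3", "won": "c3",
}


def tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (w.strip(".,;:").lower() for w in text.split())
    return [w for w in stripped if w]


def overlap(a, b):
    """Size of the multiset intersection of two token lists."""
    return sum((Counter(a) & Counter(b)).values())


def features(sentence, paragraph):
    """Assumed feature vector: [word overlap, cluster overlap]."""
    s, p = tokens(sentence), tokens(paragraph)
    s_clusters = [WORD_CLUSTERS[w] for w in s if w in WORD_CLUSTERS]
    p_clusters = [WORD_CLUSTERS[w] for w in p if w in WORD_CLUSTERS]
    return [overlap(s, p), overlap(s_clusters, p_clusters)]


def heuristic_label(candidate, gold):
    """Assumed structural heuristic over (section, paragraph) indices."""
    if candidate == gold:
        return 2  # the true insertion paragraph
    if candidate[0] == gold[0]:
        return 1  # same section as the true paragraph
    return 0      # a different section


def training_examples(sentence, document, gold):
    """Build one (label, features) row per candidate paragraph.

    document is a list of sections, each a list of paragraph strings.
    """
    rows = []
    for si, section in enumerate(document):
        for pi, paragraph in enumerate(section):
            label = heuristic_label((si, pi), gold)
            rows.append((label, features(sentence, paragraph)))
    return rows


doc = [
    ["He was born in Ohio.", "His family moved west."],
    ["He starred in a film.", "The film won a prize."],
]
rows = training_examples("The movie received an award.", doc, gold=(1, 1))
for label, feats in rows:
    print(label, feats)
```

In this toy input the new sentence shares only the word "the" with the correct paragraph, yet the cluster feature links "movie" to "film" and "award" to "prize". That is the kind of semantic signal the abstract attributes to cluster-based features.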
Notes
The data and code used in the experiments are available for download at www.jaist.ac.jp/~s1020010/update_task.tar.gz.
Acknowledgements
This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Young Scientific Research 22700139.
Cite this article
Pham, M.Q.N., Nguyen, M.L., Ngo, B.X. et al. A learning-to-rank method for information updating task. Appl Intell 37, 499–510 (2012). https://doi.org/10.1007/s10489-012-0343-2
DOI: https://doi.org/10.1007/s10489-012-0343-2