A Supervised Learning Approach to Search of Definitions

Xu, Jun; Cao, Yun-Bo; Li, Hang; Zhao, Min; Huang, Ya-Lou

doi:10.1007/s11390-006-0439-4

Artificial Intelligence
Published: May 2006

Journal of Computer Science and Technology, Volume 21, pages 439–449 (2006)

Jun Xu¹, Yun-Bo Cao², Hang Li², Min Zhao³ & Ya-Lou Huang¹

Abstract

This paper addresses the problem of definition search. Specifically, given a term, we find its definition candidates and rank them according to their likelihood of being good definitions. This contrasts with traditional methods, which either generate a single combined definition or output all retrieved definitions. Definition ranking is essential for this task. A specification for judging the goodness of a definition is given, in which a definition is assigned to one of three levels: good, indifferent, or bad. Methods for definition ranking are also proposed, which formalize the problem as either classification or ordinal regression. We employ SVM (Support Vector Machines) as the classification model and Ranking SVM as the ordinal regression model, and use them to rank definition candidates according to their likelihood of being good definitions. Features for constructing the SVM and Ranking SVM models are defined; they represent the characteristics of the term, the definition candidate, and their relationship. Experimental results indicate that SVM and Ranking SVM significantly outperform baseline methods such as heuristic rules, the conventional information retrieval method Okapi, and SVM regression. This holds both when the answers are paragraphs and when they are sentences. Experimental results also show that SVM or Ranking SVM models trained in one domain can be adapted to another domain, indicating that generic models for definition ranking can be constructed.
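To make the ordinal regression formulation concrete, the following is a minimal sketch of pairwise ranking in the style of Ranking SVM. It is not the authors' implementation. The three features (a cue phrase such as "is a", the term appearing first, opinion words), the toy data, and all function names are hypothetical. A plain stochastic subgradient loop on the regularized pairwise hinge loss stands in for a real SVM solver.

```python
import random

# Three-level labels from the abstract's specification, encoded as ordinals.
GOOD, INDIFFERENT, BAD = 2, 1, 0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_pairs(candidates):
    """Build difference vectors x_i - x_j for every pair of candidates of
    ONE term in which candidate i has a strictly higher label than j."""
    pairs = []
    for xi, yi in candidates:
        for xj, yj in candidates:
            if yi > yj:
                pairs.append([a - b for a, b in zip(xi, xj)])
    return pairs


def train_pairwise_hinge(pairs, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Stochastic subgradient descent on
    reg/2 * ||w||^2 + max(0, 1 - w.d) for each difference vector d."""
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for d in pairs:
            violated = dot(w, d) < 1.0  # margin checked before updating w
            for k in range(dim):
                grad = reg * w[k] - (d[k] if violated else 0.0)
                w[k] -= lr * grad
    return w


def rank(w, candidates):
    """Return candidate texts sorted by descending score w.x."""
    ordered = sorted(candidates, key=lambda c: dot(w, c[1]), reverse=True)
    return [text for text, _ in ordered]


# Hypothetical features: [has cue phrase, term comes first, has opinion words]
term_a = [([1.0, 1.0, 0.0], GOOD),
          ([1.0, 0.0, 0.0], INDIFFERENT),
          ([0.0, 0.0, 1.0], BAD)]
term_b = [([1.0, 1.0, 0.0], GOOD),
          ([0.0, 1.0, 1.0], INDIFFERENT),
          ([0.0, 0.0, 1.0], BAD)]

# Pairs are formed within each term only, never across terms.
pairs = make_pairs(term_a) + make_pairs(term_b)
w = train_pairwise_hinge(pairs, dim=3)

unseen = [("opinion only", [0.0, 0.0, 1.0]),
          ("cue phrase, term first", [1.0, 1.0, 0.0]),
          ("cue phrase only", [1.0, 0.0, 0.0])]
ranked = rank(w, unseen)
print(ranked)
```

On this toy data the learned weights place the candidate with both a cue phrase and the term in first position at the top, and the opinion-only candidate at the bottom. The classification variant described in the abstract would instead train a binary good-versus-rest model and rank candidates by its decision score.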

Author information

Authors and Affiliations

College of Software, Nankai University, Tianjin, 300071, P.R. China
Jun Xu & Ya-Lou Huang
Microsoft Research Asia, Beijing, 100080, P.R. China
Yun-Bo Cao & Hang Li
Institute of Automation, Chinese Academy of Sciences, Beijing, 100080, P.R. China
Min Zhao

Corresponding author

Correspondence to Jun Xu.

Additional information

The work was conducted when Xu and Zhao were visiting Microsoft Research Asia.

Jun Xu is a Ph.D. candidate at the College of Information Science and Technology, Nankai University, China. He received his B.S. degree in computer science and technology from Nankai University in June 2001. His main research interests include text mining, information retrieval, and natural language processing.

Yun-Bo Cao is a researcher at Microsoft Research Asia. He received his M.S. degree in computer science from Peking University in June 1997. His main research interests include statistical learning, natural language processing, data mining, and information retrieval.

Hang Li is a researcher at Microsoft Research Asia. He is also an adjunct professor at Xi'an Jiaotong University and Nankai University. He joined Microsoft Research in June 2001. Prior to that, he worked at the Research Laboratories of NEC Corporation. He obtained his B.S. degree in electrical engineering from Kyoto University in 1988 and his M.S. degree in computer science from Kyoto University in 1990. He earned his Ph.D. degree in computer science from the University of Tokyo in 1998. His research interests include statistical learning, natural language processing, data mining, and information retrieval.

Min Zhao is currently a researcher at NEC Laboratories China. She received her Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences. Her main research interests include rough sets, data mining, and information retrieval.

Ya-Lou Huang is a professor at Nankai University. He received his Ph.D. degree in control theory and control engineering from Nankai University in 1993. His research interests include intelligent robots, intelligent information processing, and data mining.

About this article

Cite this article

Xu, J., Cao, YB., Li, H. et al. A Supervised Learning Approach to Search of Definitions. J Comput Sci Technol 21, 439–449 (2006). https://doi.org/10.1007/s11390-006-0439-4

Received: 29 May 2005
Revised: 14 February 2006
Issue Date: May 2006
DOI: https://doi.org/10.1007/s11390-006-0439-4
