
A Supervised Learning Approach to Search of Definitions

  • Artificial Intelligence
Journal of Computer Science and Technology

Abstract

This paper addresses the problem of searching for definitions. Specifically, for a given term, we find its definition candidates and rank them according to their likelihood of being good definitions. This contrasts with traditional methods, which either generate a single combined definition or output all retrieved definitions. Definition ranking is essential for this task. A specification for judging the goodness of a definition is given, in which a definition is assigned to one of three levels: good, indifferent, or bad. Methods for definition ranking are also proposed, which formalize the problem as either classification or ordinal regression. We employ Support Vector Machines (SVM) as the classification model and Ranking SVM as the ordinal regression model, and use each to rank definition candidates by their likelihood of being good definitions. Features for constructing the SVM and Ranking SVM models are defined; they represent the characteristics of the term, of the definition candidate, and of the relationship between the two. Experimental results indicate that SVM and Ranking SVM significantly outperform baseline methods such as heuristic rules, the conventional information retrieval method Okapi, and SVM regression. This holds both when the answers are paragraphs and when they are sentences. Experimental results also show that SVM or Ranking SVM models trained in one domain can be adapted to another domain, indicating that generic models for definition ranking can be constructed.
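
The ordinal regression formulation can be illustrated with a small pairwise ranking sketch. This is not the authors' implementation: Ranking SVM is normally trained by solving a quadratic program, whereas the code below substitutes plain subgradient descent on a regularized pairwise hinge loss. The two features (`matches_is_a_pattern`, `has_opinion_words`), the toy data, and all function names are hypothetical and chosen only for illustration.

```python
import random

# Three ordinal levels, following the specification described in the abstract.
GOOD, INDIFFERENT, BAD = 2, 1, 0


def pairwise_differences(candidates):
    """Build preference pairs for ONE term.

    candidates: list of (feature_vector, level). For every pair where the
    first candidate has a higher level than the second, emit the feature
    difference with target +1 and its mirror image with target -1.
    """
    pairs = []
    for xi, yi in candidates:
        for xj, yj in candidates:
            if yi > yj:
                diff = [a - b for a, b in zip(xi, xj)]
                pairs.append((diff, 1))
                pairs.append(([-d for d in diff], -1))
    return pairs


def train_linear_ranker(pairs, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Subgradient descent on an L2-regularized hinge loss over the pairs."""
    rng = random.Random(seed)
    pairs = list(pairs)
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(pairs)
        for diff, sign in pairs:
            margin = sign * sum(wk * dk for wk, dk in zip(w, diff))
            w = [wk * (1.0 - lr * reg) for wk in w]  # regularization shrink
            if margin < 1.0:  # pair is misordered or inside the margin
                w = [wk + lr * sign * dk for wk, dk in zip(w, diff)]
    return w


def rank_candidates(w, candidates):
    """Sort (text, feature_vector) candidates by descending linear score."""
    def score(item):
        return sum(wk * xk for wk, xk in zip(w, item[1]))
    return sorted(candidates, key=score, reverse=True)


# Hypothetical features: [matches_is_a_pattern, has_opinion_words].
# Pairs are formed only among candidates of the same term.
training_terms = [
    [([1, 0], GOOD), ([0, 0], INDIFFERENT), ([0, 1], BAD)],
    [([1, 0], GOOD), ([1, 1], INDIFFERENT), ([0, 1], BAD)],
]
pairs = []
for term_candidates in training_terms:
    pairs.extend(pairwise_differences(term_candidates))

w = train_linear_ranker(pairs, dim=2)

new_candidates = [
    ("I think X is great.", [0, 1]),
    ("X was mentioned in the report.", [0, 0]),
    ("X is a method for ranking.", [1, 0]),
]
ranked = rank_candidates(w, new_candidates)
for text, _ in ranked:
    print(text)
```

Forming pairs only within a single term reflects the ranking setting described above: candidates compete with other candidates for the same term, not across terms. The classification variant would instead train a binary classifier on individual candidates and sort by its decision score.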

Author information

Corresponding author

Correspondence to Jun Xu.

Additional information

The work was conducted when Xu and Zhao were visiting Microsoft Research Asia.

Jun Xu is a Ph.D. candidate at the College of Information Science and Technology, Nankai University, China. He received his B.S. degree in computer science and technology from Nankai University in June 2001. His main research interests include text mining, information retrieval, and natural language processing.

Yun-Bo Cao is a researcher at Microsoft Research Asia. He received his M.S. degree in computer science from Peking University in June 1997. His main research interests include statistical learning, natural language processing, data mining, and information retrieval.

Hang Li is a researcher at Microsoft Research Asia. He is also an adjunct professor of Xian Jiaotong University and Nankai University. He joined Microsoft Research in June 2001. Prior to that, he worked at the Research Laboratories of NEC Corporation. He obtained the B.S. degree in electrical engineering from Kyoto University in 1988 and the M.S. degree in computer science from Kyoto University in 1990. He earned his Ph.D. degree in computer science from the University of Tokyo in 1998. His research interests include statistical learning, natural language processing, data mining, and information retrieval.

Min Zhao is currently a researcher at NEC Laboratories China. She received her Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences. Her main research interests include rough sets, data mining, and information retrieval.

Ya-Lou Huang is a professor at Nankai University. He received his Ph.D. degree in control theory and control engineering from Nankai University in 1993. His research interests include intelligent robots, intelligent information processing, and data mining.

About this article

Cite this article

Xu, J., Cao, YB., Li, H. et al. A Supervised Learning Approach to Search of Definitions. J Comput Sci Technol 21, 439–449 (2006). https://doi.org/10.1007/s11390-006-0439-4

  • DOI: https://doi.org/10.1007/s11390-006-0439-4
