
A Domain Independent Approach for Extracting Terms from Research Papers

  • Conference paper
Databases Theory and Applications (ADC 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9093)

Abstract

We study the problem of extracting terms from research papers, an important step towards building knowledge graphs for research domains. Existing terminology extraction approaches are mostly domain dependent: they use domain-specific linguistic rules, supervised machine learning techniques, or a combination of the two to extract terms. Relying on domain knowledge requires much human effort, e.g., manually composing a set of linguistic rules or labeling a large corpus, and hence limits the applicability of these approaches. To overcome this limitation, we propose a new terminology extraction approach that uses no knowledge from any specific domain. In particular, we use the title words and the keywords of research papers as seed terms, and use word2vec to identify similar terms from an open-domain corpus as candidate terms, which are then filtered by checking whether they occur in the research papers. We repeat this process with the newly found terms until no new candidate term can be found. We conduct extensive experiments on the proposed approach. The results show that it extracts terms effectively while remaining domain independent.
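The iterative procedure described in the abstract (seed terms, similarity-based expansion, filtering by occurrence in the paper, repeat until nothing new is found) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the tiny hand-made `vectors` table stands in for word2vec embeddings trained on an open-domain corpus, and the names `most_similar`, `extract_terms`, and the `top_n` cutoff are hypothetical choices that do not come from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(term, vectors, top_n):
    """Return the top_n vocabulary entries closest to `term` (hypothetical helper)."""
    if term not in vectors:
        return []
    scored = [(cosine(vectors[term], vec), other)
              for other, vec in vectors.items() if other != term]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]

def extract_terms(seeds, vectors, paper_text, top_n=3):
    """Expand seed terms until no new candidate survives the occurrence filter."""
    text = paper_text.lower()
    terms = set(seeds)
    frontier = set(seeds)
    while frontier:
        # Step 1: collect candidates similar to the most recently found terms.
        candidates = set()
        for term in frontier:
            candidates.update(most_similar(term, vectors, top_n))
        # Step 2: keep only unseen candidates that occur in the paper.
        # Assumption: a plain substring test stands in for the occurrence check.
        frontier = {c for c in candidates if c not in terms and c in text}
        terms |= frontier
    return terms

# Toy stand-in for open-domain embeddings (assumption: made-up 2-d vectors).
vectors = {
    "index": (1.0, 0.1),
    "b-tree": (0.9, 0.2),
    "hashing": (0.8, 0.3),
    "banana": (0.0, 1.0),
    "apple": (0.1, 0.9),
}
paper_text = "We compare a B-tree index with hashing for range queries."

print(sorted(extract_terms(["index"], vectors, paper_text)))
# prints ['b-tree', 'hashing', 'index']
```

In this toy run, "apple" is proposed as a candidate because it is among the three nearest neighbours of "index", but it is discarded because it does not occur in the paper text. The loop stops once a round produces no new surviving term. With real embeddings, the neighbour lookup would typically come from a trained word2vec model rather than a hand-written table.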

Birong Jiang: This work was done while Birong Jiang was a visiting student at the University of Melbourne.



Author information

Corresponding author

Correspondence to Jianzhong Qi.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Jiang, B., Xun, E., Qi, J. (2015). A Domain Independent Approach for Extracting Terms from Research Papers. In: Sharaf, M., Cheema, M., Qi, J. (eds) Databases Theory and Applications. ADC 2015. Lecture Notes in Computer Science, vol 9093. Springer, Cham. https://doi.org/10.1007/978-3-319-19548-3_13

  • DOI: https://doi.org/10.1007/978-3-319-19548-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19547-6

  • Online ISBN: 978-3-319-19548-3

  • eBook Packages: Computer Science, Computer Science (R0)
