Language Modeling for Effective Construction of Domain Specific Thesauri

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3136)

Abstract

In this paper we present an approach for effective construction of domain specific thesauri. We assume that the collection is partitioned into document categories. By taking advantage of these pre-defined categories, we are able to conceptualize a new topical language model to weight term topicality more accurately. With the help of information theory, interesting relationships among thesaurus elements are discovered deductively. Based on the “Layer-Seeds” clustering algorithm, topical terms from documents in a certain category will be organized according to their relationships in a tree-like hierarchical structure — a thesaurus. Experimental results show that the thesaurus contains satisfactory structures, although it differs to some extent from a manually created thesaurus. A first evaluation of the thesaurus in a query expansion task yields evidence that an increase of recall can be achieved without loss of precision.
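The abstract does not give the formula of the topical language model, so the following is only a minimal sketch of the general idea of weighting term topicality against pre-defined categories. It assumes, purely for illustration, that a term's topicality is its pointwise Kullback-Leibler contribution when a category language model is compared with the model of the whole collection. The function name `term_topicality`, the additive smoothing constant, and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def term_topicality(category_docs, collection_docs, smoothing=0.5):
    """Score each category term by p_cat(t) * log(p_cat(t) / p_coll(t)).

    ASSUMPTION: this pointwise KL contribution is a stand-in for the
    paper's topical language model, whose exact form is not stated in
    the abstract. Documents are lists of already tokenized terms.
    """
    cat_counts = Counter(t for doc in category_docs for t in doc)
    coll_counts = Counter(t for doc in collection_docs for t in doc)
    vocab = set(cat_counts) | set(coll_counts)

    # Additive smoothing keeps both probabilities strictly positive.
    cat_total = sum(cat_counts.values()) + smoothing * len(vocab)
    coll_total = sum(coll_counts.values()) + smoothing * len(vocab)

    scores = {}
    for term in cat_counts:
        p_cat = (cat_counts[term] + smoothing) / cat_total
        p_coll = (coll_counts[term] + smoothing) / coll_total
        scores[term] = p_cat * math.log(p_cat / p_coll)
    return scores


# Toy data (hypothetical): one category plus unrelated documents.
category = [["the", "thesaurus", "term"], ["thesaurus", "the", "hierarchy"]]
others = [["the", "weather", "the", "rain"], ["the", "football", "match"]]
scores = term_topicality(category, category + others)

# List terms from most to least topical for the category.
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {score:.4f}")
```

Under this assumed scoring, a term that is frequent in the category but rare elsewhere ranks above a term that is frequent everywhere, which is the kind of behaviour a category-aware topicality weight is meant to capture.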

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, L., Thiel, U. (2004). Language Modeling for Effective Construction of Domain Specific Thesauri. In: Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2004. Lecture Notes in Computer Science, vol 3136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27779-8_21

  • DOI: https://doi.org/10.1007/978-3-540-27779-8_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22564-5

  • Online ISBN: 978-3-540-27779-8

  • eBook Packages: Springer Book Archive
