
Domain Relevance on Term Weighting

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4592)

Abstract

The TFxIDF term weighting scheme is the standard approach to vectorizing textual data. The need for an enhanced term weighting scheme arose when vectorizing a data set whose textual data stems from the structure of web documents. In this publication we introduce a term weighting scheme that improves on the traditional TFxIDF scheme by adding a component based on the linguistically inspired notion of domain relevance. Domain relevance measures the degree to which a term is regarded as more relevant within a data set than within a reference data set. By means of this external component, a potential weakness of TFxIDF on data sets with a non-standard distribution is overcome. The resulting weighting scheme favours domain-relevant terms, which can be regarded as more useful in settings where the clustering result is to be consumed by a human supervisor, e.g., for semi-automatic ontology learning.
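The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes TFxIDF is raw term frequency times log(N / df). It also assumes domain relevance is the ratio of a term's relative frequency in the data set to its add-one-smoothed relative frequency in a reference corpus, in the spirit of corpus-comparison measures. Finally, it assumes the two are combined by a plain product. The function names `tf_idf`, `domain_relevance` and `tf_idf_dr`, the smoothing, and the toy counts are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TFxIDF: raw term frequency times log(N / df) for each document."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def domain_relevance(domain_tokens, reference_counts, reference_size, smoothing=1.0):
    """Hypothetical domain relevance: relative frequency of a term in the data set
    divided by its smoothed relative frequency in a reference corpus."""
    domain_counts = Counter(domain_tokens)
    domain_size = len(domain_tokens)
    vocab_size = len(set(domain_counts) | set(reference_counts))
    relevance = {}
    for term, count in domain_counts.items():
        p_domain = count / domain_size
        # Add-k smoothing so terms unseen in the reference corpus get a finite score.
        p_reference = (reference_counts.get(term, 0) + smoothing) / (
            reference_size + smoothing * vocab_size
        )
        relevance[term] = p_domain / p_reference
    return relevance


def tf_idf_dr(docs, reference_counts, reference_size):
    """Hypothetical combination: TFxIDF multiplied by the term's domain relevance."""
    base = tf_idf(docs)
    all_tokens = [t for doc in docs for t in doc]
    relevance = domain_relevance(all_tokens, reference_counts, reference_size)
    return [{t: w * relevance[t] for t, w in doc_weights.items()} for doc_weights in base]


# Toy data: "ontology" never occurs in the reference corpus, "the" is very common there.
docs = [
    ["ontology", "learning", "the"],
    ["ontology", "clustering", "the", "the"],
    ["web", "document", "the"],
]
reference_counts = {"the": 1000, "learning": 20, "web": 30, "document": 25, "clustering": 2}
reference_size = 10000

plain = tf_idf(docs)
combined = tf_idf_dr(docs, reference_counts, reference_size)
print(sorted(plain[0].items(), key=lambda kv: -kv[1]))
print(sorted(combined[0].items(), key=lambda kv: -kv[1]))
```

With these toy counts, plain TFxIDF ranks "learning" above "ontology" in the first document, because "ontology" occurs in two of the three documents. The domain-relevance factor lifts "ontology" above "learning". A product still gives zero weight to any term whose IDF is zero, such as "the" here. Whether the paper combines the components in this way is not stated in the abstract.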




Editor information

Zoubida Kedad, Nadira Lammari, Elisabeth Métais, Farid Meziane, Yacine Rezgui


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brunzel, M., Spiliopoulou, M. (2007). Domain Relevance on Term Weighting. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73351-5_41


  • DOI: https://doi.org/10.1007/978-3-540-73351-5_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73350-8

  • Online ISBN: 978-3-540-73351-5

  • eBook Packages: Computer Science, Computer Science (R0)
