Domain Relevance on Term Weighting

Brunzel, Marko; Spiliopoulou, Myra

doi:10.1007/978-3-540-73351-5_41

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4592)

Abstract

The TFxIDF term weighting scheme is the standard approach to the vectorization of textual data. For a data set in which textual data stemming from web document structure is to be vectorized, the need for an enhanced term weighting scheme arose. In this publication we introduce a term weighting scheme that improves on the traditional TFxIDF scheme by adding a component based on the linguistically inspired notion of domain relevance. Domain relevance measures the degree to which a term is more relevant within a data set than within a reference data set. This external component overcomes a potential weakness of TFxIDF on data sets with non-standard distributions. The weighting scheme favours domain-relevant terms, which can be regarded as more useful in settings where the clustering result is to be consumed by a human supervisor, e.g. for semi-automatic ontology learning.
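The abstract does not state the formula, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' scheme. It assumes that domain relevance can be approximated by the ratio of a term's smoothed relative frequency in the domain corpus to its smoothed relative frequency in a reference corpus, and that this factor is multiplied onto a plain TFxIDF weight. The function names (`tf_idf`, `domain_relevance`, `domain_weighted`), the add-one smoothing, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document: raw TF times log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def domain_relevance(domain_counts, reference_counts, smoothing=1.0):
    """Assumed measure: smoothed relative frequency in the domain corpus
    divided by smoothed relative frequency in the reference corpus."""
    vocab_size = len(set(domain_counts) | set(reference_counts))
    domain_total = sum(domain_counts.values()) + smoothing * vocab_size
    reference_total = sum(reference_counts.values()) + smoothing * vocab_size
    return {
        term: ((count + smoothing) / domain_total)
        / ((reference_counts.get(term, 0) + smoothing) / reference_total)
        for term, count in domain_counts.items()
    }


def domain_weighted(docs, reference_counts):
    """Multiply each TFxIDF weight by the term's assumed domain relevance."""
    domain_counts = Counter(term for doc in docs for term in doc)
    relevance = domain_relevance(domain_counts, reference_counts)
    return [
        {term: weight * relevance[term] for term, weight in vector.items()}
        for vector in tf_idf(docs)
    ]


# Toy data (hypothetical): three tokenized "documents" and reference counts.
docs = [
    ["hotel", "room", "the"],
    ["hotel", "beach", "the"],
    ["room", "price", "the"],
]
reference_counts = Counter(
    {"the": 1000, "room": 5, "price": 5, "hotel": 1, "beach": 1}
)

plain = tf_idf(docs)
boosted = domain_weighted(docs, reference_counts)
relevance = domain_relevance(
    Counter(term for doc in docs for term in doc), reference_counts
)
```

In this toy setting, "hotel" is rare in the reference counts but frequent in the domain documents, so its relevance factor is large and its weight is boosted relative to plain TFxIDF, while a term such as "the" that is common in the reference corpus receives a factor below one.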

Author information

Authors and Affiliations

DFKI GmbH - German Research Center for AI: Marko Brunzel
Otto-von-Guericke Universität Magdeburg, Germany: Marko Brunzel, Myra Spiliopoulou

Editor information

Zoubida Kedad, Nadira Lammari, Elisabeth Métais, Farid Meziane, Yacine Rezgui

Cite this paper

Brunzel, M., Spiliopoulou, M. (2007). Domain Relevance on Term Weighting. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73351-5_41

DOI: https://doi.org/10.1007/978-3-540-73351-5_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73350-8
Online ISBN: 978-3-540-73351-5
eBook Packages: Computer Science, Computer Science (R0)
