Comparison of New Simple Weighting Functions for Web Documents against Existing Methods

Hyusein, Byurhan; Patel, Ahmed; Zyulkyarov, Ferad

doi:10.1007/978-3-540-39737-3_30

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2869)

Abstract

Term weighting is one of the most important aspects of modern Web retrieval systems. The weight associated with a given term in a document shows the importance of the term for the document, i.e. its usefulness for distinguishing documents in a document collection. In search engines operating in a dynamic environment such as the Internet, where many documents are deleted from and added to the database, the usual formula involving the inverse document frequency is too costly to be computed each time the document collection is updated. This paper proposes two new simple and effective weighting functions. These weighting functions have been tested and compared with results obtained for the PIVOT, SMART and INQUERY methods using the WT10g collection of documents.
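
The update cost mentioned in the abstract can be illustrated with a minimal sketch of the textbook tf-idf weight, where idf is taken as log(N / df). This is not one of the two weighting functions the paper proposes (they are not given in this preview), and the helper names `idf` and `tf_idf` and the toy collection are assumptions made only for illustration. The sketch shows that adding one unrelated document changes the weight of a term in a document that was never touched, which is why idf-based weights have to be recomputed whenever the collection changes.

```python
import math
from collections import Counter


def idf(term, docs):
    """Textbook inverse document frequency: log(N / df); 0.0 if the term is unseen."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term, doc, docs):
    """Raw term frequency in `doc` multiplied by the collection-wide idf."""
    return Counter(doc)[term] * idf(term, docs)


# Hypothetical toy collection; each document is a list of tokens.
docs = [
    ["web", "retrieval", "web"],
    ["term", "weighting"],
    ["web", "search"],
]

before = tf_idf("web", docs[0], docs)

# Add a document that does not even contain the term "web".
docs.append(["image", "archive"])

after = tf_idf("web", docs[0], docs)

# The first document is unchanged, yet its weight for "web" has moved,
# because N grew while df stayed the same.
print(before, after)
```

In a collection where documents are added and deleted continuously, every such change can shift the idf of many terms, so stored weights become stale across the whole index rather than only in the documents that changed.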

Author information

Authors and Affiliations

Computer Networks and Distributed Systems Research Group, Department of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
Byurhan Hyusein, Ahmed Patel & Ferad Zyulkyarov

Editor information

Editors and Affiliations

Dept. of Computer Engineering, Middle East Technical University, Ankara, Turkey
Adnan Yazıcı
Department of Computer Engineering, Middle East Technical University, 06531, Ankara, Turkey
Cevat Şener

About this paper

Cite this paper

Hyusein, B., Patel, A., Zyulkyarov, F. (2003). Comparison of New Simple Weighting Functions for Web Documents against Existing Methods. In: Yazıcı, A., Şener, C. (eds) Computer and Information Sciences - ISCIS 2003. ISCIS 2003. Lecture Notes in Computer Science, vol 2869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39737-3_30

DOI: https://doi.org/10.1007/978-3-540-39737-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20409-1
Online ISBN: 978-3-540-39737-3
eBook Packages: Springer Book Archive
