ABSTRACT
In this article we present interactive software for automatic document indexing, together with several strategies for tuning and optimizing the index. After stems are generated from the raw text, the initial index vocabulary is narrowed down and tuned using the relationships between indexing and clustering theory. The narrowed vocabulary is further optimized by including term phrases and virtual terms, which correspond to high- and low-frequency terms respectively. We present the results of performance experiments, which show significant improvements from index vocabulary optimization. We also discuss how the term discrimination value concept can be exploited in tuning and optimizing indexing and retrieval systems.
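As an illustration of the term discrimination value concept mentioned above, the following is a minimal sketch only. The abstract does not say how the system computes discrimination values, so the sketch assumes the classical centroid-based formulation: the value of a term is the change in average document-to-centroid cosine similarity (the "space density") when that term is removed. The function names and the toy document-term matrix are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def density(docs):
    """Average similarity of each document to the collection centroid."""
    n = len(docs)
    centroid = [sum(col) / n for col in zip(*docs)]
    return sum(cosine(d, centroid) for d in docs) / n

def discrimination_values(docs):
    """Density without term k minus density with all terms, for each term k.

    A positive value means removing the term packs the documents closer
    together, so the term helps tell documents apart. A negative value
    marks a poor discriminator.
    """
    base = density(docs)
    n_terms = len(docs[0])
    values = []
    for k in range(n_terms):
        reduced = [d[:k] + d[k + 1:] for d in docs]
        values.append(density(reduced) - base)
    return values

# Toy matrix (assumed data): rows are documents, columns are terms.
# Term 0 occurs in every document; terms 1 and 2 each occur in half.
docs = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
]
dv = discrimination_values(docs)
for k, value in enumerate(dv):
    print(f"term {k}: {value:+.4f}")
```

In this toy collection the term that occurs in every document receives a negative value, while the two medium-frequency terms receive positive values. Under this assumed formulation, that is the kind of signal that would motivate combining very frequent terms into phrases.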
- An automatic and tunable document indexing system