An automatic and tunable document indexing system

Authors:
Esen Ozkarahan

Dept. of Computer Science, Arizona State University, Tempe, Arizona

Fazli Can

Dept. of Electrical & Electronics Engineering, Middle East Technical Univ., Ankara, Turkey


SIGIR '86: Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
September 1986
Pages 234–243
https://doi.org/10.1145/253168.253218

Published: 1 September 1986

ABSTRACT

In this article we present interactive software for automatic document indexing, together with several strategies for tuning and optimizing the index. After stems are generated from the raw text, the initial index vocabulary is narrowed down and tuned using the relationships between indexing and clustering theory. The narrowed vocabulary is then further optimized by adding term phrases and virtual terms, which correspond to high-frequency and low-frequency terms respectively. We present the results of performance experiments, which showed significant improvements from index vocabulary optimization. We also discuss how the term discrimination value concept can be exploited in tuning and optimizing the index and the retrieval system.
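
The term discrimination value mentioned in the abstract can be illustrated with a small sketch. The abstract does not state how the paper computes it, so the code below assumes the classical centroid-based definition: the value of a term is the change in average document-to-centroid cosine similarity (the "space density") when that term is removed. The function names, the toy matrix, and the `eps` threshold are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def space_density(docs):
    """Average similarity between each document vector and the collection centroid."""
    n_terms = len(docs[0])
    centroid = [sum(d[t] for d in docs) / len(docs) for t in range(n_terms)]
    return sum(cosine(d, centroid) for d in docs) / len(docs)


def discrimination_values(docs):
    """Assumed definition: DV_t = density without term t minus density with all terms.

    DV_t > 0: removing t packs documents closer, so t helps tell them apart.
    DV_t < 0: removing t spreads documents out, so t is a poor discriminator.
    """
    base = space_density(docs)
    values = []
    for t in range(len(docs[0])):
        reduced = [[w for i, w in enumerate(d) if i != t] for d in docs]
        values.append(space_density(reduced) - base)
    return values


def classify_terms(docs, eps=1e-6):
    """Label each term; eps is an illustrative threshold, not a value from the paper."""
    labels = []
    for dv in discrimination_values(docs):
        if dv > eps:
            labels.append("good discriminator")
        elif dv < -eps:
            labels.append("poor discriminator (phrase candidate)")
        else:
            labels.append("indifferent (virtual-term candidate)")
    return labels


# Toy document-by-term matrix: term 0 occurs in every document,
# terms 1 and 2 each occur in only one document.
toy_docs = [
    [1, 1, 0],
    [1, 0, 1],
]
toy_labels = classify_terms(toy_docs)
```

In this toy collection the term shared by both documents receives a negative value, while the two document-specific terms receive positive values. This matches the usual reading that very frequent terms discriminate poorly and are candidates for combination into phrases, whereas terms with values near zero are candidates for grouping into broader virtual terms.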


Published in
SIGIR '86: Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
September 1986
283 pages
ISBN: 0897911873
DOI: 10.1145/253168
Chairman: Luigi Rossi Bernardi (CNR, Italy)
Editor: Fausto Rabitti (IEI-CNR, Italy)
Copyright © 1986 ACM
Publisher
Association for Computing Machinery
New York, NY, United States