Abstract
Large collections of full-text documents are now commonly used in automated information retrieval. Readers generally identify the subject of a text when they notice specific terms, called Field Association (FA) terms, in that text. Previous research has shown that passage-level evidence can improve retrieval results by dividing documents into coherent units, each corresponding to a subtopic. Moreover, many researchers currently extract FA term candidates from whole documents to build an FA term dictionary automatically. This paper proposes a method for automatically building a new FA term dictionary from documents after applying passage retrieval. A WWW search engine is used to extract FA term candidates from passage document corpora. Then, the new FA term candidates in each field are automatically compared with a previously determined FA term dictionary. Finally, new FA terms from the extracted term candidates are appended automatically to the existing FA term dictionary. In experiments, the new technique using passage documents automatically appended about 15% more FA terms from the term candidates to the existing FA term dictionary than the old method. Moreover, recall and precision improved significantly, by 20% and 32% respectively, over the traditional method. The proposed methods were applied to 38,372 articles from a large tagged corpus.
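The compare-and-append step described in the abstract, together with the recall and precision measures it reports, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dictionary layout (field name mapped to a set of lowercase terms), and the normalisation rule are all assumptions made for illustration, and the precision and recall formulas are the standard set-based definitions, which may differ from the paper's exact evaluation protocol.

```python
from typing import Dict, Iterable, Set, Tuple


def append_new_fa_terms(
    dictionary: Dict[str, Set[str]],
    candidates: Dict[str, Iterable[str]],
) -> Dict[str, Set[str]]:
    """Hypothetical helper: add unseen candidate terms to the dictionary.

    Assumes `dictionary` maps a field name to a set of lowercase FA terms.
    Returns, per field, the terms that were newly appended.
    """
    appended: Dict[str, Set[str]] = {}
    for field, terms in candidates.items():
        known = dictionary.setdefault(field, set())
        # Normalise case and whitespace so trivial variants do not count as new.
        normalised = {t.strip().lower() for t in terms if t.strip()}
        new_terms = normalised - known
        known |= new_terms  # updates the dictionary in place
        appended[field] = new_terms
    return appended


def precision_recall(extracted: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Standard set-based precision and recall (an assumed evaluation setup)."""
    hits = len(extracted & relevant)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, with an existing dictionary `{"sports": {"home run"}}` and candidates `{"sports": ["Home Run", "offside trap"]}`, only "offside trap" would be appended, because "Home Run" normalises to an entry that is already present.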
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Morita, K., Atlam, ES., Ghada, E., Fuketa, M., Aoe, Ji. (2006). A New Approach for Improving Field Association Term Dictionary Using Passage Retrieval. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4252. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893004_39
DOI: https://doi.org/10.1007/11893004_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46537-9
Online ISBN: 978-3-540-46539-3
eBook Packages: Computer Science, Computer Science (R0)