Building New Field Association Term Candidates Automatically by Search Engine

Fuketa, Masao; Atlam, El-Sayed; Ghada, Elmarhomy; Morita, Kazuhiro; Aoe, Jun-ichi

doi:10.1007/11893004_42

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4252)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

With the increasing popularity of the Internet and the tremendous amount of on-line text, automatic document classification is important for organizing huge amounts of data. Readers can recognize the subject field of a document by reading only a few specific Field Association (FA) words. Document fields can be decided efficiently if there are many FA words and if their frequency rate is high. This paper proposes a method for automatically building new FA words. A WWW search engine is used to extract FA word candidates from document corpora. New FA word candidates in each field are automatically compared with previously determined FA words, and the new FA words are then appended to an FA word dictionary. According to the experimental results, the new system can automatically append around 44% of new FA words to the existing FA word dictionary. Moreover, a concentration ratio of 0.9 is effective for extracting the relevant FA words needed by a system designed to build FA words automatically.
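
The pipeline described in the abstract (collect candidates per field, filter them by concentration ratio, compare them with the known FA words, append the new ones) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not define the concentration ratio, so the sketch assumes it is the share of a candidate's total frequency that falls in a single field. The function names, the dictionary layout, and the frequency counts are all hypothetical, and the search engine step is replaced by precomputed counts.

```python
def concentration_ratios(field_counts):
    """Assumed definition: per-field frequency divided by total frequency."""
    total = sum(field_counts.values())
    if total == 0:
        return {}
    return {field: count / total for field, count in field_counts.items()}


def select_new_fa_words(candidate_counts, fa_dictionary, threshold=0.9):
    """Return candidates concentrated in one field and not yet in the dictionary."""
    new_words = {}
    for word, field_counts in candidate_counts.items():
        ratios = concentration_ratios(field_counts)
        if not ratios:
            continue
        # Field in which the candidate is most concentrated.
        field, ratio = max(ratios.items(), key=lambda item: item[1])
        already_known = word in fa_dictionary.get(field, set())
        if ratio >= threshold and not already_known:
            new_words.setdefault(field, set()).add(word)
    return new_words


def append_to_dictionary(fa_dictionary, new_words):
    """Append the accepted candidates to the FA word dictionary in place."""
    for field, words in new_words.items():
        fa_dictionary.setdefault(field, set()).update(words)
    return fa_dictionary


# Hypothetical per-field frequencies, standing in for search engine results.
candidate_counts = {
    "home run": {"sports": 95, "politics": 5},
    "election": {"sports": 2, "politics": 98},
    "record": {"sports": 50, "politics": 50},
}
fa_dictionary = {"sports": {"pitcher"}, "politics": {"election"}}

new_words = select_new_fa_words(candidate_counts, fa_dictionary)
append_to_dictionary(fa_dictionary, new_words)
```

With these made-up counts, only "home run" is appended: "election" is already in the dictionary, and "record" is spread evenly across fields, so its ratio stays below the 0.9 threshold.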

Author information

Authors and Affiliations

Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima, 770-8506, Japan
Masao Fuketa, El-Sayed Atlam, Elmarhomy Ghada, Kazuhiro Morita & Jun-ichi Aoe

Editor information

Editors and Affiliations

School of Design, Engineering and Computing, Bournemouth University, UK
Bogdan Gabrys
Centre for SMART Systems, School of Environment and Technology, University of Brighton, BN2 4GJ, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, SA, 5095, Mawson Lakes, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Fuketa, M., Atlam, ES., Ghada, E., Morita, K., Aoe, Ji. (2006). Building New Field Association Term Candidates Automatically by Search Engine. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4252. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893004_42

DOI: https://doi.org/10.1007/11893004_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46537-9
Online ISBN: 978-3-540-46539-3
eBook Packages: Computer Science, Computer Science (R0)
