Machine Discovery Based on the Co-occurrence of References in a Search Engine

Murata, Tsuyoshi

doi:10.1007/3-540-46846-3_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1721)

Included in the following conference series:

International Conference on Discovery Science

Abstract

This paper describes a new method for discovering clusters of related Web pages. Clustering Web pages and visualizing them as a graph lets users easily access related pages. Since related Web pages are often referenced from the same Web page, the number of co-occurrences of references in a search engine is used to discover relations among pages. Two URLs are given to a search engine as keywords, and the number of pages that match both URLs divided by the number of pages that match either URL, known as the Jaccard coefficient, is used as the criterion for evaluating the relation between the two URLs. This value determines the length of an edge in the graph, so that vertices of related pages are placed close to each other. A system based on this method succeeds in discovering clusters of various genres, even though it does not interpret the contents of the pages. The Jaccard coefficient is easy for computer systems to calculate, which makes it suitable for discovery from data acquired through the Internet.
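
The computation described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the in-memory INDEX stands in for a real search engine, the example URLs are made up, and the linear mapping in edge_length is an assumption, since the abstract says only that the coefficient decides the edge length so that related pages end up close together.

```python
# Toy stand-in for a search engine index (hypothetical data):
# each referring page maps to the set of URLs it mentions.
INDEX = {
    "p1": {"a.example", "b.example"},
    "p2": {"a.example", "b.example", "c.example"},
    "p3": {"a.example"},
    "p4": {"c.example"},
}


def count_both(index, url_a, url_b):
    """Number of pages that reference both URLs (an AND query)."""
    return sum(1 for refs in index.values() if url_a in refs and url_b in refs)


def count_either(index, url_a, url_b):
    """Number of pages that reference at least one of the URLs (an OR query)."""
    return sum(1 for refs in index.values() if url_a in refs or url_b in refs)


def jaccard(index, url_a, url_b):
    """Jaccard coefficient: pages matching both divided by pages matching either."""
    either = count_either(index, url_a, url_b)
    if either == 0:
        return 0.0  # neither URL is referenced anywhere
    return count_both(index, url_a, url_b) / either


def edge_length(coefficient, max_length=100.0):
    """Assumed linear mapping: a higher coefficient gives a shorter edge."""
    return max_length * (1.0 - coefficient)


if __name__ == "__main__":
    for a, b in [("a.example", "b.example"), ("a.example", "c.example")]:
        j = jaccard(INDEX, a, b)
        print(a, b, round(j, 3), round(edge_length(j), 1))
```

In this toy index, a.example and b.example are referenced together on two of the three pages that mention either, so their coefficient is 2/3 and their edge is short. a.example and c.example co-occur on only one of four pages, giving 0.25 and a longer edge. A layout algorithm would then consume these lengths; that step is omitted here.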

Author information

Authors and Affiliations

Department of Computer Science, Gunma University, 1-5-1 Tenjin-cho, Kiryu, Gunma, 376-8515, Japan
Tsuyoshi Murata

Editor information

Editors and Affiliations

Department of Informatics, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, Fukuoka, 812-8581, Japan
Setsuo Arikawa
Graduate School of Media and Governance, Keio University, 5322 Endoh, Fujisawa-shi, Kanagawa, 252-8520, Japan
Koichi Furukawa

About this paper

Cite this paper

Murata, T. (1999). Machine Discovery Based on the Co-occurrence of References in a Search Engine. In: Arikawa, S., Furukawa, K. (eds) Discovery Science. DS 1999. Lecture Notes in Computer Science, vol 1721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46846-3_20

DOI: https://doi.org/10.1007/3-540-46846-3_20
Published: 22 October 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66713-1
Online ISBN: 978-3-540-46846-2
eBook Packages: Springer Book Archive
