Specific-Purpose Web Searches on the Basis of Structure and Contents

Kudo, Mineichi; Nakamura, Atsuyoshi

doi:10.1007/11605126_5


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3847)


Abstract

We introduce methods for two specific-purpose Web searches. One is a search for Web communities related to given keywords, and the other is a search for texts that have a certain relation to given keywords. Both methods draw on the structure as well as the contents of the WWW. Our Web community search method uses the global structure of the WWW to discover communities and uses content information to label the communities it finds, where global structure means the Web graph composed of Web pages and the hyperlinks between them. Our related text search method, by contrast, uses the local structure of the WWW to extract candidate texts and uses content information to filter out wrongly extracted ones, where local structure means the DOM-tree structure of each page. We report the latest results on these Web search methods.
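The abstract's division of labour for community search (structure to discover, content to label) can be illustrated with a minimal sketch. This is not the authors' algorithm: connected components of the link graph stand in for community discovery, and the most frequent term stands in for labeling. The function names `find_communities` and `label_community` and the toy pages are hypothetical, introduced only for illustration.

```python
from collections import Counter


def find_communities(links):
    """Group pages into connected components of the undirected link graph.

    Assumption: connected components are a simple stand-in for the
    community discovery step, not the method used in the paper.
    """
    # Build an undirected adjacency map from directed hyperlinks.
    adjacency = {}
    for source, target in links:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    seen = set()
    communities = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one component.
        component = []
        stack = [start]
        seen.add(start)
        while stack:
            page = stack.pop()
            component.append(page)
            for neighbour in adjacency[page]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        communities.append(sorted(component))
    return communities


def label_community(pages, page_text, stopwords=frozenset()):
    """Label a community with its most frequent non-stopword term.

    Assumption: term frequency is a placeholder for the content-based
    labeling step described in the abstract.
    """
    counts = Counter()
    for page in pages:
        for word in page_text.get(page, "").lower().split():
            if word not in stopwords:
                counts[word] += 1
    if not counts:
        return None
    # Break ties alphabetically so the label is deterministic.
    best = max(counts.values())
    return min(word for word, count in counts.items() if count == best)


# Hypothetical toy Web graph: (source page, target page) hyperlinks.
links = [("a", "b"), ("b", "c"), ("x", "y")]
# Hypothetical page contents.
page_text = {
    "a": "jazz piano",
    "b": "jazz records",
    "c": "jazz clubs",
    "x": "chess openings",
    "y": "chess clubs",
}

communities = find_communities(links)
labels = [label_community(pages, page_text) for pages in communities]
```

Here the graph alone decides which pages belong together, and the text is consulted only afterwards to name each group, which mirrors the structure-then-content ordering in the abstract.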



Author information

Authors and Affiliations

Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo, 060-0814, Japan
Mineichi Kudo & Atsuyoshi Nakamura


Editor information

Editors and Affiliations

Meme Media Laboratory, Hokkaido University, Kita 13, Nishi 8, Kita-ku, Sapporo, 060-8628, Japan
Klaus P. Jantke
Meme Media Laboratory, Hokkaido University, 060-8628, Sapporo, Japan
Aran Lunzer
Laboratoire de Recherche en Informatique, Université Paris-Sud, Orsay Cedex, France
Nicolas Spyratos
Meme Media Laboratory, Hokkaido University, N13 W8, 060-8628, Sapporo, Japan
Yuzuru Tanaka


Cite this paper

Kudo, M., Nakamura, A. (2006). Specific-Purpose Web Searches on the Basis of Structure and Contents. In: Jantke, K.P., Lunzer, A., Spyratos, N., Tanaka, Y. (eds) Federation over the Web. Lecture Notes in Computer Science, vol 3847. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11605126_5


DOI: https://doi.org/10.1007/11605126_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31018-1
Online ISBN: 978-3-540-32587-1
eBook Packages: Computer Science, Computer Science (R0)
