Abstract
We introduce methods for two specific-purpose Web searches. One is a search for Web communities related to given keywords, and the other is a search for texts that have a certain relation to given keywords. Both methods are based on the structure and the contents of the WWW. Our Web community search method uses the global structure of the WWW to discover communities and uses content information to label the communities found, where global structure means the Web graph composed of Web pages and the hyperlinks between them. Our related text search method, in contrast, uses the local structure of the WWW to extract candidate texts and uses content information to filter out wrongly extracted ones, where local structure means the DOM-tree structure of each page. We report the latest results on these Web search methods.
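The division of labor described above (structure to find things, content to label or filter them) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: connected components stand in for community discovery, a most-frequent-word count stands in for labeling, and a substring test stands in for content filtering. The toy graph `LINKS`, the word lists, and the function names are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical toy Web graph (global structure): page -> pages it links to.
LINKS = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"], "x": ["y"], "y": ["x"]}

def communities(links):
    """Group pages by connected components of the undirected link graph.

    Stand-in only: real community discovery is far more selective than this.
    """
    undirected = {}
    for src, dsts in links.items():
        undirected.setdefault(src, set())
        for dst in dsts:
            undirected[src].add(dst)
            undirected.setdefault(dst, set()).add(src)
    seen, groups = set(), []
    for start in sorted(undirected):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(undirected[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

def label(group, page_words, top=1):
    """Label a community with its most frequent words (content information)."""
    counts = Counter(w for page in group for w in page_words.get(page, []))
    return [word for word, _ in counts.most_common(top)]

class TextByPath(HTMLParser):
    """Collect (tag path, text) pairs; assumes well-formed HTML without void tags."""
    def __init__(self):
        super().__init__()
        self.stack, self.items = [], []
    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass
    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))

def related_texts(html, path, keyword):
    """Take texts at a given DOM path (local structure), then filter by keyword."""
    parser = TextByPath()
    parser.feed(html)
    parser.close()
    candidates = [text for p, text in parser.items if p == path]
    return [t for t in candidates if keyword.lower() in t.lower()]

PAGE_WORDS = {"a": ["jazz", "music"], "b": ["jazz"], "c": ["jazz", "club"]}
PAGE = ("<html><body><ul><li>Acme phone is fast</li><li>Buy now</li></ul>"
        "<p>Acme phone footer</p></body></html>")
groups = communities(LINKS)
first_label = label(groups[0], PAGE_WORDS)
texts = related_texts(PAGE, "html/body/ul/li", "acme")
```

In this toy run, the footer paragraph is rejected by structure (wrong DOM path) and "Buy now" is rejected by content (no keyword), which mirrors the extract-then-filter order the abstract describes.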
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kudo, M., Nakamura, A. (2006). Specific-Purpose Web Searches on the Basis of Structure and Contents. In: Jantke, K.P., Lunzer, A., Spyratos, N., Tanaka, Y. (eds) Federation over the Web. Lecture Notes in Computer Science, vol 3847. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11605126_5
DOI: https://doi.org/10.1007/11605126_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31018-1
Online ISBN: 978-3-540-32587-1
eBook Packages: Computer Science (R0)