
Specific-Purpose Web Searches on the Basis of Structure and Contents

Conference paper, published in Federation over the Web

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3847)

Abstract

We introduce methods for two specific-purpose Web searches: a search for Web communities related to given keywords, and a search for texts that have a certain relation to given keywords. Both methods draw on the structure as well as the contents of the WWW. Our Web community search uses the global structure of the WWW, that is, the Web graph formed by Web pages and the hyperlinks between them, to discover communities, and uses content information to label the communities it finds. Our related-text search, in contrast, uses the local structure of the WWW, that is, the DOM-tree structure of each page, to extract candidate texts, and uses content information to filter out wrongly extracted ones. We report the latest results for these Web search methods.
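
The two-stage idea in the abstract (structure to find candidates, content to label or filter them) can be illustrated with a small Python sketch. It is not the authors' method. The community step swaps in a standard max-flow/minimum-cut formulation (Edmonds-Karp) on an undirected toy link graph and labels the result with raw word frequencies. The related-text step treats the siblings of a DOM element that mentions the keyword as candidates and keeps only those containing a cue word. All function names, the toy pages and the cue-word set are hypothetical.

```python
from collections import Counter, deque
from html.parser import HTMLParser

# --- Sketch 1 (hypothetical): community from the global link graph ---

def min_cut_community(graph, seeds, sink):
    """Return the source side of a minimum cut separating `seeds` from `sink`.

    `graph` maps each page to the pages it links to. Links are treated as
    undirected edges of capacity 1. Assumes `sink` is not one of the seeds.
    """
    cap = {}

    def add(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # residual entry for the reverse direction

    edges = {frozenset((u, v)) for u, vs in graph.items() for v in vs if u != v}
    for edge in edges:
        u, v = tuple(edge)
        add(u, v, 1)
        add(v, u, 1)
    source = object()  # virtual super-source feeding every seed page
    for s in seeds:
        add(source, s, float("inf"))
    cap.setdefault(sink, {})

    while True:
        # Breadth-first search for a shortest augmenting path (Edmonds-Karp).
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No path left: `parent` holds every page reachable from the source.
            return {p for p in parent if p is not source}
        flow, v = float("inf"), sink
        while parent[v] is not None:  # find the bottleneck capacity
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:  # push flow, update residual capacities
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

def label_community(community, texts, k=3):
    """Label a community with its k most frequent words (crude content cue)."""
    counts = Counter(w for page in sorted(community) for w in texts[page].lower().split())
    return [w for w, _ in counts.most_common(k)]

# --- Sketch 2 (hypothetical): related texts from the local DOM tree ---

class DomTree(HTMLParser):
    """Builds a minimal DOM-like tree: each node has a tag, its own text, children."""
    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#root", "text": "", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "text": "", "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in self.VOID:  # void elements never get an end tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1]["text"] += data

def related_texts(html, keyword, cue_words):
    """Siblings of an element mentioning `keyword` are candidates (structure);
    a candidate is kept only if it contains a cue word (content filter)."""
    parser = DomTree()
    parser.feed(html)
    parser.close()
    candidates = []

    def walk(node):
        kids = node["children"]
        if any(keyword in kid["text"] for kid in kids):
            for kid in kids:
                text = kid["text"].strip()
                if text and keyword not in text:
                    candidates.append(text)
        for kid in kids:
            walk(kid)

    walk(parser.root)
    return [t for t in candidates if set(t.lower().split()) & cue_words]
```

On a toy graph where pages a, b and c are densely linked and joined to a second cluster only by the link c–d, `min_cut_community(web, {"a"}, "f")` cuts that single link and returns {"a", "b", "c"}. `label_community` then names the group by its most frequent words. A real system would need stop-word removal and a far better content model than whitespace-split word counts.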

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kudo, M., Nakamura, A. (2006). Specific-Purpose Web Searches on the Basis of Structure and Contents. In: Jantke, K.P., Lunzer, A., Spyratos, N., Tanaka, Y. (eds) Federation over the Web. Lecture Notes in Computer Science, vol 3847. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11605126_5

  • DOI: https://doi.org/10.1007/11605126_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31018-1

  • Online ISBN: 978-3-540-32587-1

  • eBook Packages: Computer Science (R0)