
Implementation of a Web Robot and Statistics on the Korean Web

  • Conference paper
Web and Communication Technologies and Internet-Related Social Issues — HSI 2003 (HSI 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2713)


Abstract

A web robot is a program that downloads and stores web pages. Implementation issues of web robots have been studied widely, and various web statistics have been reported in the literature. First, this paper describes the overall architecture of our robot and our implementation decisions on several important issues. Second, we present empirical statistics on approximately 73 million Korean web pages. We also identify which factors of web pages may affect page changes. These factors may be used to select the web pages to be updated incrementally.
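As a rough illustration of what "downloads and stores web pages" and "page changes" can mean in practice, the following is a minimal sketch and not the robot described in the paper. It assumes a breadth-first visiting order, a caller-supplied `fetch` function (hypothetical, so the example runs without network access), a simple regular expression for link extraction, and SHA-1 checksums as the change signal. The names `crawl` and `changed_pages` are illustrative only.

```python
import hashlib
import re
from collections import deque
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin

# Assumption: links appear as double-quoted href attributes.
LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(seed: str,
          fetch: Callable[[str], Optional[str]],
          limit: int = 100) -> Dict[str, str]:
    """Visit pages breadth-first from `seed`; return {url: SHA-1 of body}."""
    store: Dict[str, str] = {}
    queue = deque([seed])
    seen = {seed}
    while queue and len(store) < limit:
        url = queue.popleft()
        body = fetch(url)  # hypothetical downloader; None means "not found"
        if body is None:
            continue
        store[url] = hashlib.sha1(body.encode("utf-8")).hexdigest()
        for href in LINK_RE.findall(body):
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


def changed_pages(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """URLs present in both snapshots whose checksum differs."""
    return sorted(u for u in new if u in old and old[u] != new[u])
```

Running `crawl` twice over the same seed and passing both snapshots to `changed_pages` yields the pages whose content differs between visits. A production robot would additionally need politeness delays, robots.txt handling, and persistent storage, none of which are shown here.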

This work was supported by grant number R-01-2000-00403 from the Korea Science and Engineering Foundation.




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, S.J., Lee, S.H. (2003). Implementation of a Web Robot and Statistics on the Korean Web. In: Chung, CW., Kim, CK., Kim, W., Ling, TW., Song, KH. (eds) Web and Communication Technologies and Internet-Related Social Issues — HSI 2003. HSI 2003. Lecture Notes in Computer Science, vol 2713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45036-X_35


  • DOI: https://doi.org/10.1007/3-540-45036-X_35


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40456-9

  • Online ISBN: 978-3-540-45036-8

  • eBook Packages: Springer Book Archive
