Connectivity of the Thai Web Graph

  • Conference paper
Progress in WWW Research and Development (APWeb 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4976)

Abstract

Studying a national Web graph is challenging, yet it can provide insight into social phenomena specific to a country. Because the Web has no national borders, however, it is difficult to decide whether a given web page belongs to a particular country. In this paper we study the characteristics of the Thai Web graph. We first address the challenge of gathering Thailand-related web pages from the borderless Web by proposing a set of criteria that define such pages. Three Thai web snapshots were collected, in July 2004 (18M web pages), January 2007 (550K web pages), and May 2007 (1.4M web pages). We then analyze and report various statistical properties related to the connectivity of the associated Thai Web graphs.

Editor information

Yanchun Zhang, Ge Yu, Elisa Bertino, Guandong Xu

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Somboonviwat, K., Suzuki, S., Kitsuregawa, M. (2008). Connectivity of the Thai Web Graph. In: Zhang, Y., Yu, G., Bertino, E., Xu, G. (eds) Progress in WWW Research and Development. APWeb 2008. Lecture Notes in Computer Science, vol 4976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78849-2_61

  • DOI: https://doi.org/10.1007/978-3-540-78849-2_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78848-5

  • Online ISBN: 978-3-540-78849-2

  • eBook Packages: Computer Science, Computer Science (R0)
