Abstract
Studying a national Web graph is challenging, yet it can provide insight into social phenomena specific to a country. However, because the Web has no national borders, it is difficult to decide whether a given web page belongs to a particular country. In this paper we study the characteristics of the Thai Web graph. We first address the challenge of gathering Thailand-related web pages from the borderless Web by proposing a set of criteria that define such pages. Three snapshots of the Thai Web were collected, in July 2004 (18M web pages), January 2007 (550K web pages), and May 2007 (1.4M web pages). We then analyze and report various statistical properties related to the connectivity of the associated Thai Web graphs.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Somboonviwat, K., Suzuki, S., Kitsuregawa, M. (2008). Connectivity of the Thai Web Graph. In: Zhang, Y., Yu, G., Bertino, E., Xu, G. (eds) Progress in WWW Research and Development. APWeb 2008. Lecture Notes in Computer Science, vol 4976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78849-2_61
DOI: https://doi.org/10.1007/978-3-540-78849-2_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78848-5
Online ISBN: 978-3-540-78849-2
eBook Packages: Computer Science (R0)