ABSTRACT
The blogosphere, the totality of blog-related Web sites, has become a rich source for trend analysis in areas such as product surveys, customer relationships, and marketing. Existing approaches are based on simple counts, such as the number of entries or the number of links. In this paper, we introduce a novel concept, the eigen-trend, to represent the temporal trend in a group of blogs with common interests, and we propose two new techniques for extracting eigen-trends in blogs. First, we propose a trend analysis technique based on the singular value decomposition. The extracted eigen-trends provide new insights into multiple trends on the same keyword. Second, we propose a trend analysis technique based on a higher-order singular value decomposition. This technique analyzes the blogosphere as a dynamic graph structure and extracts eigen-trends that reflect the structural changes of the blogosphere over time. Experimental studies on synthetic data sets and a real blog data set show that our new techniques reveal trend information and insights in the blogosphere that are not obtainable from traditional count-based methods.
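
To make the first idea concrete, the following is a minimal sketch of how a singular value decomposition can separate several temporal patterns hidden in per-blog keyword counts. It is not the paper's algorithm: the blog-by-time matrix layout, the function name `top_temporal_patterns`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def top_temporal_patterns(counts, k=2):
    """Return the k leading SVD components of a blog-by-time count matrix.

    counts: array of shape (n_blogs, n_timesteps); entry (i, j) is how often
            blog i mentioned a keyword in time step j (hypothetical layout).
    Returns (U_k, s_k, Vt_k): per-blog weights, singular values, and
    temporal patterns (one pattern per row of Vt_k).
    """
    counts = np.asarray(counts, dtype=float)
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k]

# Synthetic example (assumed data): two groups of blogs follow two
# different temporal shapes for the same keyword.
t = np.arange(20)
rising = t / t.max()                           # steadily growing interest
burst = np.exp(-0.5 * ((t - 5) / 1.5) ** 2)    # short burst around t = 5
counts = np.vstack([3 * rising, 2 * rising, 1 * rising,
                    4 * burst, 2 * burst, 1 * burst])

U_k, s_k, Vt_k = top_temporal_patterns(counts, k=2)

# Two components are enough to rebuild this matrix, because every row
# is a multiple of one of two underlying shapes.
reconstruction = (U_k * s_k) @ Vt_k
```

A plain total count over all six blogs would blend the two shapes into one curve, whereas the decomposition keeps two components. Note that singular vectors are only defined up to sign and, when the underlying shapes are not orthogonal, each component is a mixture of them rather than a clean copy of one shape.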