Abstract
The link structure of a document collection is dynamic and changes continually. Although various methods have been proposed to exploit link structure in identifying hubs and authorities in a set of linked documents, no existing approach deals effectively with such change. This paper examines changes in linked documents and proposes an incremental probabilistic framework for link structure, which we call IPHITS. The model processes online document streams in a fast, scalable way and uses a novel link-updating technique that copes with dynamic changes. Experimental results on two different sources of online information demonstrate the time savings of our method. We also analyze the stability of the rankings under small perturbations to the linkage patterns.
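For orientation, the hubs-and-authorities idea mentioned in the abstract can be sketched as follows. This is a minimal illustration of the classic, static HITS power iteration, not the IPHITS model itself, whose update rules are not given on this page. The function names `hits` and `normalise`, the toy graph, and the iteration count are all assumptions made for the example.

```python
import math


def normalise(scores):
    """Scale a score dict to unit L2 norm (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits(links, iterations=50):
    """Static HITS by power iteration (illustrative baseline only).

    links: dict mapping each document to the list of documents it links to.
    Returns (hub, authority) score dicts.
    """
    # Collect every document that appears as a source or a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of documents linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        auth = normalise(new_auth)
        # Hub score: sum of authority scores of documents linked to.
        hub = normalise(
            {n: sum(auth[d] for d in links.get(n, ())) for n in nodes}
        )
    return hub, auth


# Hypothetical toy graph: "a" links to "c" and "d"; "b" links to "c".
toy_links = {"a": ["c", "d"], "b": ["c"], "c": [], "d": []}
hub, auth = hits(toy_links)
```

In this static form the whole iteration is rerun from scratch whenever a link is added or removed, which is the cost an incremental method aims to avoid.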
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ma, H., Zhao, W., Li, Z., Shi, Z. (2009). IPHITS: An Incremental Latent Topic Model for Link Structure. In: Lee, G.G., et al. Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_21
DOI: https://doi.org/10.1007/978-3-642-04769-5_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04768-8
Online ISBN: 978-3-642-04769-5
eBook Packages: Computer Science, Computer Science (R0)