Abstract
As Weblogs (blogs) grow in popularity as dynamic platforms for disseminating and sharing information, so does the use of blogs that track and comment on real-world (political, news, entertainment) events. This success as a popular medium for information sharing, however, also exposes the blog's weakest spot: there is little support beyond keyword-based search for finding blog entries. Consequently, there is a pressing need for navigational support that helps users relate the entries in the large, diverse, and inherently distributed collection that makes up the blogosphere. In this paper, we first note that the large degree of content overlap in the blogosphere, in the form of quotation/commentary pairs (as well as content borrowings across media outlets), can be leveraged for tracking topic development patterns. Relying on this observation, we propose focus and flow analysis techniques that rely on reuse detection to help place blog entries into logical organizations. We then show that these implicit or explicit quotations, together with focus analysis, can be leveraged to identify the contexts in which entries occur, resulting in more effective tagging. Accordingly, we propose CDIP (a collection-driven, yet individuality-preserving tagging system), which relies on relationships provided by quotation/reuse detection and semantic-focus analysis to automatically tag blogs in such a way that not only do related blogs share tags, but the individuality of each entry is also preserved for discriminating tag-based access.
References
All Experts (2008). All Experts homepage. http://www.allexperts.com/central/expert.htm.
dmoz (2009). Open Directory Project. http://www.dmoz.org/.
Sifry, D. (2009). David Sifry’s Blog. http://www.sifry.com/alerts/.
Adar, E., & Adamic, L. A. (2005). Tracking information epidemics in blogspace. In Proceedings of the 2005 IEEE/WIC/ACM international conference on web intelligence.
Allan, J., Wade, C., & Bolivar, A. (2003). Retrieval and novelty detection at the sentence level. In Proceedings of international ACM SIGIR conference.
Brooks, C. H., & Montanez, N. (2006). Improved annotation of the blogosphere via autotagging and hierarchical clustering. In Proceedings of international conference on World Wide Web.
Hofmann, T. (1999). Probabilistic latent semantic analysis. In Proceedings of uncertainty in artificial intelligence.
Kim, J. W., Candan, K. S., & Mehmet Dönderler, E. (2005). Topic segmentation of message hierarchies for indexing and navigation support. In Proceedings of international conference on World Wide Web.
Kim, J. W., & Candan, K. S. (2006). CP/CV: Concept similarity mining without frequency information from domain describing taxonomies. In Proceedings of international conference on information and knowledge management.
Kim, J. W., Candan, K. S., & Tatemura, J. (2007). CDIP: Collection-driven, yet individuality-preserving automated blog tagging. In Proceedings of the international conference on semantic computing.
Kim, J. W., Candan, K. S., & Tatemura, J. (2009). Efficient overlap and content reuse detection in blogs and online news articles. In Proceedings of the 18th international conference on World Wide Web.
Mei, Q., Liu, C., Su, H., & Zhai, C. (2006). A Probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of international conference on World Wide Web.
Metzler, D., Bernstein, Y., Croft, W. B., Moffat, A., & Zobel, J. (2005). Similarity measures for tracking information flow. In Proceedings of international conference on information and knowledge management.
Mishne, G. (2006). AutoTag: A collaborative approach to automated tag assignment for weblog posts. In Proceedings of international conference on World Wide Web.
Nakajima, S., Tatemura, J., Hino, Y., Hara, Y., & Tanaka, K. (2005). Discovering important bloggers based on a blog thread analysis. In Workshop on the Weblogging Ecosystem.
Qi, Y., & Candan, K. S. (2006). CUTS: Curvature-based development pattern analysis and segmentation for blogs and other text streams. In Proceedings of international conference on hypertext and hypermedia series.
Qin, T., Liu, T., Zhang, X., Chen, Z., & Ma, W. (2005). A study of relevance propagation for web search. In Proceedings of international ACM SIGIR conference.
Rada, R., Mili, H., Bicknell, E., & Blettner, M. (1989). Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, 19(1), 17–30.
Resnik, P. (1999). Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research (JAIR), 11, 95–130.
Salton, G., Fox, E. A., & Wu, H. (1983). Extended boolean information retrieval. Communications of the ACM, 26(11), 1022–1036.
Shakery, A., & Zhai, C. (2003). Relevance propagation for topic distillation UIUC TREC-2003 web track experiments. In Text retrieval conference.
Song, R., Wen, J. R., Shi, S., Xin, G., Liu, T. Y., Qin, T., et al. (2004). Microsoft research Asia at web track and terabyte track of TREC 2004. In Text retrieval conference.
Tseng, B., Tatemura, J., & Wu, Y. (2005). Tomographic clustering to visualize blog communities as mountain views. In WWW’04 workshop on the weblogging ecosystem.
Sanderson, M., & Zobel, J. (2005). Information retrieval system evaluation: Effort, sensitivity, and reliability. In Proceedings of international ACM SIGIR conference.
Yang, Y., Zhang, J., Carbonell, J., & Jin, C. (2002). Topic-conditioned novelty detection. In Proceedings of international conference on knowledge discovery and data mining.
Additional information
This work has been partially supported by the NSF Grant “MAISON: Middleware for Accessible Information Spaces on NSDL”. This is an extended version of a work originally published at the IEEE International Conference on Semantic Computing 2007 [10].
About this article
Cite this article
Kim, J.W., Candan, K.S. & Tatemura, J. Organization and Tagging of Blog and News Entries Based on Content Reuse. J Sign Process Syst Sign Image Video Technol 58, 407–421 (2010). https://doi.org/10.1007/s11265-009-0384-x
DOI: https://doi.org/10.1007/s11265-009-0384-x