DOI: 10.1145/1183614.1183627
Article

Mining blog stories using community-based and temporal clustering

Published: 06 November 2006

Abstract

In recent years, weblogs, or blogs for short, have become an important form of online content. The personal nature of blogs, the online interactions between bloggers, and the temporal nature of blog entries differentiate blogs from other kinds of Web content. Bloggers interact by linking to each other's posts, thus forming online communities. Within these communities, bloggers discuss particular issues through entries in their blogs. Since these discussions are often initiated in response to online or offline events, a discussion typically lasts for a limited time. We wish to extract such temporal discussions, or stories, occurring within blogger communities, based on query keywords. We propose a Content-Community-Time model that leverages the content of entries, their timestamps, and the community structure of the blogs to automatically discover stories. Doing so also allows us to discover hot stories. We demonstrate the effectiveness of our model through several case studies using real-world data collected from the blogosphere.
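
The abstract does not specify how content, timestamps, and community structure are combined, so the following is only a toy sketch of the general idea and not the paper's Content-Community-Time model. It assumes a hypothetical pairwise score (keyword overlap multiplied by an exponential time decay and a community factor) and groups entries whose score reaches a threshold. The names `Entry`, `similarity`, `find_stories`, and all parameter values are illustrative assumptions.

```python
from dataclasses import dataclass
from math import exp


@dataclass(frozen=True)
class Entry:
    blog: str          # blog that published the entry
    day: int           # timestamp in days from an arbitrary origin
    words: frozenset   # keywords of the entry, reduced to a set


def similarity(a, b, community, time_scale=7.0, cross_penalty=0.5):
    """Hypothetical score: content overlap x time decay x community factor."""
    union = a.words | b.words
    content = len(a.words & b.words) / len(union) if union else 0.0
    # Entries far apart in time contribute almost nothing.
    time = exp(-abs(a.day - b.day) / time_scale)
    ca, cb = community.get(a.blog), community.get(b.blog)
    same_community = ca is not None and ca == cb
    return content * time * (1.0 if same_community else cross_penalty)


def find_stories(entries, community, threshold=0.3):
    """Group entries into connected components of the thresholded score graph."""
    parent = list(range(len(entries)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            if similarity(entries[i], entries[j], community) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(entries)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


entries = [
    Entry("a", 0, frozenset({"election", "debate"})),
    Entry("b", 1, frozenset({"election", "debate", "poll"})),
    Entry("c", 60, frozenset({"election", "debate"})),   # same topic, two months later
    Entry("d", 2, frozenset({"recipe", "pasta"})),       # unrelated topic
]
community = {"a": 0, "b": 0, "c": 0, "d": 1}  # assumed blog-to-community labels
print(find_stories(entries, community))
```

In this sample the first two entries form one group, while the late entry and the off-topic entry stay on their own. The multiplicative form is a design choice of the sketch: a pair has to agree on content and time at once, and a community mismatch only discounts the score.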

    Published In

    CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management
    November 2006
    916 pages
    ISBN:1595934332
    DOI:10.1145/1183614
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. online-communities
    2. time-sensitive clustering
    3. weblogs

    Qualifiers

    • Article

    Conference

    CIKM06
    CIKM06: Conference on Information and Knowledge Management
    November 6 - 11, 2006
    Arlington, Virginia, USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 10
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 21 Jan 2025

    Citations

    Cited By

    • (2021) StoryTracker: A Semantic-Oriented Tool for Automatic Tracking Events by Web Documents. Computational Science and Its Applications – ICCSA 2021, pp. 126-140. DOI: 10.1007/978-3-030-86970-0_10. Online publication date: 11-Sep-2021
    • (2019) Using blog‐like documents to investigate software practice: Benefits, challenges, and research directions. Journal of Software: Evolution and Process. DOI: 10.1002/smr.2197. Online publication date: 29-Aug-2019
    • (2018) Unsupervised keyword extraction from microblog posts via hashtags. Journal of Web Engineering 17:1-2, pp. 93-120. DOI: 10.5555/3370048.3370053. Online publication date: 1-Mar-2018
    • (2016) EventMiner. Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, pp. 261-270. DOI: 10.1145/2970398.2970411. Online publication date: 12-Sep-2016
    • (2016) Communities of co-commenting in the Russian LiveJournal and their topical coherence. Internet Research 26:3, pp. 710-732. DOI: 10.1108/IntR-03-2014-0079. Online publication date: 6-Jun-2016
    • (2016) A Proposal of Methods for Extracting Temporal Information of History-Related Web Document Based on Historical Objects Using Machine Learning Techniques. Advanced Multimedia and Ubiquitous Engineering, pp. 285-293. DOI: 10.1007/978-981-10-1536-6_38. Online publication date: 30-Aug-2016
    • (2016) A Framework for Temporal Information Search and Exploration. Proceedings of International Conference on ICT for Sustainable Development, pp. 387-395. DOI: 10.1007/978-981-10-0135-2_38. Online publication date: 26-Feb-2016
    • (2015) Where to go and what to play. Journal of Information Science 41:6, pp. 830-854. DOI: 10.1177/0165551515603323. Online publication date: 1-Dec-2015
    • (2015) Joint photo stream and blog post summarization and exploration. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3081-3089. DOI: 10.1109/CVPR.2015.7298927. Online publication date: Jun-2015
    • (2015) Data Visualization. Blogosphere and its Exploration, pp. 135-158. DOI: 10.1007/978-3-662-44409-2_13. Online publication date: 2015
