research-article

Blog topic analysis using TF smoothing and LDA

Authors:
Sungwoo Lee
Sungkyunkwan University, Suwon, Korea

Jaedong Lee
Sungkyunkwan University, Suwon, Korea

Chang-Yong Park
Sungkyunkwan University, Suwon, Korea

Jee-Hyong Lee
Sungkyunkwan University, Suwon, Korea

ICUIMC '13: Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication, January 2013, Article No.: 75, Pages 1–6. https://doi.org/10.1145/2448556.2448631

Published: 17 January 2013

ABSTRACT

In the era of Web 2.0, the number of blogs has increased explosively. With the appearance of social network services, blogs have become places for sharing professional knowledge and for personal branding. To understand topic trends or to analyze blog content, time-sensitive topic extraction and topic change analysis are therefore necessary. In previous studies, most topic extraction models extracted topic words independently from each time slice and then tried to combine them. However, these methods did not perform well in analyzing topic trends because the topics extracted from different time slices are independent of one another. To cope with this problem, we propose a term frequency smoothing method that weaves time slices together, so that more closely related topics are extracted from each time slice and a better topic trend analysis is produced. To extract topics from the smoothed term frequencies, we adopt LDA, a generative topic model. An evaluation of the proposed method on IT blogs shows that it can effectively discover meaningful topic patterns and topic words.
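
The abstract does not give the smoothing formula, so the following is only a minimal sketch of the general idea of smoothing term frequencies across time slices, not the authors' method. It assumes a weighted moving average over neighbouring slices; the function name `smooth_term_frequencies` and the kernel weights `(0.25, 0.5, 0.25)` are hypothetical choices made for illustration.

```python
from collections import Counter


def smooth_term_frequencies(slices, weights=(0.25, 0.5, 0.25)):
    """Blend each time slice's term counts with those of its neighbours.

    slices  -- list of Counter objects (term -> raw count), in time order
    weights -- odd-length kernel centred on the current slice
               (assumed values, not taken from the paper)
    Returns a list of dicts (term -> smoothed frequency), one per slice.
    """
    half = len(weights) // 2
    smoothed = []
    for t in range(len(slices)):
        acc = Counter()
        used_weight = 0.0
        for k, w in enumerate(weights):
            j = t + k - half  # index of the neighbouring slice
            if 0 <= j < len(slices):
                used_weight += w
                for term, count in slices[j].items():
                    acc[term] += w * count
        # Renormalise so slices at the boundaries are not scaled down
        # just because they have fewer neighbours.
        smoothed.append({term: v / used_weight for term, v in acc.items()})
    return smoothed


# Three toy time slices of an imaginary IT blog corpus.
toy_slices = [
    Counter({"android": 4}),
    Counter({"android": 2, "iphone": 2}),
    Counter({"iphone": 4}),
]
toy_smoothed = smooth_term_frequencies(toy_slices)
```

In this toy example the term "iphone" gets a non-zero smoothed frequency in the first slice even though it does not occur there, which is the sense in which adjacent slices become linked. The smoothed values could then be passed to any LDA implementation in place of raw counts; that step is not shown here.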

Index Terms

Blog topic analysis using TF smoothing and LDA
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
    2. Retrieval tasks and goals

Published in

ICUIMC '13: Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication
January 2013
772 pages
ISBN:9781450319584
DOI:10.1145/2448556

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
LDA
blog text mining
term frequency smoothing
topic trend change

Acceptance Rates
Overall Acceptance Rate: 251 of 941 submissions, 27%