ABSTRACT
Weibo, often nicknamed the "Chinese Twitter", is the premier microblog service in China. Weibo messages can contain text, short links, images, audio, and video, and their text is restricted to 140 Chinese characters. Because Twitter is blocked in mainland China, Weibo is the dominant microblog service there, which makes it an obvious choice for capturing late-breaking news. This paper describes the implementation of a system for capturing messages corresponding to late-breaking news, together with a visualization tool that displays Weibo news messages on a map interface. Building this system poses several technical challenges. First, methods are needed to automatically recognize and disambiguate geographical locations in messages written in Chinese. Second, because Weibo lacks a freely accessible real-time streaming API similar to the Twitter Public Streaming API, a new strategy is devised to collect the most recent news-related Weibo messages. The system also uses Chinese news RSS feeds as complementary sources.
Index Terms
- WeiboStand: capturing Chinese breaking news using Weibo "tweets"