DOI: 10.1145/2755492.2755499
research-article

WeiboStand: capturing Chinese breaking news using Weibo "tweets"

Published: 04 November 2014

ABSTRACT

Weibo, often nicknamed the "Chinese Twitter", is the premier microblog service in China. Weibo messages consist of text, short links, images, audio, and video, with the text restricted to 140 Chinese characters. Because Twitter is blocked in mainland China, Weibo is the dominant microblog service there, which makes it an obvious choice for capturing late-breaking news. This paper describes the implementation of a system for capturing messages corresponding to late-breaking news, as well as a visualization tool that displays Weibo news messages on a map interface. Building this system poses several technical challenges. First, methods are needed to automatically recognize and disambiguate geographical locations in messages written in Chinese. Second, because there is no freely accessible real-time streaming API similar to the Twitter Public Streaming API, a new strategy is devised to collect the most recent news-related Weibo messages. The system also uses news from Chinese news RSS feeds as a complementary source.
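
The second challenge, collecting recent messages without a streaming API, can be illustrated with a generic polling loop. The sketch below is not the strategy devised in the paper, which the abstract does not detail; it only shows one common substitute for a stream: repeatedly fetching a "most recent" list and keeping the messages whose IDs have not been seen before. The `PollingCollector` class, the `fetch_recent` callable, and the message shape are all hypothetical, and no real Weibo API call is made.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical message shape for illustration: {"id": int, "text": str}
Message = Dict[str, object]


class PollingCollector:
    """Approximates a stream by polling a 'most recent messages' source."""

    def __init__(self, fetch_recent: Callable[[], Iterable[Message]]):
        # fetch_recent is an assumed callable that returns the latest batch;
        # in practice it would wrap whatever non-streaming source is available.
        self.fetch_recent = fetch_recent
        self.seen_ids = set()

    def poll_once(self) -> List[Message]:
        """Fetch one batch and return only messages not seen in earlier polls."""
        fresh = []
        for msg in self.fetch_recent():
            if msg["id"] not in self.seen_ids:
                self.seen_ids.add(msg["id"])
                fresh.append(msg)
        return fresh


# Demo with canned batches standing in for two successive polls.
batches = iter([
    [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}],
    [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}],
])
collector = PollingCollector(lambda: next(batches))
first = collector.poll_once()   # both messages are new
second = collector.poll_once()  # message 2 overlaps and is dropped
```

Overlapping batches are expected when polling, so deduplicating by ID is what keeps the output stream-like. A real collector would also need rate limiting and a bound on the size of `seen_ids`, both omitted here.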


Published in

LBSN '14: Proceedings of the 7th ACM SIGSPATIAL International Workshop on Location-Based Social Networks
November 2014
53 pages
ISBN: 9781450331401
DOI: 10.1145/2755492

Copyright © 2014 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 8 of 15 submissions, 53%
