
Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10177)

Abstract

Many Web applications need accurate event detection on microblog streams, but the accuracy of existing methods is still limited by the short length and high noise of microblogs. We develop TransDetector, a novel category-level transfer learning method for this task. TransDetector is based on two observations: microblogs are short but can be semantically enriched from a knowledge base through transfer learning, and events can be detected more accurately on microblogs with richer semantics. TransDetector makes the following contributions. (1) We propose a structure-guided method for extracting category-level topics, which exploits the knowledge base’s hierarchical structure to extract the topics most highly correlated with each category. (2) We develop a probabilistic model, CTrans-LDA, for category-level transfer learning, which utilizes word co-occurrences and transfers the knowledge base’s category-level topics into microblogs. (3) Events are detected accurately on category-level word time series, owing to their richer semantics and lower noise. (4) Experiments verify the quality of the category-level topics extracted from the knowledge base, and a further study on the benchmark Edinburgh Twitter corpus validates the effectiveness of the proposed transfer learning method for event detection. TransDetector achieves high accuracy, improving precision by 9% without sacrificing recall.
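
To make the idea of detecting events on category-level word time series more concrete, the following is a minimal sketch and not the detector used in TransDetector. It assumes each microblog has already been assigned a category (in the paper this role is played by CTrans-LDA; here the label is simply given as input), and it swaps in a plain z-score burst rule as a stand-in for the paper's detection step. The function names, the threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def category_word_series(posts, num_slots):
    """Count words per (category, word) pair in each time slot.

    posts: iterable of (slot_index, category, tokens) tuples.
    The category label is assumed to be provided upstream.
    """
    series = defaultdict(lambda: [0] * num_slots)
    for slot, category, tokens in posts:
        for word in tokens:
            series[(category, word)][slot] += 1
    return dict(series)


def bursty_slots(counts, z_threshold=2.0, min_history=3):
    """Return slots whose count is unusually high relative to earlier slots.

    Illustrative z-score rule (an assumption, not the paper's method):
    compare each slot with the mean and standard deviation of all
    preceding slots.
    """
    bursts = []
    for t in range(min_history, len(counts)):
        history = counts[:t]
        mu = mean(history)
        # Floor the deviation at 1.0 so flat histories do not divide by zero.
        sigma = max(pstdev(history), 1.0)
        if (counts[t] - mu) / sigma >= z_threshold:
            bursts.append(t)
    return bursts


# Toy stream: "goal" spikes in the last slot of the "sports" category.
posts = [(slot, "sports", ["goal", "match"]) for slot in range(5)]
posts += [(4, "sports", ["goal"]) for _ in range(5)]

series = category_word_series(posts, num_slots=5)
goal_bursts = bursty_slots(series[("sports", "goal")])
match_bursts = bursty_slots(series[("sports", "match")])
```

Keeping a separate series per (category, word) pair means a word that is routine in one category can still register as a burst in another, which is the intuition behind working at the category level.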

This research is supported by the Natural Science Foundation of China (Grant No. 61572043), and the National Key Research and Development Program (Grant No. 2016YFB1000704).




Acknowledgments

Thanks to Dr. Jian Tang and the anonymous reviewers for giving valuable suggestions on this work.

Author information


Corresponding author

Correspondence to Wei Chen.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Huang, W., Wang, T., Chen, W., Wang, Y. (2017). Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10177. Springer, Cham. https://doi.org/10.1007/978-3-319-55753-3_4


  • DOI: https://doi.org/10.1007/978-3-319-55753-3_4


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55752-6

  • Online ISBN: 978-3-319-55753-3

  • eBook Packages: Computer Science, Computer Science (R0)
