Abstract
Many Web applications need accurate event detection on microblog streams, but the accuracy of existing methods is still limited by the short length and high noise of microblogs. We develop TransDetector, a novel category-level transfer learning method, to address this task. TransDetector builds on two observations: microblogs are short but can be semantically enriched from a knowledge base through transfer learning, and events can be detected more accurately on microblogs with richer semantics. TransDetector makes the following contributions. (1) We propose a structure-guided method for extracting category-level topics, which exploits the knowledge base’s hierarchical structure to extract the topics most highly correlated with each category. (2) We develop a probabilistic model, CTrans-LDA, for category-level transfer learning, which utilizes word co-occurrences and transfers the knowledge base’s category-level topics into microblogs. (3) Events are detected accurately on category-level word time series, owing to their richer semantics and lower noise. (4) Experiments verify the quality of the category-level topics extracted from the knowledge base, and a further study on the benchmark Edinburgh Twitter corpus validates the effectiveness of the proposed transfer learning method for event detection. TransDetector achieves high accuracy, improving precision by 9% without sacrificing recall.
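To make the idea of detecting events on category-level word time series concrete, the sketch below aggregates word counts per category and time slot, then flags slots whose count rises well above the recent average. It is only an illustration under stated assumptions and is not the TransDetector or CTrans-LDA procedure: the `word_to_category` map stands in for whatever the transfer step would produce, and the function names, the sliding-window threshold rule, and the parameters `window`, `k`, and `min_count` are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def category_time_series(posts, word_to_category):
    """Build {category: Counter({time_slot: count})} from tokenized posts.

    posts: iterable of (time_slot, tokens) pairs.
    word_to_category: hypothetical word -> category map (an assumption here,
    standing in for the output of a transfer learning step).
    """
    series = defaultdict(Counter)
    for slot, tokens in posts:
        for token in tokens:
            category = word_to_category.get(token)
            if category is not None:  # words without a category are ignored
                series[category][slot] += 1
    return series


def detect_bursts(counts, num_slots, window=3, k=2.0, min_count=3):
    """Return the time slots whose count exceeds mean + k * std of the
    preceding `window` slots. A simple illustrative rule, not the paper's."""
    bursts = []
    for t in range(window, num_slots):
        history = [counts.get(s, 0) for s in range(t - window, t)]
        # Floor the deviation at 1.0 so a flat history does not make
        # every small increase look like a burst.
        threshold = mean(history) + k * max(pstdev(history), 1.0)
        current = counts.get(t, 0)
        if current >= min_count and current > threshold:
            bursts.append(t)
    return bursts


if __name__ == "__main__":
    posts = [
        (0, ["goal"]),
        (1, ["match"]),
        (2, ["goal"]),
        (3, ["match", "hello"]),
        (4, ["goal", "goal", "match", "match", "goal", "goal"]),
    ]
    word_to_category = {"goal": "sports", "match": "sports"}
    series = category_time_series(posts, word_to_category)
    print(detect_bursts(series["sports"], num_slots=5))
```

Counting at the category level rather than per word means that several related words contribute to the same series, which is one plausible reading of why such series would carry richer semantics and less noise than individual word counts.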
This research is supported by the Natural Science Foundation of China (Grant No. 61572043), and the National Key Research and Development Program (Grant No. 2016YFB1000704).
Acknowledgments
Thanks to Dr. Jian Tang and the anonymous reviewers for giving valuable suggestions on this work.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Huang, W., Wang, T., Chen, W., Wang, Y. (2017). Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10177. Springer, Cham. https://doi.org/10.1007/978-3-319-55753-3_4
DOI: https://doi.org/10.1007/978-3-319-55753-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55752-6
Online ISBN: 978-3-319-55753-3
eBook Packages: Computer Science, Computer Science (R0)