ABSTRACT
To facilitate an environment of inclusive urban management, civic agencies need to listen to the voices of citizens on web sources such as social media, online blogs, and public forums. Owing to the vast and noisy nature of online data, it is challenging yet important to mine the actionable issues that citizens of a city face firsthand, so that the administration can take timely measures to remedy them. In this work, we filter, analyze, and model web data on the urban civic issues of a city with respect to three modalities: semantic, spatial, and temporal. We propose a novel approach that captures context through dense distributed word embeddings and identifies latent issues through a generative model. Due to the scarcity of geo-tagged posts and delayed reporting, we rely primarily on the textual content of the data for location mining and temporal resolution. We present a first-of-its-kind unified system named CUrb that introduces a novel pipeline to construct a long-term topology of issues across the three dimensions, aggregated over a variety of documents. Through extensive experimentation, we demonstrate the efficacy of our system both qualitatively and quantitatively. It achieves an improvement of up to 24% over the state-of-the-art technique.