DOI: 10.1145/3297001.3297008
research-article

My City, My Voice: Listening to the Citizen Views from Web Sources

Published: 03 January 2019

ABSTRACT

To foster inclusive urban management, civic agencies need to listen to citizens' voices on web sources such as social media, blogs, and public forums. Because online data is vast and noisy, it is challenging, yet important, to mine the actionable issues that citizens of a city face firsthand, so that the administration can take timely measures to remedy them. In this work, we filter, analyze, and model web data on the urban civic issues of a city along three modalities: semantic, spatial, and temporal. We propose a novel approach that captures context through dense distributed word embeddings and identifies latent issues through a generative model. Because geo-tagged posts are scarce and reporting is often delayed, we rely primarily on the textual content of the data for location mining and temporal resolution. We present a first-of-its-kind unified system named CUrb that introduces a novel pipeline to construct a long-term topology of issues across these three dimensions, aggregated over a variety of documents. Through extensive experimentation, we demonstrate the efficacy of our system both qualitatively and quantitatively. It achieves an improvement of up to 24% over the state-of-the-art technique.


  • Published in

    ACM Other conferences
    CODS-COMAD '19: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
    January 2019
    380 pages
    ISBN: 9781450362078
    DOI: 10.1145/3297001

    Copyright © 2019 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    CODS-COMAD '19 Paper Acceptance Rate: 62 of 198 submissions, 31%
    Overall Acceptance Rate: 197 of 680 submissions, 29%
