Abstract
Advances in digital tools and data collection methods continue to increase the volume of textual data generated by large-scale participation processes in urban contexts. To extract the thematic content of such underutilized textual datasets, topic modeling (TM) and content analysis have been deployed as promising AI-based Natural Language Processing (NLP) techniques. Yet such techniques have not been exploited in urban design domains, owing to the complexity of textual datasets and the lack of a systematic evaluation framework. In this paper, we address the challenges of using large textual data by working with a real-world dataset collected via a digital participation platform in Madrid, Spain. First, we identify the prominent data structures and the potential information embedded in the dataset by using a document-oriented NoSQL database. In this step, we systematically discuss the pre-processing steps that convert the raw data into a series of structured data collections. Second, we evaluate three TM algorithms, namely Latent Dirichlet Allocation (LDA), Latent Semantic Indexing (LSI), and Hierarchical Dirichlet Process (HDP), across a number of hyperparameters that control the learning process. This step aims to reveal the number of topics required for the algorithms to extract meaningful content. Lastly, we present possible textual data visualization techniques to enable the use of textual information in digital participation processes. Consequently, this paper facilitates the use of large textual datasets by investigating data structures and processing, revealing the potential of different TM algorithms, and analyzing the results with the support of urban big data analytics and computational linguistic techniques for informed urban design processes.
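The pre-processing step described in the abstract can be illustrated with a minimal sketch. It assumes that each proposal is a short free-text string and that a structured record is a plain dictionary, loosely mirroring a document in a document-oriented database. The stop-word list, the field names (`_id`, `tokens`, `bow`), the helper functions, and the two sample proposals are all hypothetical and are not taken from the paper or from the Madrid dataset.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a fuller, language-specific one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "for", "on", "more", "we", "need"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words and very short tokens."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

def to_record(doc_id, text):
    """Build a document-style record (a plain dict) holding the tokens and their counts."""
    tokens = tokenize(text)
    return {"_id": doc_id, "tokens": tokens, "bow": dict(Counter(tokens))}

def build_vocabulary(records, min_doc_freq=1):
    """Map each term that appears in at least min_doc_freq documents to an integer id."""
    doc_freq = Counter()
    for rec in records:
        doc_freq.update(set(rec["tokens"]))
    terms = sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)
    return {term: idx for idx, term in enumerate(terms)}

# Invented example proposals, for illustration only (not taken from the Madrid dataset).
proposals = [
    "More bike lanes in the city centre",
    "We need more green parks and bike parking",
]
records = [to_record(i, text) for i, text in enumerate(proposals)]
vocabulary = build_vocabulary(records)
```

Token lists and a term-to-id vocabulary of this kind are the usual input to bag-of-words topic models such as LDA, LSI, and HDP. Raising `min_doc_freq` is one simple way to prune rare terms before training.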
Acknowledgment
This research is supported by the “Designing mobile-friendly cartograms for visualising geospatial data” grant from the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 programme (award number MOE-T2EP20221-0007), and by the Singapore International Graduate Award (SINGA).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ataman, C., Tunçer, B., Perrault, S. (2023). Transforming Large-Scale Participation Data Through Topic Modelling in Urban Design Processes. In: Turrin, M., Andriotis, C., Rafiee, A. (eds) Computer-Aided Architectural Design. INTERCONNECTIONS: Co-computing Beyond Boundaries. CAAD Futures 2023. Communications in Computer and Information Science, vol 1819. Springer, Cham. https://doi.org/10.1007/978-3-031-37189-9_18
DOI: https://doi.org/10.1007/978-3-031-37189-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37188-2
Online ISBN: 978-3-031-37189-9
eBook Packages: Computer Science, Computer Science (R0)