DOI: 10.1145/3627673.3679109
short-paper
Open access

Dataset Generation for Korean Urban Parks Analysis with Large Language Models

Published: 21 October 2024

Abstract

Understanding how urban parks are utilized and perceived by the public is crucial for effective urban planning and management. This study introduces a novel dataset derived from Instagram, using 42,187 images tagged with #Seoul and #Park hashtags from 2017 to 2023. These images were filtered using InternLM-XComposer2, a Multimodal Large Language Model (MLLM), to confirm they depicted park scenes. GPT-4 then annotated the filtered images, resulting in 29,866 valid image annotations of physical elements, human activities, animals, and emotions. The dataset is publicly available at https://huggingface.co/datasets/RedBall/seoul-urban-park-analysis-by-llm.
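The two-stage process described in the abstract (filter images for park scenes, then annotate the ones that pass) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: `build_dataset`, `Annotation`, `is_park_scene`, `annotate`, and the category key names are hypothetical, and the two callables stand in for the InternLM-XComposer2 filter and the GPT-4 annotator, whose actual prompts and output formats are not given here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable

# The four annotation categories named in the abstract (key names are assumed).
CATEGORIES = ("physical_elements", "human_activities", "animals", "emotions")


@dataclass
class Annotation:
    """Hypothetical record for one annotated image."""
    image_id: str
    labels: dict[str, list[str]] = field(default_factory=dict)


def build_dataset(
    image_ids: Iterable[str],
    is_park_scene: Callable[[str], bool],
    annotate: Callable[[str], dict[str, list[str]]],
) -> list[Annotation]:
    """Keep images the filter accepts, then annotate each kept image."""
    dataset = []
    for image_id in image_ids:
        # Stage 1: drop images the filter does not recognize as park scenes.
        if not is_park_scene(image_id):
            continue
        # Stage 2: ask the annotator for labels on the remaining image.
        raw = annotate(image_id)
        # Keep only the expected categories; missing ones become empty lists.
        labels = {c: list(raw.get(c, [])) for c in CATEGORIES}
        dataset.append(Annotation(image_id, labels))
    return dataset


# Stand-in callables for demonstration only; real models would replace these.
def fake_filter(image_id: str) -> bool:
    return image_id.startswith("park")


def fake_annotate(image_id: str) -> dict[str, list[str]]:
    return {"physical_elements": ["bench"], "emotions": ["calm"]}


if __name__ == "__main__":
    for record in build_dataset(["park_1", "selfie_2", "park_3"],
                                fake_filter, fake_annotate):
        print(record.image_id, record.labels)
```

Passing the filter and the annotator as plain callables keeps the sketch independent of any particular model API, so either stage can be swapped or stubbed out for testing.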


      Published In

      CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
      October 2024
      5705 pages
      ISBN:9798400704369
      DOI:10.1145/3627673
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. datasets
      2. image annotation
      3. large language models
      4. urban park

      Qualifiers

      • Short-paper

      Funding Sources

      • National Research Foundation of Korea grant funded by the Korean government(MSIT)
      • Institute of Information & communications Technology Planning & Evaluation(IITP) grant funded by the Korea government(MSIT)
      • Korea Planning & Evaluation Institute of Industrial Technology (KEIT) grant funded by the Korea government (MOTIE)
      • Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government(MSIT)

      Conference

      CIKM '24
      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
