Abstract
The concept of place has a structured meaning that involves three different aspects: location, locale and sense of place. Defining the sense of place for a specific location requires the analysis of a large amount of data, and social networks are good sources of such data, since their users act as social sensors. In this paper, we aim to detect the sense of place expressed by Twitter users for the city of New York using Latent Dirichlet Allocation (LDA). Our assumption is that LDA can be used to summarise the various senses of place shared by users.
1 Introduction
Place is an underestimated concept that is frequently used in everyday life, in sentences such as “This is my favourite place”, “I finally found my place in the world” or “Lay a place at the table for Mr. Twist”. In everyday language, we use the word place to refer to a city (e.g., New York), a public space (e.g., Central Park), a shop or even the seat we usually take at the table. These examples show that the concept of place covers a broad range of scales, from a single point in space to a wide area.
Place is also one of the main concepts in Geography, where it assumes a more structured representation. In particular, according to Cresswell [6, 7], the concept of place embodies three different aspects:
- location: the physical absolute point in the space, identified by a set of coordinates;
- locale: the visible features and settings of a place, such as streets, shops, parks and so on;
- sense of place: the set of emotions and feelings that a place inspires in people. These sentiments can be subjective or shared: they are subjective when they are based on someone’s personal biography, and shared when a group of people feels the same sentiment towards a place.
Starting from Cresswell’s intuition, it is clear that a complete definition of a place cannot ignore a systematic analysis of the three aspects listed above. First, this requires identifying a place to focus on and then collecting a set of observations about it, for example feelings and emotions. Finally, the data must be processed in order to find the sense of place (SOP).
To accomplish such a task automatically, it is necessary to analyse a large amount of data that contains enough information to define a place in terms of its three aspects. Social networks are good sources of data suitable for this analysis: millions of users post their activities, emotions, interests and opinions every day. For this reason the scientific community has become increasingly interested in these data, treating social network users as social sensors in fields such as politics, economics and sociology. Moreover, the possibility of associating a geographical reference with a post (the so-called geo-tag) or of inferring the location from significant hash-tags has allowed scientists to develop map-based data analyses, which can also be used to identify disaster-affected areas or regions with high crime rates, as in the works of Cerutti et al. [5] and Ristea et al. [12], respectively.
To detect the SOP, we used Twitter as our source of information. Since Twitter allows tweets posted within a specific geographical region to be extracted, it was easy for us to fix both location and locale, leaving the SOP as the only free aspect, which we extracted from the tweets. We collected all tweets containing the word nyc (New York City) and located over the area of New York (see Note 1). Then, following the idea proposed in [14], we applied Latent Dirichlet Allocation (LDA) to the collected data. The assumption is that the topics generated by LDA can summarise the various SOPs shared by social sensors, since topics capture words that frequently co-occur. The topics are used both to select tweets expressing the SOP and to visualise them on a map.
The remainder of this paper is structured as follows: Sect. 2 describes some works that use georeferenced data extracted from social networks and some applications of LDA to such data; Sect. 3 describes the experiment we carried out to detect the SOP for the city of New York by applying LDA, together with the results we obtained; finally, Sect. 4 concludes the article.
2 Related Works
Specifying a geographical reference in a shared post is nowadays a habit for users of the most popular social networks, such as Twitter, Facebook and Instagram. In addition to these well-known platforms, other location-based services have emerged in recent years. Among them, we mention Trendsmap, which shows on a map the latest trends emerging from Twitter; Ushahidi, which collects and visualises information from crisis witnesses and gives users the possibility to respond; and FixMyStreet, which allows UK citizens to report street problems (potholes, unsafe walls, broken lampposts) to the local authorities. FirstLife [2] is a more interactivity-oriented service that focuses on the user as a citizen, giving them the possibility to interact with a map on which they can share events and news and even bring people together. Moreover, the data are associated with a temporal dimension, which allows users to filter and order the information according to time.
As previously mentioned, the widespread adoption of social networks and the possibility of acquiring the data posted by users through the RESTful APIs provided by the social networks themselves have encouraged the scientific community to analyse these data in different fields of interest.
Sakaki et al. [13] built on the intuition of treating users as social sensors in order to implement event detection. Through a semantic analysis of a collection of tweets and the application of location estimation methods, they were able to approximate the centres of earthquakes and the trajectories of typhoons. Cataldi et al. [4] extracted in real time the emerging topics expressed by the community, based on the interests of a specific user in a particular time frame. Allisio et al. [1] exploited the temporal and spatial information associated with tweets in order to produce a daily estimate of the degree of happiness of the main Italian cities. An interactive map shows the data obtained by combining Sentiment Analysis and visualisation techniques. With reference to Cresswell’s definition of place, this work can be considered an experiment designed to extract the sense of place associated with a location.
Besides Sentiment Analysis techniques, Latent Dirichlet Allocation (LDA) [3] has also been successfully applied to data extracted from social networks. LDA is a probabilistic generative model that treats a document as a finite mixture of topics, where a topic is a distribution over the vocabulary. In detail, each topic captures word co-occurrences inside documents, making it possible to explore the document collection. Pennacchiotti and Gurumurthy [11] used LDA to discover users’ interests. In their model, users are represented as mixtures of topics, so the model can be used to suggest friends or people to follow simply by comparing topics. Zhang et al. [15] proposed a model called SSN-LDA (Simple Social Network LDA), which is able to find communities; in this case, the latent variables (topics) are the communities. Eisenstein et al. [9] argue that word co-occurrences are affected by geography: according to the authors, people living in one geographical region use a different vocabulary from people living in another. Thus, they treat the geographical area as a latent variable. Lau et al. [10] proposed an LDA-based method to track emerging events in microblogs. Finally, Di Caro et al. [8] proposed a framework called TMine, which defines navigable tag-flakes, a kind of topic with its associated words.
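As a reminder of what “a document as a finite mixture of topics” means, the generative process of LDA can be sketched as follows. The notation (the Dirichlet prior α, the topic-word distributions β, the per-document mixture θ and the per-word topic assignment z) is the usual one from [3] and is not otherwise used in this paper.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha)
  && \text{topic mixture of document } d \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d)
  && \text{topic of the } n\text{-th word of } d \\
w_{d,n} \mid z_{d,n}, \beta &\sim \mathrm{Multinomial}(\beta_{z_{d,n}})
  && \text{word drawn from the chosen topic}
\end{align*}
```

Here each row β_k is a distribution over the vocabulary, i.e. one topic, and inference recovers θ and β from the observed words w.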
3 Experiments
As described in the Introduction, we would like to extract the SOP defined by Cresswell [6, 7] from social networks. To address this research question, we applied Latent Dirichlet Allocation (LDA) [3] to find common topics expressed in users’ posts. Our idea is that topics can capture the SOP expressed by people regarding a place or a city. For instance, we may capture that there is an ongoing concert in a park. To validate our assumption, we fixed location and locale to New York, searching for all tweets that contain the word nyc (New York City), with the constraint that they are geo-located over the area of New York. The only free parameter is the SOP, which is extracted from the tweets.
3.1 Dataset Creation
We downloaded a set of 449054 tweets using the Twitter APIs. This set includes all the tweets in which the character sequence “nyc” appears somewhere in the tweet (either in the text or in the hash-tags).
Then, analysing the tweets, we noticed that some of them reported news or information irrelevant to our task. Since the SOP concerns the sentiments expressed towards the city (e.g., a street, a park or a monument), we decided to filter out the tweets that express a neutral sentiment. To perform sentiment classification, we used Python’s TextBlob library, which includes a pre-trained classifier. After the classification, we found 395467 tweets expressing a non-neutral sentiment. However, many of them were duplicates due to re-tweets. For instance, we found the tweet “So this happened, gotta love NYC” 1620 times. We decided to remove those duplicates to create a dataset containing only unique tweets. The resulting dataset is composed of 120538 tweets.
Finally, we used a regular expression to remove from the set those tweets in which the character sequence “nyc” was part of another word and was not used as the acronym of “New York City”. We applied this regular expression to both the text and the hash-tags of the tweets. Thus, we obtained the final dataset of 21808 tweets, from which we extracted the topics using LDA. Table 1 contains some statistics about the collected dataset: the average length of tweets (expressed as the number of words, including hash-tags); the average number of hash-tags per tweet; and the vocabulary size of the collected dataset, that is, the number of different words that appear in the tweets.
We then extracted the frequent words contained in those tweets to see whether there are words that express a sentiment towards New York City and to analyse the content of the dataset. We started by removing stopwords, user names and links. Then, we lowercased the text of the tweets and stemmed the words. To extract and plot the frequent words we used the WordCloud tool. The resulting wordcloud is depicted in Fig. 1. In the image, we can find words that express a sentiment towards New York City, such as “love”, “happen” and “deadly”. Furthermore, the wordcloud shows words related to weather, school and a march against weapons, meaning that these three topics were the most discussed ones in the days in which we collected the tweets.
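The two filtering steps just described (dropping neutral tweets, then dropping duplicates) can be sketched as follows. This is a minimal illustration rather than the code used for the experiments: the function names, the zero threshold and the toy lexicon are assumptions made for the example, and the toy scorer only stands in for a real sentiment classifier.

```python
from typing import Callable, Iterable, List


def keep_non_neutral(tweets: Iterable[str],
                     polarity: Callable[[str], float],
                     threshold: float = 0.0) -> List[str]:
    """Keep tweets whose absolute polarity is above the threshold."""
    return [t for t in tweets if abs(polarity(t)) > threshold]


def deduplicate(tweets: Iterable[str]) -> List[str]:
    """Drop exact duplicates (e.g. re-tweets), keeping first-seen order."""
    seen = set()
    unique = []
    for t in tweets:
        key = t.strip()
        if key not in seen:
            seen.add(key)
            unique.append(t)
    return unique


# Hypothetical toy lexicon, used only so that the example runs on its own.
TOY_LEXICON = {"love": 0.5, "deadly": -0.5}


def toy_polarity(text: str) -> float:
    """Sum the toy scores of the words in the text (0.0 means neutral)."""
    words = text.lower().split()
    return sum(TOY_LEXICON.get(w.strip(".,!?"), 0.0) for w in words)


tweets = [
    "So this happened, gotta love NYC",
    "So this happened, gotta love NYC",
    "Traffic report for NYC at 5 pm",
]
filtered = deduplicate(keep_non_neutral(tweets, toy_polarity))
print(filtered)
```

With TextBlob installed, one plausible replacement for the toy scorer is `lambda t: TextBlob(t).sentiment.polarity`; whether the experiments used this polarity score or another TextBlob classifier is not stated here.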
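The exact regular expression is not reported, so the following is only one way such a filter could look. It assumes that “nyc” counts as the acronym when it appears as a whole token, optionally written as a hash-tag, and that it is embedded in another word otherwise; the pattern and the function name are hypothetical.

```python
import re

# "nyc" as a standalone token, optionally preceded by '#', case-insensitive.
# The lookarounds reject matches glued to letters, digits or underscores.
NYC_TOKEN = re.compile(r"(?<![A-Za-z0-9_])#?nyc(?![A-Za-z0-9_])", re.IGNORECASE)


def mentions_nyc(text: str) -> bool:
    """True if the text uses 'nyc' as a word or hash-tag of its own."""
    return NYC_TOKEN.search(text) is not None


examples = [
    "I love NYC!",            # standalone word
    "#nyc skyline tonight",   # standalone hash-tag
    "follow @nycfoodie",      # part of a user name
    "#nycmarathon was fun",   # part of a longer hash-tag
]
kept = [t for t in examples if mentions_nyc(t)]
print(kept)
```

Treating longer hash-tags such as “#nycmarathon” as non-matches is a design choice of this sketch; a looser filter could keep them.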
3.2 Topic Extraction
We used the Latent Dirichlet Allocation (LDA) model to extract the underlying topics present in the dataset. LDA takes the number of topics as input and extracts, for each topic, a probability distribution over the vocabulary. Then, the top-k words of each distribution are selected to represent the topic.
In detail, our pipeline to extract the topics is the following: first, the text of each tweet is lowercased and tokenized, preserving user names, hash-tags and URLs as single tokens. Then, we filter out stopwords, user names and URLs. Finally, we stem the words. We also remove the words that have a global frequency lower than 5. The resulting bag-of-words representations are given as input to the LDA model.
We used the Gensim implementation of LDA and tuned its hyperparameters. We ran a random search over the number of output topics (trying 5, 10, 20 and 50), the number of passes through the corpus (trying 1, 10, 50 and 100) and the number of steps of the Expectation-Maximization algorithm (trying 100, 500 and 1000). We obtained good results with 20 topics, 50 passes and 1000 steps.
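To make the top-k selection concrete, the sketch below ranks the words of one topic by probability. The topic distribution is invented for illustration and is not taken from the experiments, and the function name is hypothetical.

```python
def top_k_words(topic_distribution, k=10):
    """Return the k most probable words of a topic, most probable first."""
    ranked = sorted(topic_distribution.items(),
                    key=lambda item: item[1],
                    reverse=True)
    return [word for word, _ in ranked[:k]]


# Invented word probabilities for a single topic (they sum to 1).
topic = {"love": 0.35, "city": 0.25, "park": 0.20, "school": 0.15, "snow": 0.05}
print(top_k_words(topic, k=3))
```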
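The pipeline above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code used for the experiments: the token pattern, the tiny stopword list and the function names are hypothetical, and stemming is left as a pluggable function (identity by default) because the stemmer is not named.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a full one.
STOPWORDS = {"the", "a", "in", "is", "to", "and", "of", "i"}

# URLs first, then words optionally prefixed by '@' (user) or '#' (hash-tag).
TOKEN = re.compile(r"https?://\S+|[@#]?\w+")


def tokenize(text):
    """Lowercase and split, keeping user names, hash-tags and URLs whole."""
    return TOKEN.findall(text.lower())


def clean(tokens, stem=lambda w: w):
    """Drop stopwords, user names and URLs, then stem what is left."""
    kept = []
    for tok in tokens:
        if tok.startswith("@") or tok.startswith(("http://", "https://")):
            continue
        if tok in STOPWORDS:
            continue
        kept.append(stem(tok))
    return kept


def build_bow(texts, min_count=5, stem=lambda w: w):
    """Return one bag-of-words Counter per text, dropping rare words."""
    docs = [clean(tokenize(t), stem) for t in texts]
    freq = Counter(tok for doc in docs for tok in doc)
    vocab = {w for w, c in freq.items() if c >= min_count}
    return [Counter(tok for tok in doc if tok in vocab) for doc in docs]


texts = ["I love #NYC https://t.co/abc @bob"] * 5 + ["Snow in the park"]
bows = build_bow(texts, min_count=5)
print(bows[0], bows[5])
```

A real stemmer, for example NLTK's `PorterStemmer().stem`, could be passed as the `stem` argument.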
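A random search over the values listed above might be organised as in the following sketch. The sampling helper and its names are hypothetical; `train_lda` uses Gensim's `LdaModel`, and mapping the “number of steps” to its `iterations` parameter is an assumption of this example. How the candidate models were compared is not specified here, so no scoring step is shown.

```python
import itertools
import random

# Candidate values reported in the text.
SEARCH_SPACE = {
    "num_topics": [5, 10, 20, 50],
    "passes": [1, 10, 50, 100],
    "iterations": [100, 500, 1000],
}


def sample_configs(space, n, seed=0):
    """Draw up to n distinct configurations at random from the full grid."""
    keys = list(space)
    grid = [dict(zip(keys, values))
            for values in itertools.product(*space.values())]
    rng = random.Random(seed)
    return rng.sample(grid, min(n, len(grid)))


def train_lda(bow_corpus, dictionary, config):
    """Train one Gensim LDA model for a sampled configuration."""
    # Imported lazily so that the sampling code runs without Gensim installed.
    from gensim.models import LdaModel
    return LdaModel(corpus=bow_corpus,
                    id2word=dictionary,
                    num_topics=config["num_topics"],
                    passes=config["passes"],
                    iterations=config["iterations"])


for config in sample_configs(SEARCH_SPACE, n=5):
    print(config)
```

The full grid contains 4 × 4 × 3 = 48 configurations, so sampling only some of them keeps the number of training runs manageable.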
3.3 Topic Selection and Analysis
Once we had extracted the topics, we had to select those that express the sense of place. We gave two annotators the extracted topics, together with some tweets associated with them (to help interpret the topics), and asked them to judge whether each topic expresses the SOP. We then kept only the topics that both annotators judged to express the SOP. Table 2 shows some selected topics, each with an associated tweet.
We decided to analyse the tweets associated with the selected topics to better understand how the SOP is spatially and temporally distributed over the city. We started by plotting the tweets that carry geographical information on a map, labelling them with their associated topic. From Fig. 2, we can see that the dominant topic is the blue one, which expresses people’s love for the city (see the pop-up in the image).
We conducted a second analysis by dividing the tweets by posting date and posting hour, to see whether an SOP could emerge on a particular day of the week (e.g., Monday) or at a particular time (e.g., from 4 pm to 8 pm). From the split by day, we noticed that on Monday tourists tweeted that they would miss New York (see Fig. 3), and on Wednesday users tweeted that schools would be closed due to snow (see Fig. 4).
For the time-of-day analysis, unfortunately, we did not find a set of tweets expressing the same content. We believe this is due to the nature of Twitter, where people tend to share opinions, thoughts and news throughout the day.
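The split by day and by time window can be sketched as below. The tweets, their timestamps and the four-hour window are invented for the example (the window mirrors the “4 pm to 8 pm” range mentioned above), and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def group_by_weekday(tweets):
    """Group (timestamp, text) pairs by the name of the posting day."""
    groups = defaultdict(list)
    for created_at, text in tweets:
        groups[WEEKDAYS[created_at.weekday()]].append(text)
    return dict(groups)


def group_by_hour_window(tweets, window=4):
    """Group (timestamp, text) pairs into windows of `window` hours."""
    groups = defaultdict(list)
    for created_at, text in tweets:
        start = (created_at.hour // window) * window
        groups[(start, start + window)].append(text)
    return dict(groups)


# Invented examples; the dates are not taken from the collected dataset.
tweets = [
    (datetime(2018, 3, 19, 17, 30), "Going to miss New York"),
    (datetime(2018, 3, 21, 7, 5), "Schools closed because of the snow"),
]
by_day = group_by_weekday(tweets)
by_window = group_by_hour_window(tweets)
print(by_day)
print(by_window)
```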
4 Conclusions
In this paper, we created a dataset of tweets regarding New York City in order to extract the sense of place defined by Cresswell [6, 7]. In detail, we fixed location and locale to New York and tried to extract the sense of place (SOP) from users’ posts. The SOP is detected using Latent Dirichlet Allocation (LDA) [3], which extracts topics that summarise the various sentiments that people expressed towards the city. Finally, we showed that it is possible to capture the SOP from tweets and that it may depend on the day of the week.
As future work, we plan to improve the pre-processing phase, since some of the tweets that pass the filters do not express the SOP. Furthermore, we are interested in applying other LDA models to unveil the information contained in tweets.
Notes
1. We used both the word and the geo-tag because not all tweets contain the latter information.
References
1. Allisio, L., Mussa, V., Bosco, C., Patti, V., Ruffo, G.F.: Felicittà: visualizing and estimating happiness in Italian cities from geotagged tweets. In: CEUR Workshop Proceedings, vol. 1096, pp. 95–106. CEUR Workshop Proceedings (2013)
2. Antonini, A., et al.: First life, from the global village to local communities. In: 1st IASC Thematic Conference on Urban Commons (2015)
3. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet Allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
4. Cataldi, M., Caro, L.D., Schifanella, C.: Personalized emerging topic detection based on a term aging model. ACM Trans. Intell. Syst. Technol. (TIST) 5(1), 7 (2013)
5. Cerutti, V., Fuchs, G., Andrienko, G., Andrienko, N., Ostermann, F.: Identification of disaster-affected areas using exploratory visual analysis of georeferenced tweets: application to a flood event, p. 5. Association of Geographic Information Laboratories in Europe, Helsinki, Finland (2016)
6. Cresswell, T.: Place. In: International Encyclopedia of Human Geography, vol. 8, pp. 169–177. Elsevier (2009)
7. Cresswell, T.: Place-part i. In: The Wiley-Blackwell Companion to Human Geography, pp. 235–244 (2011)
8. Di Caro, L., Candan, K.S., Sapino, M.L.: Navigating within news collections using tag-flakes. J. Visual Lang. Comput. 22(2), 120–139 (2011)
9. Eisenstein, J., O’Connor, B., Smith, N.A., Xing, E.P.: A latent variable model for geographic lexical variation. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 1277–1287. Association for Computational Linguistics (2010)
10. Lau, J.H., Collier, N., Baldwin, T.: On-line trend analysis with topic models: #twitter trends detection topic model online. In: COLING, pp. 1519–1534 (2012)
11. Pennacchiotti, M., Gurumurthy, S.: Investigating topic models for social media user recommendation. In: Proceedings of the 20th International Conference Companion on World Wide Web, pp. 101–102. ACM (2011)
12. Ristea, A., Kurland, J., Resch, B., Leitner, M., Langford, C.: Estimating the spatial distribution of crime events around a football stadium from georeferenced tweets. ISPRS Int. J. Geo-Inf. 7(2), 43 (2018)
13. Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes twitter users: real-time event detection by social sensors. In: Proceedings of the 19th International Conference on World Wide Web, pp. 851–860. ACM (2010)
14. Siragusa, G.: Place as topics: analysis of spatial and temporal evolution of topics from social networks data. In: SIDEWAYS@ LREC, pp. 32–35 (2016)
15. Zhang, H., Qiu, B., Giles, C.L., Foley, H.C., Yen, J.: An LDA-based community structure discovery approach for large-scale social networks. In: ISI, p. 200 (2007)
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Siragusa, G., Leone, V. (2018). Such a Wonderful Place: Extracting Sense of Place from Twitter. In: Gangemi, A., et al. The Semantic Web: ESWC 2018 Satellite Events. ESWC 2018. Lecture Notes in Computer Science(), vol 11155. Springer, Cham. https://doi.org/10.1007/978-3-319-98192-5_55
DOI: https://doi.org/10.1007/978-3-319-98192-5_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98191-8
Online ISBN: 978-3-319-98192-5
eBook Packages: Computer Science, Computer Science (R0)