1 Introduction

Social media has succeeded in creating a very convenient platform for individuals to connect with others regardless of their geographic location. As the number of social network users grows each day, the concern for privacy also increases [1,2,3]. Social media is a significant part of modern life; therefore, it is critically important to address potential privacy concerns, and more specifically the risks that accompany the direct or indirect disclosure of personally identifiable information to the public.

1.1 Personally Identifiable Information (PII)

According to the National Institute of Standards and Technology (NIST) [4], PII is any piece of information that could be used to distinguish or trace a person’s identity, e.g., full name, driver’s license number, or street address. Additionally, PII is any information that is linked or linkable to an individual, such as license plate, signature, or handwriting [4].

When social media users reveal PII, their information can be used against them or their family and friends. The worst-case scenario arises when a skilled malicious user acquires an individual’s private information. In 2010, a MythBusters show host accidentally revealed his PII when he posted a geotagged image of his truck and house on Twitter. The geotagged photo of his Land Cruiser in the driveway, accompanied by the text “Now it’s off to work,” gave potential thieves all the information they needed to know the location of the house and that it was unoccupied at the time [5]. In the same year, a New Hampshire police investigation of a series of eighteen burglaries revealed a strong connection between the homeowners’ posts on social media and the subsequent burglaries [6]. According to the Department of Justice, Bureau of Justice Statistics (BJS), an estimated 16.6 million people, or 7% of all persons 16 or older in the U.S., experienced at least one incident of identity theft in 2012, with losses totaling $24.7 billion [7]. The losses of PII, sensitive and non-sensitive, in the U.S. are prevalent, with serious consequences to individuals and organizations.

Common privacy incidents following serious emergency situations include targeted spearphishing attacks and online scams asking for money to support “relief efforts” [8]. A less researched phenomenon is how frequently people elect to share PII with anonymous online users after a disaster in an effort to receive short-term relief.

Our goal is to investigate whether PII-related images on Twitter occur more frequently after disasters, such as earthquakes, hurricanes, or other life-threatening events, when people tend to think less about their privacy. In other words, we expect people to deliberately post more private information on social media during disasters by uploading images of their houses or neighborhoods to show the damage to their property.

The majority of pictures taken with digital cameras or smartphones include longitude and latitude coordinates in the image files stored on those devices [9]. Fortunately, social media platforms such as Twitter and Facebook remove most of the georeferenced metadata found in image files [10]. However, within a photo’s raster layer, some information related to the user’s location can still be revealed, e.g., an easily recognizable building, a distinguishable neighborhood, or a street or business sign. Moreover, tweeted images may contain more sensitive information when they show any form of identification, such as a driver’s license, a university student/faculty ID, or an office badge.

We chose Twitter as the social media platform for this study. We acknowledge that, although 69% of adults in the US are active on one or more social media platforms, only 24% of this population are on Twitter [11]. Furthermore, Twitter users tend to be younger and have higher levels of education than the general population [12]. Despite these limitations, Twitter offers invaluable social data that are accessible through the Twitter API.

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 presents an overview of the dataset collection, the limitations of the proposed research, and the methodology used for sampling and analysis. Section 4 illustrates the results using graphs and figures specific to each of the three major disasters, and Sect. 5 discusses the correlations between the results and real-world reports of social media and emergency situation incidents. Finally, Sect. 6 presents the risks and rewards of using social media during disasters and explores potential future work.

2 Related Work

Although the complications of using social networking services for communication can be discussed from many angles, we distinguish between (i) issues focusing on social media security and privacy limitations, (ii) the influence of social media during disasters, and (iii) the drawbacks of excessive use of social media after disasters.

2.1 Privacy and Security Issues in Social Media

In the United States, the Privacy Act of 1974 regulates the collection of personal information by government agencies. Regrettably, there is no overarching federal law regulating private entities [13]. Most social network platforms aim to preserve their clients’ privacy as much as possible [14], especially Twitter. Twitter does not require users to provide their real names; instead, it encourages users to create unique pseudonyms with no relation to their real names. When a photo is deleted on Facebook or Instagram, the image and the information carried in the image (URL and shared link) can be accessible for several days due to “photo-deletion delay.” Conversely, this is not the case for Twitter which deletes that sensitive information immediately [14].

Regrettably, even when users try to remain anonymous and Twitter sets strict standards, a 2017 study by Peddinti et al. [15] revealed that only \(\simeq \) 6% of Twitter users are truly anonymous. Using multiple social media networks, it is possible to infer 39.9% more personal information via deanonymization and aggregation [2, 16, 17]. Even when algorithms and deanonymization fail, information can be directly revealed by advertising companies and social media platforms. In 2013, it was reported that a Facebook bug leaked the private contact information of 6 million users [17].

Revealing personal details about one’s life on social media, where everyone can access the information, is risky. Cai et al. [18] recommended that individuals using social media disguise their attributes (e.g., using encryption) and remove friendship links in order to achieve the “privacy-utility trade-off.” Data mining and social media shortcomings are a cause of concern for privacy on the Internet, and sanitizing network data prior to release is necessary [18]. However, this is not easy to implement on regular days, and it becomes more complicated during a disaster.

To the best of our knowledge, all researchers investigating the possible security and privacy issues were looking at users’ profile attributes and posted texts, and few have used images for social media and privacy correlations.

2.2 Social Media’s Influence During Disasters

In a disaster, whether it is natural (earthquake or hurricane), technological (oil spill), or human (terrorism), a lack of communication fuels a crisis [19]. It is commonplace for organizations to adopt a comprehensive framework for “disaster social media” in order to employ meaningful social media communication in a disaster [19]. Some government, non-profit, and news organizations (Salvation Army, The National Weather Service, US Federal Emergency Management Agency) use frameworks such as the Crisis and Emergency Risk Communication (CERC) model and the Disaster Communication Intervention Framework (DCIF) [19]. Social media emergency frameworks are employed to better communicate with disaster victims and provide feedback in the event of an emergency, but the way social media is utilized in a disaster should be adjusted to reflect changing circumstances. Social media platforms, including Twitter, YouTube, and Facebook, are among the most important two-way mediated channels of communication used by officials for interaction with the public before, during, and after natural disasters [20]. This can be demonstrated by the social media awareness during the terror attacks on Brussels [20] and again during the 2011 Virginia earthquake, where tweets of information about the earthquake moved across the state faster than seismic waves did [21]. Similarly, social media was invaluable during the Great East Japan Earthquake in 2011, when Tsukuba had major power outages, towers went down, and web-enabled phones and smartphones became the primary devices for media access. In that situation, the number of tweets per day posted by 39 local governments increased tenfold. Nevertheless, despite the long list of positive implications of using social media during an emergency, especially for preparedness, response, and recovery, social media has notable drawbacks [22]. Reasons for skepticism about using social media during a disaster include unconfirmed or unreliable information, possible technical problems, and a notable collection of privacy concerns [20]. Our critical approach to social media and privacy concerns also takes advantage of the fact that, in times of disaster, affected people post images to reach rescue teams, officials, and organizations.

2.3 Privacy Issues in Social Media During Disasters

Social media is known for its share of active attacks, including stalking, cyberbullying, malvertising, phishing, social spamming, and scamming; yet, post-disaster, the passive attacks could arguably be of greater concern. During a disaster, people often take photos to document the cascading events, and the subsequent sharing of information in such situations can be informative, newsworthy, and therapeutic [23]. However, privacy issues arise on Twitter post-disaster when sensitive information regarding a person’s location or personal details, such as a cell phone number, is revealed through images or text.

It is not impossible to infer an anonymous Twitter user’s whereabouts. According to Hecht et al. [24], 66% of Twitter users have a geographic location in the location field of their user profile, while the rest leave the field blank or fill it in with a non-geographic location. Even if the location field is not filled in, it is possible to predict the location based on tweet content. Furthermore, despite the fact that less than 1% of all tweets are geo-tagged, algorithms are available to accurately predict the location of a tweet at the city level from a combination of information including: the tweet contents (e.g., place names, hashtags), tweeting behavior-based time zone location (volume of tweets per time unit), and a trained location dataset based on the tweet contents (e.g., a dictionary containing a dynamically weighted ensemble of locations) [25].

Unlike disasters themselves, the privacy issues that accompany them can be avoided. As people voluntarily capture, gather, and aggregate information through social media, the result is a very large-scale collection of personal information [23]; given the privacy consequences, every precaution must be taken when sharing images post-disaster. A common pattern is that people write a quick message immediately after a life-threatening event, whereas it takes much longer for images and videos to be uploaded from cameras to large-scale social forums (such as Twitter and YouTube). Hence, before any information is compromised on social media, users need to remember that they may be able to partially mask their location by carefully avoiding mentions of geographic places in their posts [25].

When considering the security risks in the tweets’ actual text, one’s attention should also be on the information residing in the posted photos. While there is a possibility of predicting the users’ location in an image, there is also a possibility of retrieving personally identifiable information in the photo.

3 Experimental Design

3.1 Definition of PII in This Study

Based on our interpretation of the NIST-defined PII, described in Sect. 1.1, we define three types of PII images. These are “location disclosure”, “personal information disclosure”, and “linkable information.”

Location Disclosure. An image is labeled as a location-disclosure PII when it contains a complete exterior view of at least one recognizable building. Most of the images which are tagged by location-disclosure include more than one structure in the exterior view. Therefore, the location of these images can be determined by any individual that is familiar with the area. For instance, Fig. 1a shows a flooded apartment complex in Puerto Rico. The street-level location of this picture can be discovered easily by the people that are familiar with that neighborhood.

Personal Information Disclosure. This type of PII image includes all pictures of government/non-government issued identification cards. In addition, any document that contains at least the full name of an individual is defined as “personal information disclosure.” Fig. 1b shows a tweeted image of a driver’s license found after an earthquake in Mexico City.

Linkable Information. The third type of PII image includes any photo containing information that is linked or linkable to an individual. Examples of this category include tweeted images of a ticket to a concert, a signature on a personal bank check, or, as shown in Fig. 1c, a photo of a parked car with a visible license plate.

Fig. 1. Examples of images posted on Twitter which are considered PII in this study. The images were blurred to protect the users’ privacy.

3.2 Data Collection

Twitter data can be downloaded in various ways. Below, two methods are described for the dataset collection of (1) a Hurricane Harvey Twitter dataset and (2) a Hurricane Maria and a Mexico City earthquake Twitter dataset.

Hurricane Harvey. We used the PowerTrack API to amass a comprehensive dataset of geotagged tweets [26]. Using the available operators, we retrieved all tweets in the vicinity of Houston, TX, within a radius of 24.5 miles from the coordinates of 29.750641 latitude and −95.365851 longitude. These tweets were collected between July \(21^{st}\) and October \(1^{st}\), a period including more than two weeks before and after the hurricane occurred (Fig. 2a).
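
As an illustration of the geographic constraint described above, the following minimal sketch checks locally whether a point falls within the 24.5-mile radius. It is not the PowerTrack operator itself, which performs the filtering on the server side; the function names and the Earth-radius constant are assumptions introduced here for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (approximation, assumed here)

# Center and radius quoted in the text.
HOUSTON_LAT, HOUSTON_LON = 29.750641, -95.365851
RADIUS_MILES = 24.5


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_study_area(lat, lon):
    """True if the point lies inside the 24.5-mile circle around the center."""
    return haversine_miles(HOUSTON_LAT, HOUSTON_LON, lat, lon) <= RADIUS_MILES
```

A geotagged tweet would be kept when `within_study_area(lat, lon)` is true for its coordinates.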

Table 1. Overview of datasets

Fig. 2. The processes of collecting tweeted images in this paper.

Hurricane Maria and Mexico’s Earthquake. Unlike the Hurricane Harvey method, we used the Twitter Streaming API to collect data for Hurricane Maria and the Mexico City earthquake. On Tuesday, September 19th at 7:30pm CST, \(\sim \)11 h prior to Hurricane Maria making landfall on Puerto Rico, we began recording tweets based on designated queries. This recording continued for 48 h. Similarly, tweets posted from \(\sim \)1 h after the Mexico City earthquake (which occurred at \(\sim \)6:15pm CST) were recorded based on specific queries. For the Hurricane Maria dataset, we captured all the tweets that mentioned “Hurricane_Maria”, “Hurricane”, “Maria”, “huracan”, “Puerto”, or “Rico” during the hurricane. For the earthquake dataset, we captured all the tweets that mentioned “earthquake”, “Puebla”, “Mexico”, “terremoto”, “sacudida”, “shaking”, or “Mexico City” at the time of the disaster. These terms were carefully chosen to filter out irrelevant tweets that were not about these particular disasters. In both cases, we included the Spanish equivalents of the English terms, such as “sacudida” for “shaking” in the case of the earthquake in Mexico.
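
The keyword queries above can be illustrated with a minimal local sketch. The term lists are taken from the text; the matching rule (case-insensitive, with every word of a multi-word term required to appear in the tweet) is an assumption made for illustration and is not a statement of how the Streaming API matches terms.

```python
import re

# Query terms quoted in the text.
MARIA_TERMS = ["Hurricane_Maria", "Hurricane", "Maria", "huracan", "Puerto", "Rico"]
QUAKE_TERMS = ["earthquake", "Puebla", "Mexico", "terremoto", "sacudida",
               "shaking", "Mexico City"]


def tokenize(text):
    """Lowercase the text and split it into word tokens (underscores are kept)."""
    return re.findall(r"\w+", text.lower())


def matches_any(text, terms):
    """True if all words of at least one term occur among the tweet's tokens."""
    tokens = set(tokenize(text))
    return any(all(word in tokens for word in tokenize(term)) for term in terms)
```

Under this sketch, accented spellings such as “huracán” would not match the unaccented term “huracan”, so a production filter would need to normalize accents first.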

We gathered tweets to select users within forty-eight hours of the two disasters. Because we were interested in Twitter users who were definitely tweeting from inside the disaster area, we initially attempted to narrow our search to users with georeferenced tweets. However, the number of users with geotagged tweets was lower than what we needed for this research (fewer than 100 users for Hurricane Maria). To broaden the number of observed tweets, we instead selected users who had registered their location as “Puerto Rico” or “Mexico City” in their user profile. Categorizing users by profile location widened the pool of available users to \(\simeq \) 7,000 for the hurricane in Puerto Rico and to \(\simeq \) 3,200 for the Mexico City earthquake. We retrieved up to 3,200 tweets from each individual user between January \(2^{nd}\) and September \(29^{th}\), 2017 (Fig. 2b).

3.3 Methodology

We extracted the IDs of users who were active in both the two weeks before and the two weeks after each disaster. Out of 3,920, 6,918, and 3,210 users for Harvey, Puerto Rico, and Mexico City, respectively, this yielded a total of 985, 6,250, and 3,109 active users for Hurricane Harvey, Hurricane Maria, and the Mexico City earthquake, respectively.
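
A minimal sketch of this activity filter is given below. It assumes that "active" means at least one tweet in each of the two 14-day windows, and it uses a hypothetical record layout of (user ID, timestamp) pairs; neither detail is specified beyond the description above.

```python
from datetime import datetime, timedelta


def active_users(tweets, disaster_time, window=timedelta(days=14)):
    """Return the set of user IDs with at least one tweet in both windows.

    tweets: iterable of (user_id, created_at) pairs, where created_at is a
    datetime (hypothetical layout, assumed for illustration).
    """
    before, after = set(), set()
    for user_id, created_at in tweets:
        if disaster_time - window <= created_at < disaster_time:
            before.add(user_id)
        elif disaster_time <= created_at <= disaster_time + window:
            after.add(user_id)
    # A user counts as active only if seen in both periods.
    return before & after
```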

For each disaster, 600 users were randomly selected from the pool of identified active users. Users were selected at random to minimize bias and obtain a representative sample. Once the users were selected, we used the Get Tweet Timelines API to retrieve all the tweeted images of our sample users within the four weeks (two weeks before and two weeks after the disaster) [27].
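
The sampling step can be sketched as a uniform draw without replacement. The function name and the optional seed are assumptions added for reproducibility of the illustration; the text does not state how the random selection was implemented.

```python
import random


def sample_users(user_ids, k=600, seed=None):
    """Draw k distinct users uniformly at random, without replacement."""
    rng = random.Random(seed)
    # Sorting first makes the draw reproducible even if user_ids is a set.
    return rng.sample(sorted(user_ids), k)
```

For example, `sample_users(active_ids, k=600, seed=0)` would return 600 distinct IDs from a hypothetical collection `active_ids`, and the same IDs on every run with the same seed.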

To determine if the people’s habit of posting PII images was affected by disasters, we examined the tweeted images of every user for each disaster: before, during, and after it. Table 1 shows the number of tweeted images for each disaster from the streamlined selection of users. Finally, we manually analyzed each of these images to label them as either PII or non-PII images (Fig. 2c).

4 Experimental Results

4.1 Hurricane in Houston

Tropical storm Harvey intensified to a category 4 hurricane before making landfall along the middle Texas coastline late August 26, 2017. Houston’s metropolitan area faced severe rain and wind between the \(25^{th}\) and \(29^{th}\) of August. Tweets peaked on August \(27^{th}\), but this surge in social media presence did not last. After three days, the number of tweets returned to normal, and the pattern of daily tweets in the two weeks after the hurricane mimicked the pattern before the hurricane took place (Fig. 3).

The daily estimate of tweeted images fluctuated, but trends emerged for Twitter user presence and behavior throughout the disaster period. The number of Twitter users accessing their accounts and uploading PII images increased by 514% (Fig. 4), and the number of tweeted PII images followed a similar trend, increasing by 633% (Fig. 4) during/after Hurricane Harvey’s landfall. There were only four tweeted PII images (out of 1,568) before the hurricane made landfall, but after the hurricane reached Texas’s coastline, this number increased to twenty-three (out of 1,573). Moreover, from a sample of 600 Twitter users, only three users uploaded PII images before the hurricane, compared to the twenty-three users who tweeted PII images during and after the disaster. There was no overlap in users before and after the hurricane.

To examine the change in the number of tweets containing PII images, we performed a paired t-test on the number of PII images posted by each user in our sample group before and after the hurricane. The average number of PII images tweeted during/after the hurricane was higher than the average number tweeted before the hurricane made landfall (df = 598, p = 0.004). This suggests that as user presence increased, so did the number of PII images posted per individual.
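
As an illustration of the test described above, the following minimal sketch computes the paired t statistic from per-user counts of PII images before and after an event. The function name is hypothetical and the sketch is not the code used in this study; the p-value would then be obtained from a t distribution with the returned degrees of freedom, for example with a statistics package such as SciPy (`scipy.stats.ttest_rel`).

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for two paired samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    # Per-user difference: count after the event minus count before it.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1
```

A positive t statistic indicates that, on average, users posted more PII images after the event than before it.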

4.2 Hurricane in Puerto Rico

Puerto Rico was devastated by Hurricane Irma, category 3, on September \(6^{th}\) and again by the rain shield of Hurricane Maria, category 5, on September \(20^{th}\), 2017. These disasters had significant effects on social media. The two peaks in Fig. 5 indicate the influence that the devastation of the two hurricanes had on Twitter user behavior. Comparing two time periods, the period before the hurricanes and the fourteen days during the hurricanes (September \(5^{th}\) to September \(19^{th}\)) before blackouts occurred, the number of tweeted images per day increased by 20% on average.

To assess the behavioral change of Twitter users posting PII images in temporal proximity to the hurricanes, we compiled each posted image containing PII for each user in our sample group before Hurricane Irma and from Irma up to and including Hurricane Maria (over 25k images). There was a 388% increase in the number of users posting PII images during/after the disasters (Fig. 6). In the same period, the number of posted PII images increased by 276%. To examine the behavioral pattern of users tweeting images containing PII before and after Hurricane Irma, we performed a paired t-test on the number of PII images posted by each user in our sample group before and after the disaster. There were significantly more tweeted images containing PII after the disasters in Puerto Rico (df = 598, p = 5.788e−08). Similar to Hurricane Harvey, in the period surrounding Hurricane Irma and Hurricane Maria, the user presence increased, and so did the number of tweets each user posted.

Fig. 3. Daily tweeted images in the Houston metropolitan area. Lines indicate Twitter collection time frame for two weeks before Hurricane Harvey made landfall, when Hurricane Harvey made landfall, and then two weeks after the hurricane made landfall.

Fig. 4. Tweeted images containing PII over a 4 week period in Houston: (A) Comparison of the number of PII users, (B) Comparison of the number of PII tweets.

Fig. 5. Daily amount of tweeted images for the island of Puerto Rico two weeks before Hurricane Irma made landfall, the day after Hurricane Irma made landfall, and the day that Hurricane Maria made landfall.

Fig. 6. Tweeted images containing PII over a 4 week period in Puerto Rico: (A) Comparison of the number of PII users, (B) Comparison of the number of PII tweets.

Fig. 7. Daily tweeted images for Mexico City two weeks before the earthquake, the day of the earthquake, and two weeks after the earthquake.

Fig. 8. Tweeted images containing PII over a 4 week period in Mexico City: (A) Comparison of the number of PII users, (B) Comparison of the number of PII tweets.

4.3 Earthquake in Mexico City

On September \(19^{th}\), a 7.1 magnitude earthquake struck central Mexico, in proximity to its capital, Mexico City. In its wake, the earthquake had subtle effects on Twitter user presence and behavior. Our examination of over 25,000 images posted within a 4-week time frame showed that the number of posted images increased rapidly post-disaster. The number of tweeted images escalated on September 19 (the day of the earthquake) and peaked on September 20, with those two days comprising 27% of the tweets posted in the two weeks following the first seismic event (Fig. 7). For the first two weeks following the disaster, the user presence increased by 355% and the number of tweeted images containing PII increased by the same percentage; thus, the number of users corresponded directly to the number of tweeted images containing PII (Fig. 8).

A paired t-test indicated that the earthquake changed the way affected users managed their PII online. For the Mexico earthquake dataset, in the 600-user sample, the number of PII images tweeted after the earthquake was higher than the number tweeted before the earthquake (df = 598, p = 6.679e−05).

4.4 PII Image Predominance in Each Disaster

Using the three categories of PII defined in Sect. 3.1, i.e., location disclosure, personal information disclosure, and linkable information, we distinguished which types of sensitive information users were revealing through images for each disaster. We start with the most prevalent form of sensitive information disclosure, “location disclosure,” which increased after Hurricane Harvey, Hurricane Maria, and the Mexico City earthquake by 2100%, 457%, and 4600%, respectively (Fig. 10). Images with “location disclosure” occurred at the highest frequency compared to the two other types of PII images, though “personal disclosure” was also high after both hurricanes and the earthquake in Mexico City.

Only for Hurricanes Harvey and Maria did the tweeted images contain no form of “personal information” disclosure. For the earthquake dataset, the change in “personal information” disclosure was comparatively modest: after the earthquake in Mexico City, personal information was only 50% more likely to be disclosed in a tweeted image (Fig. 10). “Linkable information” increased after the disasters in Puerto Rico and Mexico City by 200% and 400%, respectively; however, while “linkable information” was prevalent for these two disasters, the chance of a tweeted image containing linkable information decreased by 33% in the case of Hurricane Harvey (Fig. 10).

For Hurricane Harvey, we computed the increase ratio of posted images and the increase ratio of gauge height and mapped them, as shown in Fig. 9. The ratio for both gauge height and number of images was calculated as \((X_{\text{after}} - X_{\text{before}})/X_{\text{before}}\), where \(X_{\text{before}}\) and \(X_{\text{after}}\) denote the value before and after the disaster, respectively. The data for gauge height in Houston were acquired from the USGS website [28], and the number of tweets computed for each 0.5-minute grid cell is presented as a bar at the center of the cell.
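
The ratio above can be written as a short helper. This is a minimal sketch; the function names are hypothetical, and the handling of a zero baseline (raising an error) is an assumption, since the text does not say how such cells were treated.

```python
def increase_ratio(before, after):
    """Relative change (after - before) / before for a count or gauge height."""
    if before == 0:
        # The ratio is undefined for a zero baseline (assumed handling).
        raise ValueError("baseline value must be non-zero")
    return (after - before) / before


def increase_percent(before, after):
    """The same relative change expressed as a percentage."""
    return 100.0 * increase_ratio(before, after)
```

For instance, a hypothetical grid cell with 10 images before and 25 images after the disaster has an increase ratio of 1.5, i.e. a 150% increase.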

Fig. 9. Distribution of tweeted images based on geographical locations and water intensity.

Fig. 10. Distribution of tweeted images containing PII based on the three types of PII disclosure defined in Sect. 3.1, for Hurricane Harvey, Hurricanes Irma and Maria, and the earthquake in Mexico City.

5 Discussion

Users tend to reveal more PII during and after the time of disasters. In fact, people experiencing unexpected natural disasters tend to post more images during and after disasters. Each disaster demonstrated that the number of posted images returned to normal in less than a week. However, in the case of Hurricane Maria, 100% of Puerto Ricans were left without power [29]; therefore, as shown in Fig. 5, there was a quick and drastic decline in the number of tweeted images right after the hurricane.

Figures 4, 6 and 8 show that, along with the rise in the number of tweeted images with PII, the number of users posting PII images also increased. People who are devastated by disasters post images more often in an attempt to get help from their local government or rescue teams [23].

When the images posted on social media were assessed based on the three PII categories defined in Sect. 3.1, we found that, as expected, people post significantly more images during disasters that may reveal their locations. Surprisingly, in the case of the earthquake in Mexico, results show that most of the images taken before the earthquake contained a large quantity of “personal information disclosure.” This may be linked to the several missing person reports that were posted before and after the earthquake. However, almost no images containing “personal information disclosure” were found after the hurricanes in Puerto Rico and Houston. In addition, images with “linkable information” were found in all three disasters, but this category was most prevalent during and after Hurricane Maria and the earthquake in Mexico.

Social media provides important data for first responders and rescue teams [20,21,22]. However, if such data is assimilated by a malicious user, it can threaten users’ privacy or safety. While a post containing an image of a flooded neighborhood could guide rescue teams to the area, it could also be troublesome in the future by revealing a home/work location.

According to Frailing and Harper [30,31,32,33], looting occurs in the wake of natural catastrophes. For example, the rate of burglary increased by 200% in the aftermath of Hurricane Katrina in New Orleans. Considering that people aged 18–29 are the most active users of social media [11] and are also responsible for the largest number of crimes in the United States [7], the likelihood of malicious users taking advantage of innocent posts on social media to select their next target increases.

During disaster situations and emergencies, people tend to be distracted and can more easily fall victim to privacy incidents. Users should delete their posts containing PII after disasters. Alternatively, social media platforms can treat the posts that contain PII the same way as they treat the posts containing graphic violence and adult content. In other words, the site can require users to remove the posts containing PII or at least draw their attention to such posts [34].

Several limitations must be acknowledged. Our datasets were limited to the data available from Twitter. Another drawback was that we were unable to explore the differences in Twitter usage by demographic characteristics or by the users’ proximity to the disaster, because Twitter users usually do not reveal such information (birth date/age, gender, location coordinates). Furthermore, access to the full dataset of geo-referenced tweets was only available for Hurricane Harvey. Therefore, for the hurricanes in Puerto Rico and the earthquake in Mexico City, we relied on the datasets we recorded using the Streaming API.

6 Conclusion and Future Work

Social media is beneficial to the public in the event of a disaster because it improves community awareness, and government agencies and advocacy groups rely on social media sites (such as Twitter) to communicate information to and with the public. Still, as infrastructure fails and the capacity of law enforcement diminishes, crime rates increase, and looting and identity theft can be attributed to information revealed on social media. In this study, we assessed Twitter usage by the public following three disasters and found that, in all three scenarios, users trusted social media more often during/after a disaster. Randomly selected groups of 600 users revealed significantly more personally identifiable information (PII) during and after disasters. In all “location disclosure” cases, users tended to more frequently indicate street addresses, neighboring buildings, or distinctive landmarks in their uploaded images. Users facing the after-effects of Hurricane Maria and the earthquake in Mexico revealed “linkable information” in the form of signatures on bank checks and images of license plates. In the case of the earthquake in Mexico, people tended to disclose “personal information” by revealing the birth dates, job positions, or height of missing loved ones.

The level of PII exposed differed from place to place and varied with the severity of the event. The hurricanes did not have the same rapid initial Twitter response as the earthquake; this is in part due to the longer duration of a hurricane event and the lack of access to power during the event. It is possible that revealing personal information could have long-lasting effects on privacy that would only emerge months or even years after a disaster. Users are advised to review and erase sensitive uploads regularly in ordinary times, and it is especially important to do so after disasters.

Future work includes automating the detection of PII in images and developing an application to evaluate the images posted on users’ timelines in real time. Eventually, the suggested system should detect images containing PII and inform users about the risks they face.