
1 Introduction

Visual analytics and data visualizations provide users with new modes of interaction for exploring data sets and discovering trends in mobility. Understanding mobility patterns is an integral part of urban planning: it allows planners to explore how city inhabitants differ in their land use and urban behavior. Researchers have explored many themes using mobility data; in this paper, we focus on taxi data specifically. We describe an approach to land classification based on the cost of travel to different traffic analysis zones (TAZs) in Riyadh, Saudi Arabia. By analyzing the distribution of drop-off costs in taxi data, we can identify the median, average, minimum and maximum cost of travel to different parts of the city. Because the mobility data come from Riyadh, a city where women are prohibited from driving, we also visualized the drop-off cost distribution for each gender, which allows us to examine the TAZs that men and women spend more to travel to. These observations may help urban planners gauge the demand on the city’s transportation infrastructure and possibly improve the allocation of services targeted at specific demographics.

The paper is structured as follows. Section 2 describes some previous work linking taxi data with social dynamics. Section 3 presents a land analysis based on taxi trip drop-off cost. The discussion and lines for future work are included in Sect. 4. We conclude in Sect. 5.

2 Background

According to Zhang et al. [1], “Traces of taxi GPS routes have been used for city-scale social event detection and analytics.” Taxi data can be used to understand many aspects of city life, from economic activity and human behavior to mobility patterns, and the literature frequently uses them to find and understand social patterns. In [2], the authors found drops in taxi activity in New York during August 2011 and October 2012, which they linked to hurricanes. Similarly, they identified a lack of trips along certain roads, which suggested that traffic was blocked in those areas. In [3], the authors presented a method for optimizing the matching of taxi drivers to passengers, thereby reducing gas emissions. In [4], the authors used taxi origin and destination data to understand urban mobility; they relied on taxi data rather than other public transit mobility data because it offered greater accuracy.

An exhaustive survey of work on mining taxi traces identified three categories of work on this data: social dynamics, traffic dynamics and operational dynamics [5]. The authors define social dynamics as “the study of the collective behavior of a city’s population, based on their observed movements.” Understanding social dynamics is essential for the management, design, maintenance and advancement of a city’s infrastructure [6]. Section 2.1 expands on work in social dynamics that links taxi data to land use, and Sect. 2.2 describes visualization work on taxi data that helps reveal the underlying social dynamics.

2.1 Using Taxi Data to Understand Land Use

Urban planning policies have wide-ranging effects, from influencing energy consumption by controlling urban sprawl to preserving agricultural land [7]. One major factor that influences urban policies is transportation and mobility behavior. Understanding mobility patterns is crucial for measuring and characterizing urban form and for urban policymaking [7, 8]. In [9], the author identified approaches for understanding the relationship between urban form and travel behavior; one such approach is to examine travel patterns.

With the emergence of new technologies, the transportation industry has revolutionized access to its services. New taxi companies have started disrupting traditional models of transportation service [10], and these new models capture valuable urban data. Researchers have used these data to explore many aspects of urban behavior, from traffic patterns to land classification.

The researchers in [11] noted that land-use data are difficult to obtain, so they inferred the social function of urban land from taxi traces. Their technique for classifying regions into land-use classes achieved a recognition accuracy of 95%. They found that pickup/drop-off locations “exhibited clear patterns corresponding to the land-use classes of these regions.”

In [12], the authors attempted to identify land-use types (commercial, industrial, residential, institutional and recreational) using seven days of taxi trajectory data from Shanghai. They argue that the increasing availability of human mobility data “is valuable for urban planners and policy makers in mitigating traffic, planning for public services and resources, and other purposes.”

2.2 Visual Representation of Transportation Data

Visually exploring data is the first step toward understanding emerging patterns and trends. Taxi transportation data have many layers that can be explored, from GPS trajectories to rider demographics to trip costs. In [13], the creators of Hubcab built a visual platform that allows users to explore how to reduce the social and environmental costs embedded in transportation systems.

In [14], a platform was developed to visualize the number of taxis operating in Singapore on a user-selected date. The developers used a heatmap to show areas of taxi concentration throughout the city. Another example of applying visual analytics to taxi trajectory data appears in [15], where a graph was built to store and present real traffic information recorded by taxi trajectories over city streets.

Similarly, the authors in [16] analyzed taxi GPS traces collected in Lisbon, Portugal, and visualized the spatiotemporal variation of taxi services. They explored relationships between pickup and drop-off locations and analyzed taxi behavior during downtime (between a drop-off and the following pickup). They also analyzed how predictable the area type of the next pickup is, given the history of taxi flows in time and space.

3 Taxi Trip Cost Analysis

3.1 Data Set

We used taxi drop-off data to examine the distribution of trip costs within the city of Riyadh. The data were acquired from a taxi booking service through which clients can request a car from their phone or through a website. We received the data as monthly files covering May 2014 to December 2016, with more than 122,000 finished trips in Riyadh since the service launched in Saudi Arabia. We used a name-based gender annotation algorithm [17] to infer the gender of each trip requester. The spatial unit used to map costs to parts of the city is the traffic analysis zone (TAZ), the official segmentation used in transportation planning. The city was segmented into TAZs based on census block information, such as population per hour; zones tend to be smaller in denser areas and larger in areas of low density, yielding 1,492 TAZs in Riyadh [18].

There are some biases in the data set that must be mentioned. First, there are considerably more female trips than male trips. This bias is expected because in Saudi Arabia women are prohibited from driving and public transportation is limited. Figure 1 shows a snapshot of rider gender between May 2015 and December 2016. We were able to gender-annotate approximately 80% of all trips in the data set, and 86% of the gender-annotated rides were female. From around July 2016 onward, female trips drop significantly; we do not yet have data to explain this drop, but it is reflected in the overall trip count for those months. The dramatic drop in ridership in December 2016 is because we only have data for part of that month. Another important caveat is that our gender inference relies on name annotation, and because rider user names are self-reported, they may not accurately reflect the rider’s gender. Finally, there is no way to determine whether the person who requested a ride is the one taking it.
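The 86% female share is computed over gender-annotated rides only, while roughly 20% of trips carry no annotation, so the two figures use different denominators. The sketch below keeps them separate. It assumes a hypothetical list with one entry per trip holding `"F"`, `"M"`, or `None` when the name could not be annotated; this encoding is illustrative and not the data set's actual schema.

```python
from typing import Optional, Sequence


def annotation_summary(genders: Sequence[Optional[str]]) -> tuple[float, float]:
    """Return (share of trips with a gender label, female share among labeled trips).

    `genders` holds one entry per trip: "F", "M", or None if the requester's
    name could not be gender-annotated (hypothetical encoding).
    """
    total = len(genders)
    labeled = [g for g in genders if g is not None]
    if total == 0 or not labeled:
        raise ValueError("need at least one labeled trip")
    coverage = len(labeled) / total
    # The female share uses labeled trips as the denominator, not all trips.
    female_share = sum(1 for g in labeled if g == "F") / len(labeled)
    return coverage, female_share


# Toy example: 10 trips, 8 labeled, 6 of the labeled trips female.
toy = ["F"] * 6 + ["M"] * 2 + [None] * 2
coverage, female_share = annotation_summary(toy)
print(f"annotated: {coverage:.0%}, female among annotated: {female_share:.0%}")
```

With these toy values the female share among annotated trips is 75%, which would correspond to only 60% of all trips; keeping the denominators explicit avoids conflating the two.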

Fig. 1. Snapshot of male and female riders in the taxi data set between May 2015 and December 2016

3.2 Data Preparation

To produce the visualizations in Sect. 3.3, we began by annotating trips with gender, using a method [17] that infers gender from the customer’s first name using Arabic-language-specific features. A data cleaning step then removed all incomplete trips (those whose cost was zero or null). Afterward, we linked drop-off points and their costs to the associated TAZs (Fig. 2). We used TAZs because they allow us to explore the data at a finer resolution than the larger, officially defined neighborhood districts. Finally, we calculated the minimum, maximum, average and median cost per TAZ; these are the scores we used to evaluate the TAZs.
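The cleaning, linking and scoring steps above can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: it assumes TAZ boundaries are available as simple polygons in a hypothetical `{taz_id: [(lon, lat), ...]}` dictionary and uses a ray-casting point-in-polygon test as a stand-in for a GIS spatial join. With 1,492 TAZs, a real analysis would more likely use a spatial index.

```python
from statistics import mean, median
from typing import Optional

Point = tuple[float, float]  # (longitude, latitude)
Polygon = list[Point]        # TAZ boundary as a ring of vertices


def point_in_polygon(pt: Point, polygon: Polygon) -> bool:
    """Ray casting: pt is inside if a horizontal ray from it crosses
    the polygon boundary an odd number of times."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y-coordinate can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_taz(pt: Point, tazs: dict[str, Polygon]) -> Optional[str]:
    """Return the ID of the first TAZ containing pt, or None if none does."""
    for taz_id, polygon in tazs.items():
        if point_in_polygon(pt, polygon):
            return taz_id
    return None


def cost_scores_per_taz(
    trips: list[tuple[Point, Optional[float]]],
    tazs: dict[str, Polygon],
) -> dict[str, dict[str, float]]:
    """Clean trips, link drop-offs to TAZs, and compute per-TAZ cost scores."""
    costs_by_taz: dict[str, list[float]] = {}
    for dropoff, cost in trips:
        # Cleaning: skip incomplete trips (cost is zero or null).
        if cost is None or cost == 0:
            continue
        taz_id = assign_taz(dropoff, tazs)
        if taz_id is None:
            continue  # drop-off falls outside every TAZ polygon
        costs_by_taz.setdefault(taz_id, []).append(cost)
    return {
        taz_id: {
            "count": len(costs),
            "min": min(costs),
            "max": max(costs),
            "mean": mean(costs),
            "median": median(costs),
        }
        for taz_id, costs in costs_by_taz.items()
    }


# Toy usage with two hypothetical square TAZs.
tazs = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
trips = [
    ((0.5, 0.5), 20.0), ((0.2, 0.8), 30.0), ((0.6, 0.1), 40.0),
    ((1.5, 0.5), 15.0), ((1.5, 0.5), 0.0), ((0.5, 0.5), None), ((5, 5), 99.0),
]
print(cost_scores_per_taz(trips, tazs)["A"]["median"])  # 30.0
```

The zero-cost and null-cost trips are discarded before the spatial lookup, and the drop-off at (5, 5) falls outside both zones, so TAZ "A" keeps three trips and "B" keeps one.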

Fig. 2. Representation of drop-off points in Riyadh

3.3 Visualizations and Observations

In this section, we show the spatial distribution of Riyadh’s taxi drop-offs in relation to TAZs and then explore the cost of drop-offs by TAZ. The trips cover 1,330 out of 1,492 TAZs. For all the choropleth maps in this section, the following description holds:

1. Areas that have no drop-off points are not colored.
2. Colors range from red to yellow, representing high to low values, respectively.
3. The twenty TAZs with the highest values are outlined in blue.
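The map conventions above can be expressed as two small helpers. This is a hedged sketch: it assumes a linear yellow-to-red ramp between the lowest and highest TAZ score, but the paper does not state the exact color scheme or interpolation, so the RGB endpoints and names here are illustrative.

```python
from typing import Optional

# Hypothetical ramp endpoints: yellow for the lowest score, red for the highest.
YELLOW = (255, 255, 0)
RED = (255, 0, 0)


def taz_color(value: Optional[float], vmin: float, vmax: float) -> Optional[tuple[int, ...]]:
    """Map a TAZ score to RGB on a linear yellow-to-red ramp.

    Returns None for TAZs without drop-offs, which are left uncolored.
    """
    if value is None:
        return None
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [vmin, vmax]
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(YELLOW, RED))


def top_k_tazs(scores: dict[str, float], k: int = 20) -> set[str]:
    """IDs of the k highest-scoring TAZs (outlined in blue); ties keep input order."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


# Toy usage with three hypothetical TAZ scores.
scores = {"T1": 12.0, "T2": 48.0, "T3": 30.0}
vmin, vmax = min(scores.values()), max(scores.values())
colors = {taz: taz_color(v, vmin, vmax) for taz, v in scores.items()}
outlined = top_k_tazs(scores, k=1)
```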

To observe trip patterns, we began by visualizing TAZs and their associated total number of drop-offs (Fig. 3). This step can identify areas of the city that people frequently visit.

Fig. 3. Number of drop-off points per TAZ

Figure 3 indicates a relatively high number of trips toward the center of the city. This area has a high concentration of commercial amenities, which could explain the higher trip count. Figure 4 shows four choropleth maps linking the cost of trips to TAZs: after applying the steps described in Sect. 3.2, we generated a choropleth map for each TAZ score (median, average, maximum and minimum trip cost).

Fig. 4. Choropleth maps describing (a) median (b) average (c) maximum and (d) minimum trip cost in SAR per TAZ

The median cost map (Fig. 4a) and the average cost map (Fig. 4b) produced similar results: sixteen of the twenty highest-value TAZs in the two maps overlap. Most TAZs with higher average and median values appear around the edges of the city. This could be because the population is more concentrated in the center of the city, so traveling to TAZs farther from the center is relatively expensive; however, this hypothesis needs further analysis. Figures 4c and 4d show the maximum and minimum cost distributions per TAZ; the purpose of these maps is to explore extreme values in the data set.
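Counting how many of the highest-value TAZs two maps share amounts to intersecting their top-k sets. A minimal sketch, assuming per-TAZ scores are held in hypothetical `{taz_id: score}` dictionaries; the toy values below are made up for illustration.

```python
def top_k(scores: dict[str, float], k: int = 20) -> set[str]:
    """IDs of the k highest-scoring TAZs; ties keep input order."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def top_k_overlap(scores_a: dict[str, float], scores_b: dict[str, float], k: int = 20) -> int:
    """Number of TAZs that appear in the top k of both score maps."""
    return len(top_k(scores_a, k) & top_k(scores_b, k))


# Toy median and average scores for three hypothetical TAZs.
median_cost = {"T1": 40.0, "T2": 35.0, "T3": 10.0}
mean_cost = {"T1": 42.0, "T2": 12.0, "T3": 38.0}
print(top_k_overlap(median_cost, mean_cost, k=2))  # 1
```

Because ties at the k-th position are broken arbitrarily, an overlap count near the cutoff can shift by one depending on how ties are resolved.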

Gendered Spatial Distribution of Drop-off Costs.

We were interested in exploring how drop-off costs are distributed spatially by gender. First, we compared the number of female and male drop-off points per TAZ (Figs. 5 and 6) and found that six of the 20 highest-value TAZs overlap.

Fig. 5. Distribution of female trips per TAZ

Fig. 6. Distribution of male trips per TAZ

As a second step, we applied the same methodology previously described to visualize female and male drop-off costs (Fig. 7).

Fig. 7. Choropleth map describing the median (a, b), maximum (c, d), and minimum (e, f) trip cost in SAR per TAZ by gender.

One main pattern we noticed in Fig. 7 is that women spend more money than men on taxis in this data set. In [19], Loukaitou-Sideris explains that although women predominate as users of mass transit around the globe, they may be reluctant to use taxis because taxis are more expensive. However, our data set does not follow this pattern, as female riders clearly dominate our sample. We suspect this is due to the limited availability of public transit options and the driving ban on women in Riyadh.
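The observation that women spend more than men comes from visual inspection of Fig. 7. One hedged way to check it numerically, not a method used in this paper, is to compare female and male median costs TAZ by TAZ, restricted to TAZs with enough drop-offs from both genders. The input dictionaries mapping TAZ IDs to lists of trip costs are hypothetical, and the default `min_trips=5` is an illustrative threshold that avoids comparing medians built on one or two trips.

```python
from statistics import median


def compare_gender_medians(
    female_costs: dict[str, list[float]],
    male_costs: dict[str, list[float]],
    min_trips: int = 5,
) -> tuple[int, int]:
    """Count TAZs where the female median cost exceeds the male median.

    Only TAZs with at least `min_trips` trips for both genders are compared.
    Returns (TAZs with a higher female median, TAZs compared).
    """
    shared = [
        taz for taz in female_costs
        if taz in male_costs
        and len(female_costs[taz]) >= min_trips
        and len(male_costs[taz]) >= min_trips
    ]
    higher = sum(1 for taz in shared if median(female_costs[taz]) > median(male_costs[taz]))
    return higher, len(shared)


# Toy usage: only TAZs "A" and "B" have trips from both genders.
female = {"A": [30.0, 40.0], "B": [10.0], "C": [5.0]}
male = {"A": [20.0], "B": [15.0], "D": [50.0]}
print(compare_gender_medians(female, male, min_trips=1))  # (1, 2)
```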

4 Discussion

After computing the statistical distribution of the different cost scores, we found that the average drop-off fare in our data set is around 13 dollars (approximately 50 SAR), the minimum fare is around 3 dollars (approximately 10 SAR) and the maximum fare is approximately 40 dollars (around 150 SAR). We were also interested in examining the relationship between a TAZ’s median trip cost and its number of drop-off points, that is, whether TAZs that attract relatively expensive trips also attract many drop-offs.
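The dollar and riyal figures in this section can be cross-checked with the Saudi riyal's fixed peg of 3.75 SAR per US dollar. This conversion sketch is illustrative and assumes the fares were recorded in SAR and converted at the peg rate.

```python
SAR_PER_USD = 3.75  # fixed peg of the Saudi riyal to the US dollar


def usd_to_sar(usd: float) -> float:
    """Convert US dollars to Saudi riyals at the peg rate."""
    return usd * SAR_PER_USD


def sar_to_usd(sar: float) -> float:
    """Convert Saudi riyals to US dollars at the peg rate."""
    return sar / SAR_PER_USD


for usd in (13, 3, 40):
    print(f"{usd} USD ≈ {usd_to_sar(usd):.0f} SAR")
```

At the peg, 13 dollars is about 49 SAR, 3 dollars about 11 SAR and 40 dollars exactly 150 SAR, while the 300 SAR filtering threshold used below corresponds to 80 dollars.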

We chose to explore median values rather than average, minimum or maximum values because the median is robust to extreme values, which act as noise in the data. Figure 8 shows a scatterplot of TAZs by median cost and total trip count, where each point represents one TAZ. We filtered out TAZs with fewer than 11 trips and TAZs with a median cost over 300 SAR (80 dollars); after this filtering step, 896 TAZs (plot points) remain.
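The filtering behind the scatterplot can be sketched as follows, assuming a hypothetical `{taz_id: [trip costs in SAR]}` mapping as input. Each returned tuple is one plot point (TAZ ID, median cost, trip count); plotting itself is left out.

```python
from statistics import median


def scatter_points(
    costs_by_taz: dict[str, list[float]],
    min_trips: int = 11,
    max_median: float = 300.0,
) -> list[tuple[str, float, int]]:
    """Keep TAZs with at least `min_trips` trips and a median cost of at most `max_median` SAR."""
    points = []
    for taz_id, costs in costs_by_taz.items():
        if len(costs) < min_trips:
            continue  # too few trips for a stable median
        med = median(costs)
        if med > max_median:
            continue  # median above the cost cutoff
        points.append((taz_id, med, len(costs)))
    return points


# Toy usage: "B" has too few trips and "C" exceeds the median cutoff.
costs_by_taz = {"A": [20.0] * 11, "B": [20.0] * 10, "C": [400.0] * 12}
print(scatter_points(costs_by_taz))  # [('A', 20.0, 11)]
```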

Fig. 8. Median cost and number of trips per TAZ

Figure 8 shows that most TAZs with a higher number of drop-offs fall near the average taxi fare range; more analysis is needed to identify possible causes. Another observation from Figs. 3 and 4a is that TAZs with a higher number of drop-off points are concentrated mainly around the center of the city, whereas TAZs with a relatively high median cost lie around the periphery. Additional work on pickup locations, TAZ population size and the distribution of points of interest could provide more insight.

With further refinement, this association could be useful in many domains, for example to identify areas of the city that people are willing to pay more to reach. By linking this information with the distribution of amenities and businesses in the city, business owners could observe whether TAZs with a higher aggregated cost or a higher trip count are associated with more amenities, revealing any link between amenity saturation and taxi visits or drop-off cost. Urban planners could also use this information to identify how much different demographics pay to reach different parts of the city, which can inform service distribution.

5 Conclusion

Linking taxi drop-off costs to different parts of the city can help in understanding social dynamics and urban behavior. Using more than 122,000 finished trips in Riyadh, most of them gender-annotated, we observed social and urban patterns that emerge from linking taxi drop-off points to TAZs. We presented gendered and overall drop-off visualizations and recorded general observations from them. This research explores an approach to land classification that may be useful for urban planners and business owners.