Skip to main content
Log in

Knowledge extraction from crowdsourced data for the enrichment of road networks

  • Published:
GeoInformatica Aims and scope Submit manuscript

Abstract

In current navigation systems quantitative metrics such as distance, time and energy are used to determine optimal paths. Yet, a “best path”, as judged by users, might take qualitative features into account, for instance the scenery or the touristic attractiveness of a path. Machines are unable to quantify such “soft” properties. Crowdsourced data provides us with a means to record user choices and opinions. In this work, we survey heterogeneous sources of spatial, spatio-temporal and textual crowdsourced data as a proxy for qualitative information of users in movement. We (i) explore the process of extracting qualitative information from uncertain crowdsourced data sets employing different techniques, (ii) investigate the enrichment of road networks with the extracted information by adjusting its properties and by building a meta-network, and (iii) show how to use the enriched networks for routing purposes. An extensive experimental evaluation of our proposed methods on real-world data sets shows that qualitative properties as captured by crowdsourced data can indeed be used to improve the quality of routing suggestions while not sacrificing their quantitative aspects.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

Notes

  1. www.foursquare.com, www.yelp.com, www.twitter.com, www.flickr.com

  2. www.openstreetmap.org, www.capitalbikeshare.com, www.endomondo.com

  3. www.travelblog.com, www.traveljournal.com, www.travelpod.com

  4. www.geonames.org

  5. www.openstreetmap.org

  6. Insight Guides: Explore Paris, https://www.insightguides.com/shop/product/insight-guides-explore-paris/9781786716590,seeGoogleBooksforexcerpts.

References

  1. Goodchild MF (2007) Citizens as sensors: the world of volunteered geography. GeoJournal 69:211–221

    Article  Google Scholar 

  2. Skoumas G, Schmid KA, Jossé G, Züfle A, Nascimento MA, Renz M, Pfoser D (2014) Towards knowledge-enriched path computation. In: Proceedings of the 22nd ACM SIGSPATIAL international conference on advances in geographic information systems, Dallas/Fort Worth, TX, USA, November 4-7, 2014, pp 485–488

  3. Skoumas G, Schmid KA, Jossé G, Schubert M, Nascimento MA, Züfle A, Renz M, Pfoser D (2015) Knowledge-enriched route computation. In: Advances in spatial and temporal databases - 14th international symposium, SSTD 2015, Hong Kong, China, August 26-28, 2015. Proceedings, pp 157–176

  4. Jossé G, Franzke M, Skoumas G, Züfle A, Nascimento MA, Renz M (2015) A framework for computation of popular paths from crowdsourced data. In: 31st IEEE international conference on data engineering, ICDE 2015, Seoul, South Korea, April 13-17, 2015, pp 1428–1431

  5. Jossé G, Schmid KA, Züfle A, Skoumas G, Schubert M, Pfoser D (2015) Tourismo: A user-preference tourist trip search engine. In: Advances in spatial and temporal databases - 14th international symposium, SSTD 2015, Hong Kong, China, August 26-28, 2015. Proceedings, pp 514–519

  6. Lv M, Chen L, Chen G (2012) Discovering personally semantic places from gps trajectories. In: ACM CIKM12, pp 1552–1556

  7. Yan Z, Chakraborty D, Parent C, Spaccapietra S, Aberer K (2011) Semitri: A framework for semantic annotation of heterogeneous trajectories. In: Proceedings of the 14th international conference on extending database technology, pp 259–270

  8. Palma AT, Bogorny V, Kuijpers B, Alvares LO (2008) A clustering-based approach for discovering interesting places in trajectories. In: Proceedings of the ACM symposium on applied computing , pp 863–868

  9. Alvares LO, Bogorny V, Kuijpers B, de Macedo JAF, Moelans B, Vaisman A (2007) A model for enriching trajectories with semantic geographical information. In: Proceedings of the 15th annual ACM international symposium on advances in geographic information systems. ACM, pp 22:1–22:8

  10. Parent C, Spaccapietra S, Renso C, Andrienko G, Andrienko N, Bogorny V, Damiani ML, Gkoulalas-Divanis A, Macedo J, Pelekis N, Theodoridis Y, Yan Z (2013) Semantic trajectories modeling and analysis. ACM Comput Surv ’13 45:42:1–42:32

    Google Scholar 

  11. Spaccapietra S, Parent C (2011) Adding meaning to your steps. In: Proceedings of the 30th international conference on conceptual modeling, pp 13–31

  12. Yan Z, Chakraborty D, Parent C, Spaccapietra S, Aberer K (2013) Semantic trajectories: Mobility data computation and annotation. ACM Trans Intell Syst Technol 4:49:1–49:38

    Article  Google Scholar 

  13. Yan Z, Spremic L, Chakraborty D, Parent C, Spaccapietra S, Aberer K (2010) Automatic construction and multi-level visualization of semantic trajectories. In: Proceedings of the 18th international conference on advances in geographic information systems, pp 524–525

  14. Feldman D, Sugaya A, Sung C, Rus D (2013) idiary: From gps signals to a text-searchable diary. In: Proceedings of the 11th ACM conference on embedded networked sensor systems. ACM , pp 6:1–6:12

  15. Duckham M, Kulik L (2003) Simplest paths: Automated route selection for navigation. In: Proceedings of the international conference on spatial information theory COSIT 2003. Springer, pp 169–185

  16. Duckham M, Winter S, Robinson M (2010) Including landmarks in routing instructions. Journal on Location Based Services 4(4):28–52

    Article  Google Scholar 

  17. Richter KF, Duckham M (2008) Simplest instructions: Finding easy-to-describe routes for navigation. In: Proceedings of 5th international conference geographic information science 2008, pp 274–289

  18. Westphal M, Renz J (2011) Evaluating and minimizing ambiguities in qualitative route instructions. In: Proceedings of the 19th ACM international conference on advances in geographic information systems, pp 171–180

  19. Sacharidis D, Bouros P (2013) Routing directions: Keeping it fast and simple. In: Proceedings of the 21st ACM international conference on advances in geographic information systems, pp 164–173

  20. Quercia D, Schifanella R, Aiello LM (2014) The shortest path to happiness: Recommending beautiful, quiet, and happy routes in the city. CoRR (abs/1407.1031

  21. Sun Y, Fan H, Bakillah M, Zipf A (2015) Road-based travel recommendation using geo-tagged images. Comput Environ Urban Syst 53:110–122

    Article  Google Scholar 

  22. Runge N, Samsonov P, Degraen D, Schöning J (2016) No more autobahn!: Scenic route generation using googles street view. In: Proceedings of the 21st international conference on intelligent user interfaces. ACM, pp 147–151

  23. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

  24. Li F, Cheng D, Hadjieleftheriou M, Kollios G, Teng SH (2005) On trip planning queries in spatial databases. In: Proceedings of the 9th international conference on advances in spatial and temporal databases. SSTD’05. Springer-Verlag, Berlin, Heidelberg, pp 273–290

  25. Kanza Y, Safra E, Sagiv Y, Doytsher Y (2008) Heuristic algorithms for route-search queries over geographical data. In: ACM SIGSPATIAL GIS, p 11

  26. Chen H, Ku WS, Sun MT, Zimmermann R (2011) The partial sequenced route query with traveling rules in road networks. Geoinformatica 15:541–569

    Article  Google Scholar 

  27. Gavalas KC, Mastakas K, Pantziou G (2014) A survey on algorithmic approaches for solving tourist trip design problems. J Heuristics 20:291–328

    Article  Google Scholar 

  28. Garcia A, Arbelaitz O, Linaza MT, Vansteenwegen P, Souffriau W (2010) Personalized tourist route generation. Springer

  29. Chen C, Zhang D, Guo B, Ma X, Pan G, Wu Z (2015) Tripplanner: personalized trip planning leveraging heterogeneous crowdsourced digital footprints. IEEE Trans Intell Transp Syst 16:1259–1273

    Article  Google Scholar 

  30. Hsieh HP, Li CT, Lin SD (2012) Exploiting large-scale check-in data to recommend time-sensitive routes. In: Proceedings of the ACM SIGKDD international workshop on urban computing. ACM

  31. Brilhante I, Macedo JA, Nardini FM, Perego R, Renso C (2013) Where shall we go today?: Planning touristic tours with tripbuilder. In: Proceedings of the 22nd ACM international conference on information & knowledge management. CIKM ’13. ACM, New york, NY, USA, pp 757–762

  32. Hsieh H, Li C (2014) Mining and planning time-aware routes from check-in data. In: Proceedings of the 23rd ACM international conference on conference on information and knowledge management, CIKM 2014, Shanghai, China, November 3-7, 2014, pp 481–490

  33. De Choudhury M, Feldman M, Amer-Yahia S, Golbandi N, Lempel R, Yu C (2010) Automatic construction of travel itineraries using social breadcrumbs. In: Proceedings of the 21st ACM conference on hypertext and hypermedia. HT ’10. ACM, New york, NY, USA, pp 35–44

  34. Vansteenwegen P, Souffriau W, Oudheusden DV (2011) The orienteering problem: a survey. Eur J Oper Res 209:1–10

    Article  Google Scholar 

  35. Souffriau W, Vansteenwegen P, Berghe GV, Oudheusden DV (2011) The planning of cycle trips in the province of east flanders. Omega 39:209–213

    Article  Google Scholar 

  36. Verbeeck C, Vansteenwegen P, Aghezzaf EH (2014) An extension of the arc orienteering problem and its application to cycle trip planning. Transport Res E-Log 68 :64–78

    Article  Google Scholar 

  37. Lu Y, Shahabi C (2015) An arc orienteering algorithm to find the most scenic path on a large-scale road network. In: Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems. GIS ’15. ACM, New York, NY, USA, pp 46:1–46:10

  38. Hauff C (2013) A study on the accuracy of flickr’s geotag data. In: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval. ACM, pp 1037–1040

  39. Zielstra D, Hochmair HH (2013) Positional accuracy analysis of flickr and panoramio images for selected world regions. J Spat Sci 58:251–273

    Article  Google Scholar 

  40. Zandbergen PA (2008) Positional accuracy of spatial data: Non-normal distributions and a critique of the national standard for spatial data accuracy. Trans GIS 12:103–130

    Article  Google Scholar 

  41. Skoumas G, Pfoser D, Kyrillidis A, Sellis T (2016) Location estimation using crowdsourced spatial relations. ACM Trans Spatial Algorithms Syst 2 :5:1–5:23

    Article  Google Scholar 

  42. Loper E, Bird S (2002) Nltk: The natural language toolkit. In: Proceedings of the ACL-02 workshop on effective tools and methodologies for teaching natural language processing and computational linguistics, vol 1, pp 63–70

  43. Skoumas G, Pfoser D, Kyrillidis A (2013) On quantifying qualitative geospatial data: A probabilistic approach. In: Proceedings of the second ACM international workshop on crowdsourced and volunteered geographic information, pp 71–78

  44. Schlieder C, Matyas C (2009) Photographing a city: an analysis of place concepts based on spatial choices. Spat Cogn Comput 9:212–228

    Google Scholar 

  45. Dijkstra EW (1959) A note on two problems in connexion with graphs. Numer Math 1:269–271

    Article  Google Scholar 

  46. Schubert M, Renz M, Kriegel HP (2010) Route skyline queries: a multi-preference path planning approach. In: Proceedings ICDE, pp 261–272

  47. Shekelyan M, Jossé G., Schubert M (2015) Paretoprep: Efficient lower bounds for path skylines and fast path computation SSTD15

  48. Shekelyan M, Jossé G, Schubert M, Kriegel H (2014) Linear path skyline computation in bicriteria networks. In: Database systems for advanced applications - 19th international conference, DASFAA 2014, bali, indonesia, april 21-24 2014. Proceedings, Part I, pp 173–187

  49. Shekelyan M, Jossé G, Schubert M (2015) Linear path skylines in multicriteria networks. In: ICDE15, pp 459–470

  50. Neis P, Zielstra D (2014) Recent developments and future trends in volunteered geographic information research: The case of openstreetmap. Future Internet 6:76–106

    Article  Google Scholar 

  51. Neis P, Zielstra D, Zipf A (2013) Comparison of volunteered geographic information data contributions and community development for selected world regions. Future Internet 5:282–300

    Article  Google Scholar 

  52. Graf F, Kriegel HP, Renz M, Schubert M (2011) Mario: multi-attribute routing in open street map. In: Advances in spatial and temporal databases. vol 6849 of lecture notes in computer science , pp 486–490

  53. Mousselly-Sergieh H, Watzinger D, Huber B, Döller M, Egyed-Zsigmond E, Kosch H (2014) World-wide scale geotagged image dataset for automatic image annotation and reverse geotagging. In: Proceedings of the 5th ACM multimedia systems conference, pp 47–52

  54. Yang D, Zhang D, Chen L, Qu B (2015) Nationtelescope: monitoring and visualizing large-scale collective behavior in lbsns. J Netw Comput Appl 55:170–180

    Article  Google Scholar 

  55. Yang D, Zhang D, Qu B (2015) Participatory cultural mapping based on collective behavior in location based social networks. ACM Transactions on Intelligent Systems and Technology. in press

  56. Bishop CM (2006) Pattern recognition and machine learning (information science and statistics). Springer-verlag New York, inc., secaucus, NJ, USA

    Google Scholar 

  57. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc Ser B 39:1–38

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Matthias Schubert.

Additional information

Mario A. Nascimento’s work was partially supported by NSERC, Canada.

Appendix: Mining Spatial Closeness from Textual Data

Appendix: Mining Spatial Closeness from Textual Data

As presented in [3], we rely on a probabilistic model to counter the ambiguity inherent in textual data. This is done in three steps. First, we create feature vectors for each occurrence of a particular spatial closeness relation. Second, employing the feature vectors we train a Gaussian Mixture Model (GMM) for each kind of closeness relation. Third, we use the derived GMMs to infer a closeness score which can be used to enrich the underlying road network.

For each occurrence of a closeness relation (e.g., “next to”) mined from the text (e.g., the triplet (P i ,“next to”,P j )), a spatial feature vector v i j = (r, ϕ) is created. v i j describes the Euclidean distance between P i and P j and the orientation as the counterclockwise rotation of the x-axis, centered at P j , to P i . Thus, for any triplet extracted from the corpus (sufficiently often) we obtain a two-dimensional feature vector. We group these vectors by the relation they represent, obtaining a set of vectors \(V_{REL^{k}}\) for each of the closeness relations R E L k. For each relation, we propose to train a probabilistic model. Following the argumentation in [43] where similar relations were examined, we propose employing GMMs, a well-proven and extensively studied method for many supervised and unsupervised learning problems [56].

A GMM is a weighted sum of a fixed number M of Gaussian component densities

$$\mathbb{P}_{\theta}(v) = \sum\limits_{i = 1}^{M} w_{i} g(v;\mu_{i},{\Sigma}_{i}) $$

where v is an l-dimensional vector, w i are the mixture weights (summing to 1) and

$$g(v;\mu_{i}, {\Sigma}_{i}) = \frac{1}{\left( (2\pi)^{l} \det({\Sigma}_{i})\right)^{1/2}} \exp \left( -\frac{1}{2}(v-\mu_{i})^{T}{\Sigma}_{i}(v-\mu_{i})\right) $$

is an l-variate Gaussian density function with mean vector \(\mu _{i} \in \mathbb {R}^{l}\) and covariance matrix \({\Sigma }_{i} \in \mathbb {R}^{l\times l}\). The model is fully characterized by the weights, mean vectors and covariance matrices, collectively represented in 𝜃 = {w i , μ i i },i = 1,…,M. In our case, l = 2, the dimensionality of the spatial feature vectors v. For the parameter estimation of each Gaussian component, Expectation Maximization (EM) [57] is the state-of-the-art technique. It updates the parameters of the components iteratively w.r.t. a given (feature) vector set until a convergence threshold is reached.

Employing EM, we obtain a set of probabilistic models \(\mathbb {P}(\cdot \mid \theta ^{k})\), one for each closeness relation R E L k ∈{R E L 1,…,R E L m} in the set of all closeness relations. For a pair of POIs P i and P j with spatial feature vector v i j , \(\mathbb {P}_{\theta ^{k}}(v_{ij})\) is the probability that P i and P j stand in spatial relation R E L k to each other. Based on this information, we now derive a closeness score for pairs of POIs by Bayesian inference.

For two distinct POIs, let R E L i j denote the set of all closeness relations existing between P i and P j . Note that R E L k denotes an abstract relation, while R E L i j denotes the set of occurrences of relations between a pair of POIs. Furthermore, let v i j denote the spatial feature vector of P i and P j . In order to determine a numeric score describing the closeness of the POIs, we estimate the posterior probability of all closeness relations R E L k and accumulate them. The posterior probability of relation R E L k is given by

$$\mathbb{P}(REL^{k} \mid v_{ij}) = \frac{\mathbb{P}(v_{ij} \mid \theta^{k})\ \mathbb{P}(\theta^{k})}{{\sum}_{l=1}^{m} \mathbb{P}(v_{ij} \mid \theta^{l})\ \mathbb{P}(\theta^{l})} $$

where \(\mathbb {P}(\theta ^{k}) = \mathbb {P}(REL^{k})\) denotes the prior probability of relation R E L k given by the trained model represented by 𝜃 k.

In a traditional classification problem the task would be to choose the closeness relation with the highest posterior and to assign the pair of POIs to this class. We, in contrast, consider each posterior probability \(\mathbb {P}(REL^{k} \mid v_{ij})\) as a measure of confidence of the existence of R E L k between P i and P j . Stressing that all considered relations represent spatial closeness, we combine all posteriors into one measure which we call closeness score c s i j of the pair of POIs P i and P j :

$$cs_{ij} = \frac{1}{m} \sum\limits_{k=1}^{m} \frac{\mathbb{P}(REL^{k} \mid v_{ij})}{\max\{\mathbb{P}(REL^{k} \mid v_{ij}) \mid \forall \ i\neq j\}} $$

This closeness score can be used to enrich the underlying road network.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Jossé, G., Schmid, K., Züfle, A. et al. Knowledge extraction from crowdsourced data for the enrichment of road networks. Geoinformatica 21, 763–795 (2017). https://doi.org/10.1007/s10707-017-0306-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10707-017-0306-1

Keywords

Navigation