Combined geo-social search: computing top-k join queries over incomplete information

GeoInformatica

Abstract

Geo-social data sets, which fuse the social and the geospatial facets of data, are vibrant data sources that associate people and activities with locations. In a combined geo-social search, several search queries are posed over geospatial and social data sources, or over data sources with both geospatial and social facets; and the search results, provided as ranked lists of items, are integrated by associating matching items, yielding combinations. Each combination has a score which is a function of the scores of the items it comprises, and the goal is to compute the k combinations with the highest score, that is, the top-k combinations. However, since geo-social data sources are heterogeneous, data items may not have matching items in all the ranked lists. Such items cannot be included in complete combinations. Hence, we study the approach where combinations are padded by nulls for missing items, as in outer-join. A combination is maximal if it cannot be extended by replacing a null by an item. We show that if some of the top-k maximal combinations contain null values, the computation requires reading entire lists, and hence, traditional top-k algorithms and optimization techniques are not as effective as in the case of an ordinary top-k join. Thus, we present two novel algorithms for computing the top-k maximal combinations. One novel algorithm is instance optimal over the class of algorithms that compute a 𝜃-approximation to the answer. The second algorithm is more efficient than the modification of two common top-k algorithms to compute maximal combinations. We show this analytically, and experimentally over real and synthetic data.
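As a point of reference, the problem statement above can be sketched as a naive baseline that reads every list in full, which is exactly the cost the abstract says is hard to avoid when top-k answers contain nulls. This is not one of the paper's algorithms. It assumes that items match when they share a hypothetical join key, that each key appears at most once per list, that the combination score is the sum of the item scores, and that a null contributes zero (as in Note 2). The names `top_k_maximal`, `places` and `posts` are illustrative only.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Each ranked list maps a hypothetical join key to an item score (higher is better).
RankedList = Dict[str, float]
# (combination score, join key, per-list item scores with None standing for a null)
Combination = Tuple[float, str, Tuple[Optional[float], ...]]


def top_k_maximal(lists: List[RankedList], k: int) -> List[Combination]:
    """Naive baseline: scan all lists, outer-join on the key, keep the k best."""
    by_key = defaultdict(lambda: [None] * len(lists))
    for i, ranked in enumerate(lists):
        for key, score in ranked.items():
            by_key[key][i] = score  # positions with no matching item stay None
    # With one item per key per list, grouping by key yields exactly one
    # maximal combination per key: no null in it can be replaced by an item.
    combos = [
        (sum(s for s in scores if s is not None), key, tuple(scores))
        for key, scores in by_key.items()
    ]
    return heapq.nlargest(k, combos, key=lambda c: c[0])


# Made-up data: a geospatial source and a social source scoring the same places.
places = {"cafe": 9, "museum": 7, "park": 4}
posts = {"cafe": 5, "park": 8, "bar": 10}

for score, key, items in top_k_maximal([places, posts], 3):
    print(key, items, score)
```

In this made-up input, the third-ranked answer is the padded combination for `bar`, which has no matching item in `places`. A threshold-style algorithm cannot confirm such an answer without knowing that no match exists further down the other list.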

Notes

  1. For instance, the geo-location API of Twitter provides the ability to attach geographic metadata to tweets. See http://twitter.com.

  2. Setting the score of nulls to zero is arbitrary; any value smaller than the scores of all non-null items can be used instead. Zero is a natural choice because a missing item should not increase the score of a combination. Alternatively, the scoring function can be the product of the item scores; in this case, all item scores should be greater than 1 and the score of a null should equal 1. All the results in this paper remain unchanged in such a case.

  3. A preliminary version of the hybrid approach was presented in [66].

  4. Algorithms are instance optimal up to a factor that equals the number of lists.

  5. http://local.yahoo.com/
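The two null-scoring conventions described in Note 2 can be compared with a small sketch. The function names are hypothetical, nulls are represented by `None`, and the additive variant assumes positive item scores; the point is only that, under either convention, replacing a null by an item raises the score of a combination.

```python
import math
from typing import Optional, Sequence


def additive_score(items: Sequence[Optional[float]]) -> float:
    """Sum of the item scores, with a null (None) counting as 0."""
    return sum(0 if s is None else s for s in items)


def multiplicative_score(items: Sequence[Optional[float]]) -> float:
    """Product of the item scores, with a null counting as 1.

    Assumes every non-null score is greater than 1, as Note 2 requires.
    """
    return math.prod(1 if s is None else s for s in items)


padded = (3, None)   # combination padded with a null
extended = (3, 2)    # the same combination with the null replaced by an item

print(additive_score(padded), additive_score(extended))
print(multiplicative_score(padded), multiplicative_score(extended))
```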

References

  1. Abid A, Tagliasacchi M (2013) Provisional reporting for rank joins. J Intell Inf Syst 40(3):479–500

  2. Abiteboul S, Kanellakis PC, Grahne G (1991) On the representation and querying of sets of possible worlds. Theor Comput Sci 78(1):158–187

  3. Ali MS, Consens M, Gu X, Kanza Y, Rizzolo F, Stasiu R (2006) Efficient, effective and flexible XML retrieval using summaries. In: International workshop of the initiative for the evaluation of XML retrieval. Springer, Berlin, pp 89–103

  4. Antova L, Koch C, Olteanu D (2007) \(10^{10^{6}}\) worlds and beyond: efficient representation and processing of incomplete information. In: Proceedings of the 23rd international conference on data engineering. IEEE Society, Istanbul (Turkey), pp 606–615

  5. Antova L, Koch C, Olteanu D (2007) From complete to incomplete information and back. In: Proceedings of the ACM SIGMOD international conference on management of data. ACM, Beijing (China), pp 713–724

  6. Antova L, Koch C, Olteanu D (2007) MayBMS: managing incomplete information with probabilistic world-set decompositions. In: Proceedings of the 23rd international conference on data engineering. IEEE Society, Istanbul (Turkey), pp 1479–1480

  7. Arai B, Das G, Gunopulos D, Koudas N (2009) Anytime measures for top-k algorithms on exact and fuzzy data sets. VLDB J 18(2):407–427

  8. Baeza-Yates RA, Ribeiro-Neto BA (1999) Modern information retrieval. Addison-Wesley

  9. Bao J, Zheng Y, Mokbel MF (2012) Location-based and preference-aware recommendation using sparse geo-social networking data. In: SIGSPATIAL ’12. Redondo Beach, CA, pp 199–208

  10. Beeri C, Doytsher Y, Kanza Y, Safra E, Sagiv Y (2005) Finding corresponding objects when integrating several geo-spatial datasets. In: Proceedings of the 13th annual ACM international workshop on geographic information systems. ACM, Bremen, Germany, pp 87–96. http://dx.doi.org/10.1145/1097064.1097078

  11. Beeri C, Kanza Y, Safra E, Sagiv Y (2004) Object fusion in geographic information systems. In: Proceedings of the thirtieth international conference on very large data bases, vol 30. VLDB Endowment, Toronto, Canada, pp 816–827

  12. Bekhor S, Cohen S, Doytsher Y, Kanza Y, Sagiv Y (2015) A personalized geosocial app for surviving an earthquake. In: Proceedings of the 1st ACM SIGSPATIAL international workshop on the use of GIS in emergency management. Bellevue, Washington, pp 21:1–21:6

  13. Braga D, Campi A, Ceri S, Raffio A (2008) Joining the results of heterogeneous search engines. Inf Syst 33(7-8):658–680

  14. Braga D, Ceri S, Daniel F, Martinenghi D (2008) Optimization of multi-domain queries on the web. Proc VLDB Endow 1(1):562–573

  15. Bruno N, Chaudhuri S, Gravano L (2002) Top-k selection queries over relational databases: mapping strategies and performance evaluation. ACM Trans Database Syst 27(2):153–187

  16. Carmel D, Zwerdling N, Guy I, Ofek-Koifman S, Har’el N, Ronen I, Uziel E, Yogev S, Chernov S (2009) Personalized social search based on the user’s social network. In: Proceedings of the 18th ACM conference on information and knowledge management, CIKM ’09. ACM, New York, pp 1227–1236

  17. Ceri S, Brambilla M (eds) (2010) Search computing: challenges and directions [outcome of the first SeCO workshop on search computing challenges and directions, Como, Italy, June 17–19, 2009], Lecture Notes in Computer Science, vol 5950. Springer

  18. Chandra AK, Merlin PM (1977) Optimal implementation of conjunctive queries in relational data bases. In: Proceedings of the ninth annual ACM symposium on theory of computing, STOC ’77. ACM, Boulder, pp 77–90

  19. Cohen S, Fadida I, Kanza Y, Kimelfeld B, Sagiv Y (2006) Full disjunctions: polynomial-delay iterators in action. In: Proceedings of the 32nd international conference on very large data bases, VLDB ’06. VLDB Endowment, Seoul, pp 739–750

  20. Cohen S, Sagiv Y (2007) An incremental algorithm for computing ranked full disjunctions. J Comput Syst Sci 73(4):648–668

  21. Croitoru A, Crooks A, Radzikowski J, Stefanidis A (2013) Geosocial gauge: a system prototype for knowledge discovery from social media. Int J Geogr Inf Sci 27(12):2483–2508

  22. Das G, Gunopulos D, Koudas N, Tsirogiannis D (2006) Answering top-k queries using views. In: Proceedings of the 32nd international conference on very large data bases, VLDB ’06. VLDB Endowment, pp 451–462

  23. Date CJ (1983) The outer join. In: Proceedings of the 2nd international conference on databases. Cambridge Press, Cambridge, pp 76–106

  24. Doytsher Y, Galon B, Kanza Y (2010) Querying geo-social data by bridging spatial networks and social networks. In: Proceedings of the 2nd ACM SIGSPATIAL international workshop on location based social networks. ACM, San Jose, California, pp 39–46

  25. Doytsher Y, Galon B, Kanza Y (2011) Storing routes in socio-spatial networks and supporting social-based route recommendation. In: Proceedings of the 3rd ACM SIGSPATIAL international workshop on location-based social networks. ACM, Chicago, Illinois, pp 49–56

  26. Doytsher Y, Galon B, Kanza Y (2012) Querying socio-spatial networks on the world-wide web. In: Proceedings of the 21st international conference on world wide web. ACM, Lyon, France, pp 329–332

  27. Elbery A, ElNainay M, Chen F, Lu CT, Kendall J (2013) A carpooling recommendation system based on social VANET and geo-social data. In: Proceedings of the 21st ACM SIGSPATIAL international conference on advances in geographic information systems. ACM, Orlando, Florida, pp 556–559

  28. Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, Santa Barbara, pp 102–113

  29. Fagin R, Lotem A, Naor M (2003) Optimal aggregation algorithms for middleware. J Comput Syst Sci 66(4):614–656

  30. Flatow D, Naaman M, Xie KE, Volkovich Y, Kanza Y (2015) On the accuracy of hyper-local geotagging of social media content. In: Proceedings of the eighth ACM international conference on web search and data mining. ACM, Shanghai, pp 127–136

  31. Galindo-Legaria C, Rosenthal A (1997) Outerjoin simplification and reordering for query optimization. ACM Trans Database Syst 22(1):43–73

  32. Galindo-Legaria CA (1994) Outerjoins as disjunctions. In: Proceedings of the 1994 ACM SIGMOD international conference on management of data. ACM, Minneapolis (Minnesota), pp 348–358

  33. Garcia-Molina H, Ullman JD, Widom J (2008) Database systems: the complete book, 2nd edn. Prentice Hall Press, Upper Saddle River

  34. de Graaff V, van Keulen M, de By RA (2012) Towards geosocial recommender systems. In: Proceedings of the 4th international workshop on web intelligence & communities. ACM, Lyon, France, pp 8:1–8:4

  35. Grabovitch-Zuyev I, Kanza Y, Kravi E, Pat B (2007) On the correlation between textual content and geospatial locations in microblogs. In: Proceedings of workshop on managing and mining enriched geo-spatial data. ACM, Snowbird, UT, USA, pp 3:1–3:6

  36. Ilyas IF, Aref WG, Elmagarmid AK (2004) Supporting top-k join queries in relational databases. VLDB J 13(3):207–221

  37. Ilyas IF, Beskales G, Soliman MA (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv 40(4):1–58

  38. Ilyas IF, Martinenghi D, Tagliasacchi M (2010) Chapter 11: rank-join algorithms for search computing. In: Ceri S, Brambilla M (eds) Search computing. Springer, Berlin, Heidelberg, pp 211–224

  39. Ilyas IF, Shah R, Aref WG, Vitter JS, Elmagarmid AK (2004) Rank-aware query optimization. In: Proceedings of the 2004 ACM SIGMOD international conference on management of data, SIGMOD ’04. ACM, New York, NY, USA, pp 203–214

  40. Ilyas IF, Soliman MA (2011) Probabilistic ranking techniques in relational databases. Synthesis lectures on data management. Morgan & Claypool Publishers, California

  41. Kanza Y (2016) Uncertainty in geosocial data: friend or foe? SIGSPATIAL Special 8(2):3–10

  42. Kanza Y, Nutt W, Sagiv Y (1999) Queries with incomplete answers over semistructured data. In: Proceedings of the eighteenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, Philadelphia, Pennsylvania, USA, pp 227–236

  43. Kanza Y, Nutt W, Sagiv Y (2002) Querying incomplete information in semistructured data. J Comput Syst Sci 64(3):655–693

  44. Kanza Y, Sagiv Y (2003) Computing full disjunctions. In: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, San Diego, California, pp 78–89

  45. Kanza Y, Samet H (2015) An online marketplace for geosocial data. In: 23rd ACM SIGSPATIAL international conference on advances in geographic information systems. Seattle, Washington, USA

  46. Karamshuk D, Noulas A, Scellato S, Nicosia V, Mascolo C (2013) Geo-spotting: mining online location-based services for optimal retail store placement. In: KDD ’13. Chicago, pp 793–801

  47. Kimelfeld B, Sagiv Y (2006) Finding and approximating top-k answers in keyword proximity search. In: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, Chicago, IL, USA, pp 173–182

  48. Levin R, Kanza Y (2012) Interactive traffic-aware route search on smartphones. In: Proceedings of the first ACM SIGSPATIAL international workshop on mobile geographic information systems. ACM, Redondo Beach, California, pp 1–8

  49. Levin R, Kanza Y (2014) TARS: traffic-aware route search. GeoInformatica 18(3):461–500

  50. Libkin L (1995) A semantics-based approach to design of query languages for partial information. In: Semantics in databases. Lecture Notes in Computer Science, Prague (Czech Republic), pp 170–208

  51. Mamoulis N, Yiu ML, Cheng KH, Cheung DW (2007) Efficient top-k aggregation of ranked inputs. ACM Trans Database Syst 32(3):1–47

  52. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York

  53. Marian A, Amer-Yahia S, Koudas N, Srivastava D (2005) Adaptive processing of top-k queries in XML. In: Proceedings of the 21st international conference on data engineering, ICDE ’05. IEEE Computer Society, Washington, DC, USA, pp 162–173

  54. Marian A, Bruno N, Gravano L (2004) Evaluating top-k queries over web-accessible databases. ACM Trans Database Syst 29(2):319–362

  55. Markowetz A, Brinkhoff T, Seeger B (2005) Geographic information retrieval. Next Generation Geospatial Information: From Digital Image Analysis to Spatiotemporal Databases 3:5

  56. Martinenghi D, Tagliasacchi M (2012) Proximity measures for rank join. ACM Trans Database Syst 37(1):2:1–2:46

  57. Mendelzon AO, Mihaila GA (2001) Querying partially sound and complete data sources. In: Proceedings of the 20th symposium on principles of database systems. ACM Press, Santa Barbara (California, USA)

  58. Natsev A, Chang YC, Smith JR, Li CS, Vitter JS (2001) Supporting incremental join queries on ranked inputs. In: Proceedings of the 27th international conference on very large data bases, VLDB ’01. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 281–290

  59. Pat B, Kanza Y, Naaman M (2015) Geosocial search: finding places based on geotagged social-media posts. In: Proceedings of the 24th international conference on world wide web. ACM, Florence, Italy, pp 231–234

  60. Rajaraman A, Ullman J (1996) Integrating information by outerjoins and full disjunctions. In: Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems. ACM Press, Montreal (Canada), pp 238–248

  61. Re C, Dalvi NN, Suciu D (2007) Efficient top-k query evaluation on probabilistic data. In: Proceedings of the 23rd international conference on data engineering. IEEE Computer Society, Istanbul (Turkey)

  62. Safra E, Kanza Y, Sagiv Y, Beeri C, Doytsher Y (2010) Location-based algorithms for finding sets of corresponding objects over several geo-spatial data sets. Int J Geogr Inf Sci 24(1):69–106

  63. Salton G, McGill M (1983) Introduction to modern information retrieval. McGraw-Hill

  64. Samet H (1990) The design and analysis of spatial data structures, vol 85. Addison-Wesley, Reading, MA

  65. Schnaitter K, Polyzotis N (2008) Evaluating rank joins with optimal cost. In: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS ’08. ACM, Vancouver, Canada, pp 43–52

  66. Shalem M, Kanza Y (2010) Computing the top-k maximal answers in a join of ranked lists. In: Proceedings of the 19th ACM international conference on information and knowledge management. ACM, Toronto, ON, Canada, pp 1381–1384

  67. Shalem M, Kanza Y (2012) On optimality-ratio and coverage in ranking of joined search results. Distrib Parallel Databases 30(3–4):209–237

  68. Soliman MA, Chang KCC, Ilyas IF (2007) Top-k query processing in uncertain databases. In: Proceedings of the 23rd international conference on data engineering. IEEE Computer Society, Istanbul (Turkey)

  69. Soliman MA, Ilyas IF, Ben-David S (2010) Supporting ranking queries on uncertain and incomplete data. VLDB J 19(4):477–501

  70. Soliman MA, Ilyas IF, Martinenghi D, Tagliasacchi M (2011) Ranking with uncertain scoring functions: semantics and sensitivity measures. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data. ACM, Athens, Greece, pp 805–816

  71. Stupar A, Michel S (2012) Being picky: processing top-k queries with set-defined selections. In: Proceedings of the 21st ACM international conference on information and knowledge management. ACM, New York, NY, USA, pp 912–921

  72. Theobald M, Weikum G, Schenkel R (2004) Top-k query evaluation with probabilistic guarantees. In: Proceedings of the thirtieth international conference on very large data bases, vol 30. VLDB Endowment, Toronto, Canada, pp 648–659

  73. Wagner A, Duc TT, Ladwig G, Harth A, Studer R (2012) Top-k linked data query processing. In: Proceedings of the 9th international conference on the semantic web: research and applications. Springer, Berlin, Heidelberg, pp 56–71

  74. Wu M, Berti-Équille L, Marian A, Procopiuc CM, Srivastava D (2010) Processing top-k join queries. Proc VLDB Endow 3(1-2):860–870

  75. Xia C, Schwartz R, Xie K, Krebs A, Langdon A, Ting J, Naaman M (2014) Citybeat: real-time social media visualization of hyper-local city data. In: Proceedings of the 23rd international conference on world wide web. ACM, Seoul, Korea, pp 167–170

  76. Ye Y, Zheng Y, Chen Y, Feng J, Xie X (2009) Mining individual life pattern based on location history. In: Proceedings of the 2009 tenth international conference on mobile data management: systems, services and middleware. IEEE Computer Society, Washington, DC, USA, pp 1–10

  77. Yi K, Li F, Kollios G, Srivastava D (2008) Efficient processing of top-k queries in uncertain databases with x-relations. IEEE Trans Knowl Data Eng 20(12):1669–1682

  78. Zhang W, Lin X, Zhang Y, Pei J, Wang W (2010) Threshold-based probabilistic top-k dominating queries. VLDB J 19(2):283–305

Acknowledgements

This research was conducted while Yaron Kanza was a visiting assistant professor at Jacobs Institute, Cornell Tech. This research was supported in part by the Israel Science Foundation (Grant 1467/13) and by the Israeli Ministry of Science and Technology (Grant 3-9617). We thank the anonymous reviewers for their insightful comments and suggestions.

Author information

Corresponding author

Correspondence to Yaron Kanza.

About this article

Cite this article

Kanza, Y., Shalem, M. Combined geo-social search: computing top-k join queries over incomplete information. Geoinformatica 22, 615–660 (2018). https://doi.org/10.1007/s10707-017-0297-y

  • DOI: https://doi.org/10.1007/s10707-017-0297-y
