
Characterizing Robotic and Organic Query in SPARQL Search Sessions

  • Conference paper
Web and Big Data (APWeb-WAIM 2020)

Abstract

SPARQL, one of the most powerful query languages over knowledge graphs, has gained significant popularity in recent years. A large number of SPARQL query logs have become available, providing new research opportunities to discover user interests, understand query intentions, and model search behaviors. However, a significant portion of the queries sent to SPARQL endpoints on the Web are robotic queries generated by automated scripts. Detecting these robotic queries and separating them from the organic ones issued by human users is crucial to deep usage analysis of knowledge graphs. In this paper, we propose a novel method that identifies robotic and organic SPARQL queries based on session-level query features. Specifically, we define SPARQL query sessions and partition the queries into them. Then, we design an algorithm to detect loop patterns, an important characteristic of robotic queries, in a given query session. Finally, we employ a pipeline method that leverages loop pattern features and query request frequency to distinguish robotic from organic SPARQL queries. Unlike machine-learning-based methods, the proposed method can identify the query types accurately without labelled data. We conduct extensive experiments on six real-world SPARQL query log datasets. The results demonstrate that our approach distinguishes robotic and organic queries effectively and needs only \(7.63 \times 10^{-4}\) s on average to process a query.
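The three steps named in the abstract (session partitioning, loop detection, and a frequency check) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the `LogEntry` fields, the one-hour session gap, the definition of a loop as an exactly repeated block of query templates, the rate threshold, and the order of the two checks are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class LogEntry:
    client: str       # hypothetical client identifier, e.g. a hashed IP
    timestamp: float  # request time in seconds
    query: str        # raw SPARQL query text


def partition_sessions(entries, max_gap=3600.0):
    """Group entries per client and start a new session whenever two
    consecutive requests are more than max_gap seconds apart (assumed rule)."""
    sessions = []
    ordered = sorted(entries, key=lambda e: (e.client, e.timestamp))
    for _, group in groupby(ordered, key=lambda e: e.client):
        current = []
        for entry in group:
            if current and entry.timestamp - current[-1].timestamp > max_gap:
                sessions.append(current)
                current = []
            current.append(entry)
        sessions.append(current)
    return sessions


def has_loop(seq, min_repeats=3):
    """Return True if some contiguous block of seq occurs min_repeats
    times in a row (a simplified stand-in for a loop pattern)."""
    n = len(seq)
    for period in range(1, n // min_repeats + 1):
        for start in range(n - period * min_repeats + 1):
            block = seq[start:start + period]
            if all(
                seq[start + k * period:start + (k + 1) * period] == block
                for k in range(1, min_repeats)
            ):
                return True
    return False


def classify_session(session, template_of, max_rate=1.0, min_repeats=3):
    """Label a session 'robotic' if it contains a loop or exceeds max_rate
    requests per second; both thresholds are hypothetical."""
    templates = [template_of(e.query) for e in session]
    if has_loop(templates, min_repeats):
        return "robotic"
    duration = session[-1].timestamp - session[0].timestamp
    if duration > 0 and len(session) / duration > max_rate:
        return "robotic"
    return "organic"
```

Here `template_of` is a caller-supplied function that maps a query to a comparable template, for example by replacing literals with placeholders. Like the thresholds, it needs no labelled data, which matches the unsupervised setting described in the abstract.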


Notes

  1. http://linkeddata.org/

  2. https://sparqles.ai.wu.ac.at/availability

  3. https://wiki.dbpedia.org/

  4. http://httpd.apache.org/docs/current/mod/mod_log_config.html

  5. http://affymetrix.bio2rdf.org/sparql

  6. http://dbsnp.bio2rdf.org/sparql

  7. http://gendr.bio2rdf.org/sparql

  8. http://goa.bio2rdf.org/sparql

  9. http://linkedspl.bio2rdf.org/sparql

  10. http://linkedgeodata.org/sparql

  11. https://pypi.org/project/fuzzywuzzy/

  12. We only consider queries without parse errors and merge identical queries in adjacent positions. For instance, the sequence [0, 1, 1, 1, 2] (in which 0, 1, and 2 are query ids) is reduced to [0, 1, 2].
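The merging step described in note 12 can be sketched in a few lines of Python. The function name `merge_adjacent_duplicates` is hypothetical, and the sketch assumes queries with parse errors have already been filtered out; it only collapses runs of identical adjacent query ids.

```python
from itertools import groupby


def merge_adjacent_duplicates(query_ids):
    """Collapse each run of identical adjacent ids into a single id.

    Non-adjacent repeats are kept, since only neighbouring duplicates
    are merged.
    """
    # groupby yields one (key, run) pair per run of equal adjacent items.
    return [key for key, _ in groupby(query_ids)]


print(merge_adjacent_duplicates([0, 1, 1, 1, 2]))  # [0, 1, 2], as in the note
```

A sequence such as [0, 1, 0] is left unchanged because its repeated id is not in an adjacent position.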


Acknowledgement

This work was supported by the National Science Foundation of China with Grant Nos. 61906037 and U1736204; the National Key Research and Development Program of China with Grant Nos. 2018YFC0830201 and 2017YFB1002801; and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Meng Wang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, X., Wang, M., Zhao, B., Liu, R., Zhang, J., Yang, H. (2020). Characterizing Robotic and Organic Query in SPARQL Search Sessions. In: Wang, X., Zhang, R., Lee, YK., Sun, L., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2020. Lecture Notes in Computer Science, vol 12317. Springer, Cham. https://doi.org/10.1007/978-3-030-60259-8_21

  • DOI: https://doi.org/10.1007/978-3-030-60259-8_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60258-1

  • Online ISBN: 978-3-030-60259-8

  • eBook Packages: Computer Science, Computer Science (R0)
