Systematic Literature Review Search Query Refinement Pipeline: Incremental Enrichment and Adaptation

  • Conference paper
Advanced Information Systems Engineering (CAiSE 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13295)

Abstract

Systematic literature reviews (SLRs) are at the heart of evidence-based research, collecting and integrating empirical evidence regarding specific research questions. A key early step in the search for relevant evidence is composing Boolean search queries, which remain central to how information retrieval systems support advanced literature search. Building these queries requires translating the general aims of the research questions into actionable search terms that are combined into potentially complex Boolean expressions. Researchers thus face the daunting task of building and refining search queries in their quest for sufficient coverage and proper representation of the literature. In this paper, we propose an adaptive Boolean query generation and refinement pipeline for SLR search. Our approach uses a reinforcement learning technique to learn the optimal modifications for a query based on feedback collected from researchers about the query's retrieval performance. Empirical evaluations with 10 SLR datasets show that our approach achieves performance comparable to that of queries manually composed by SLR authors.

Notes

  1. The per-iteration regret is the difference between the mean reward of the best available action and that of the action taken by the algorithm [28].

  2. We define a query clause as a conjunction of a set of terms, such as (\(t_1\) AND \(t_3\)).

  3. In a query cluster such as (\(t_1\) OR \(t_2\)), \(t_1\) is considered a sibling term of \(t_2\).

  4. For full details about the datasets, the experimental setup, and in-depth results, please refer to our supplementary material at https://tinyurl.com/496zuar3; implementation details are available at https://tinyurl.com/2rp4m5cs.

  5. Popular dataset repositories, at https://zenodo.org and http://figshare.com.

  6. We only included the results of three seed types in the table; the full list is available in the Appendix at https://tinyurl.com/496zuar3.
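The per-iteration regret mentioned in Note 1 can be illustrated with a small simulation. The sketch below assumes a Bernoulli bandit solved with Thompson sampling, in the spirit of the tutorial cited as [28]; it is not the authors' implementation. The function name `thompson_regret`, the three example actions, and their success probabilities are hypothetical and chosen only for illustration.

```python
import random


def thompson_regret(true_means, rounds, seed=0):
    """Run Bernoulli Thompson sampling and return the regret of each iteration.

    true_means[a] is the (hypothetical) probability that action a yields
    reward 1. In a real setting these values would be unknown to the learner.
    """
    rng = random.Random(seed)
    n = len(true_means)
    alpha = [1] * n  # Beta prior: 1 + number of observed successes
    beta = [1] * n   # Beta prior: 1 + number of observed failures
    best = max(true_means)
    regrets = []
    for _ in range(rounds):
        # Sample a plausible success rate per action, then act greedily on it.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n)]
        action = max(range(n), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[action] else 0
        alpha[action] += reward
        beta[action] += 1 - reward
        # Per-iteration regret: best mean reward minus mean reward of the choice.
        regrets.append(best - true_means[action])
    return regrets


# Hypothetical query modifications and their assumed success probabilities.
actions = {"add term": 0.2, "remove term": 0.5, "replace with sibling term": 0.8}
regrets = thompson_regret(list(actions.values()), rounds=500)
print(sum(regrets) / len(regrets))  # average per-iteration regret
```

The regret is zero whenever the sampled action is the one with the highest mean reward, so an average that shrinks over time indicates that the learner is converging on the best modification.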

References

  1. Adamo, G., Ghidini, C., Di Francescomarino, C.: What is a process model composed of? A systematic literature review of meta-models in BPM. arXiv preprint arXiv:2011.09177 (2020)

  2. Badami, M., Baez, M., Zamanirad, S., et al.: On how cognitive computing will plan your next systematic review. arXiv preprint arXiv:2012.08178 (2020)

  3. Barišić, A., Goulão, M., Amaral, V.: Domain-specific language domain analysis and evaluation: a systematic literature review. Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia (2015)

  4. Brochu, E., Cora, V.M., et al.: A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599 (2010)

  5. Carpineto, C., Romano, G.: A survey of automatic query expansion in information retrieval. ACM Comput. Surveys (CSUR) 44(1), 1–50 (2012)

  6. van Dinter, R., Tekinerdogan, B., Catal, C.: Automation of systematic literature reviews: a systematic literature review. Inf. Softw. Technol., 106589 (2021)

  7. Frank, M., Hilbrich, M., Lehrig, S., Becker, S.: Parallelization, modeling, and performance prediction in the multi-/many core area: a systematic literature review. In: 2017 IEEE 7th International Symposium on Cloud and Service Computing (SC2), pp. 48–55. IEEE (2017)

  8. Garousi, V., Felderer, M.: Experience-based guidelines for effective and efficient data extraction in systematic reviews in software engineering. In: Proceedings of EASE 2017, pp. 170–179 (2017)

  9. Hall, T., Beecham, S., Bowes, D., Gray, D., Counsell, S.: A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38(6), 1276–1304 (2011)

  10. Jamshidi, P., Ahmad, A., Pahl, C.: Cloud migration research: a systematic review. IEEE Trans. Cloud Comput. 1(2), 142–157 (2013)

  11. Kim, Y., Seo, J., Croft, W.B.: Automatic Boolean query suggestion for professional search. In: Proceedings of SIGIR, pp. 825–834 (2011)

  12. Kitchenham, B., Brereton, O.P., Budgen, D., Turner, M., Bailey, J., Linkman, S.: Systematic literature reviews in software engineering-a systematic literature review. Inf. Softw. Technol. 51(1), 7–15 (2009)

  13. Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in software engineering (2007)

  14. Kohavi, R., Longbotham, R., Sommerfield, D., et al.: Controlled experiments on the web: survey and practical guide. DMKD 18(1), 140–181 (2009)

  15. Kuzi, S., Shtok, A., Kurland, O.: Query expansion using word embeddings. In: Proceedings of CIKM, pp. 1929–1932 (2016)

  16. Lee, G.E., Sun, A.: Seed-driven document ranking for systematic reviews in evidence-based medicine. In: SIGIR, pp. 455–464 (2018)

  17. Li, H., Scells, H., Zuccon, G.: Systematic review automation tools for end-to-end query formulation. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2141–2144 (2020)

  18. Manning, C.D., Surdeanu, M., et al.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of ACL, pp. 55–60 (2014)

  19. Marcos-Pablos, S., García-Peñalvo, F.J.: Decision support tools for SLR search string construction. In: Proceedings of TEEM 2018, pp. 660–667 (2018)

  20. Mergel, G.D., Silveira, M.S., da Silva, T.S.: A method to support search string building in systematic literature reviews through visual text mining. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing, pp. 1594–1601 (2015)

  21. Mikolov, T., Sutskever, I., Chen, K., et al.: Distributed representations of words and phrases and their compositionality. In: NeurIPS, pp. 3111–3119 (2013)

  22. Miller, G.A.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)

  23. Ouzzani, M., Hammady, H., Fedorowicz, Z., Elmagarmid, A.: Rayyan-a web and mobile app for systematic reviews. Syst. Rev. 5(1), 210 (2016)

  24. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP, pp. 1532–1543 (2014)

  25. Qin, C., Eichelberger, H., Schmid, K.: Enactment of adaptation in data stream processing with latency implications-a systematic literature review. Inf. Softw. Technol. 111, 1–21 (2019)

  26. Radjenović, D., Heričko, M., Torkar, R., Živkovič, A.: Software fault prediction metrics: a systematic literature review. Inf. Softw. Technol. 55(8), 1397–1418 (2013)

  27. Robertson, S.: Understanding inverse document frequency: on theoretical arguments for IDF. J. Doc. 60(5), 503–520 (2004)

  28. Russo, D., Van Roy, B., Kazerouni, A., et al.: A tutorial on Thompson sampling. arXiv preprint arXiv:1707.02038 (2017)

  29. Salton, G., Buckley, C.: Improving retrieval performance by relevance feedback. J. Am. Soc. Inf. Sci. 41(4), 288–297 (1990)

  30. Scells, H., Zuccon, G.: Generating better queries for systematic reviews. In: ACM SIGIR, pp. 475–484 (2018)

  31. Scells, H., Zuccon, G., Koopman, B.: Automatic Boolean query refinement for systematic review literature search. In: WWW, pp. 1646–1656 (2019)

  32. Scells, H., Zuccon, G., Koopman, B.: A comparison of automatic Boolean query formulation for systematic reviews. Inf. Retrieval J. 24(1), 3–28 (2021)

  33. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)

  34. Tabebordbar, A., Beheshti, A., Benatallah, B., et al.: Feature-based and adaptive rule adaptation in dynamic environments. DSE 5(3), 207–223 (2020)

  35. Teixeira, E.N., Aleixo, F.A., de Sousa Amâncio, F.D., Oliveira Jr., E., Kulesza, U., Werner, C.: Software process line as an approach to support software process reuse: a systematic literature review. Inf. Softw. Technol. 116, 106175 (2019)

  36. Wahono, R.S.: A systematic literature review of software defect prediction. J. Softw. Eng. 1(1), 1–16 (2015)

  37. Wallace, B.C., Small, K., Brodley, C.E., et al.: Who should label what? Instance allocation in multiple expert active learning. In: SDM, pp. 176–187. SIAM (2011)

  38. Williams, J.J., Kim, J., Rafferty, A., et al.: AXIS: generating explanations at scale with learner sourcing and machine learning. In: L@Scale, pp. 379–388 (2016)

Author information

Corresponding author

Correspondence to Maisie Badami.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Badami, M., Benatallah, B., Baez, M. (2022). Systematic Literature Review Search Query Refinement Pipeline: Incremental Enrichment and Adaptation. In: Franch, X., Poels, G., Gailly, F., Snoeck, M. (eds) Advanced Information Systems Engineering. CAiSE 2022. Lecture Notes in Computer Science, vol 13295. Springer, Cham. https://doi.org/10.1007/978-3-031-07472-1_8

  • DOI: https://doi.org/10.1007/978-3-031-07472-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-07471-4

  • Online ISBN: 978-3-031-07472-1

  • eBook Packages: Computer Science (R0)
