Abstract
Systematic literature reviews (SLRs) are at the heart of evidence-based research, collecting and integrating empirical evidence regarding specific research questions. A key step in the search for relevant evidence is composing Boolean search queries, which remain at the core of how information retrieval systems support advanced literature search. Building these queries requires translating the general aims of the research questions into actionable search terms that are combined into potentially complex Boolean expressions. Researchers thus face the daunting task of building and refining search queries in their quest for sufficient coverage and proper representation of the literature. In this paper, we propose an adaptive Boolean query generation and refinement pipeline for SLR search. Our approach uses a reinforcement learning technique to learn the optimal modifications for a query based on feedback collected from researchers about the query's retrieval performance. Empirical evaluations with 10 SLR datasets showed that our approach achieves performance comparable to that of queries manually composed by SLR authors.
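The feedback-driven refinement loop described above can be illustrated with a minimal sketch. The abstract does not name the reinforcement learning technique, so this example assumes a Beta-Bernoulli Thompson sampling bandit. The action names (`add_term`, `remove_term`, `replace_term`) and the binary `improved` feedback signal are hypothetical placeholders, not the authors' actual design.

```python
import random

# Hypothetical query modifications; the abstract does not enumerate them.
ACTIONS = ["add_term", "remove_term", "replace_term"]


def new_posterior(actions):
    """Start each action with a Beta(1, 1) prior, stored as [alpha, beta]."""
    return {a: [1, 1] for a in actions}


def choose_action(posterior, rng):
    """Sample a plausible success rate per action and pick the largest sample."""
    samples = {a: rng.betavariate(alpha, beta) for a, (alpha, beta) in posterior.items()}
    return max(samples, key=samples.get)


def update(posterior, action, improved):
    """Record researcher feedback: did the modified query retrieve better results?"""
    posterior[action][0 if improved else 1] += 1


rng = random.Random(0)  # fixed seed so the sketch is reproducible
posterior = new_posterior(ACTIONS)
action = choose_action(posterior, rng)
update(posterior, action, improved=True)  # assumed positive feedback for illustration
```

Under these assumptions, modifications that repeatedly receive positive feedback are sampled more often over time, while the others are still tried occasionally.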
Notes
- 1.
The per-iteration regret is the difference between the mean reward of the best available action and the mean reward of the action taken by the algorithm [28].
- 2.
We define a query clause as a conjunction of a set of terms such as (\(t_1\) AND \(t_3\)).
- 3.
In a query cluster such as (\(t_1\) OR \(t_2\)), \(t_1\) is considered a sibling term of \(t_2\).
- 4.
For full details about the datasets, the experimental setup, and in-depth results, please refer to our supplementary material at https://tinyurl.com/496zuar3; implementation details are available at https://tinyurl.com/2rp4m5cs.
- 5.
Popular dataset repositories include https://zenodo.org and http://figshare.com.
- 6.
We included only the results of three seed types in the table; the full list is available in the Appendix at https://tinyurl.com/496zuar3.
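The definitions in notes 1 to 3 can be made concrete with a short sketch. It assumes the standard bandit reading of regret (best mean reward minus the mean reward of the chosen action) and represents clauses and clusters as plain lists of term strings. All function names and the reward values are hypothetical and chosen only for illustration.

```python
def render_clause(terms):
    """Note 2: a query clause is a conjunction (AND) of a set of terms."""
    return "(" + " AND ".join(terms) + ")"


def sibling_terms(cluster, term):
    """Note 3: within an OR-cluster, every other term is a sibling of `term`."""
    return [t for t in cluster if t != term]


def per_iteration_regret(mean_rewards, chosen_action):
    """Note 1: gap between the best mean reward and that of the chosen action."""
    return max(mean_rewards.values()) - mean_rewards[chosen_action]


clause = render_clause(["t1", "t3"])
siblings = sibling_terms(["t1", "t2"], "t1")

# Illustrative mean rewards only; these are not results from the paper.
mean_rewards = {"add_term": 0.5, "remove_term": 0.25, "replace_term": 0.75}
regret = per_iteration_regret(mean_rewards, "add_term")
```

With these illustrative values, choosing the best action (`replace_term`) gives zero regret, and any other choice gives a positive regret.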
References
Adamo, G., Ghidini, C., Di Francescomarino, C.: What is a process model composed of? A systematic literature review of meta-models in BPM. arXiv preprint arXiv:2011.09177 (2020)
Badami, M., Baez, M., Zamanirad, S., et al.: On how cognitive computing will plan your next systematic review. arXiv preprint arXiv:2012.08178 (2020)
Barišić, A., Goulão, M., Amaral, V.: Domain-specific language domain analysis and evaluation: a systematic literature review. Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia (2015)
Brochu, E., Cora, V.M., et al.: A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599 (2010)
Carpineto, C., Romano, G.: A survey of automatic query expansion in information retrieval. ACM Comput. Surveys (CSUR) 44(1), 1–50 (2012)
van Dinter, R., Tekinerdogan, B., Catal, C.: Automation of systematic literature reviews: a systematic literature review. Inf. Softw. Technol., p. 106589 (2021)
Frank, M., Hilbrich, M., Lehrig, S., Becker, S.: Parallelization, modeling, and performance prediction in the multi-/many core area: a systematic literature review. In: 2017 IEEE 7th International Symposium on Cloud and Service Computing (SC2), pp. 48–55. IEEE (2017)
Garousi, V., Felderer, M.: Experience-based guidelines for effective and efficient data extraction in systematic reviews in software engineering. In: Proceedings of EASE 2017, pp. 170–179 (2017)
Hall, T., Beecham, S., Bowes, D., Gray, D., Counsell, S.: A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38(6), 1276–1304 (2011)
Jamshidi, P., Ahmad, A., Pahl, C.: Cloud migration research: a systematic review. IEEE Trans. Cloud Comput. 1(2), 142–157 (2013)
Kim, Y., Seo, J., Croft, W.B.: Automatic Boolean query suggestion for professional search. In: Proceedings of SIGIR, pp. 825–834 (2011)
Kitchenham, B., Brereton, O.P., Budgen, D., Turner, M., Bailey, J., Linkman, S.: Systematic literature reviews in software engineering-a systematic literature review. Inf. Softw. Technol. 51(1), 7–15 (2009)
Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in software engineering (2007)
Kohavi, R., Longbotham, R., Sommerfield, D., et al.: Controlled experiments on the web: survey and practical guide. DMKD 18(1), 140–181 (2009)
Kuzi, S., Shtok, A., Kurland, O.: Query expansion using word embeddings. In: Proceedings of CIKM, pp. 1929–1932 (2016)
Lee, G.E., Sun, A.: Seed-driven document ranking for systematic reviews in evidence-based medicine. In: SIGIR, pp. 455–464 (2018)
Li, H., Scells, H., Zuccon, G.: Systematic review automation tools for end-to-end query formulation. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2141–2144 (2020)
Manning, C.D., Surdeanu, M., et al.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of ACL, pp. 55–60 (2014)
Marcos-Pablos, S., García-Peñalvo, F.J.: Decision support tools for SLR search string construction. In: Proceedings of TEEM 2018, pp. 660–667 (2018)
Mergel, G.D., Silveira, M.S., da Silva, T.S.: A method to support search string building in systematic literature reviews through visual text mining. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing, pp. 1594–1601 (2015)
Mikolov, T., Sutskever, I., Chen, K., et al.: Distributed representations of words and phrases and their compositionality. In: NeurIPS, pp. 3111–3119 (2013)
Miller, G.A.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
Ouzzani, M., Hammady, H., Fedorowicz, Z., Elmagarmid, A.: Rayyan-a web and mobile app for systematic reviews. Syst. Rev. 5(1), 210 (2016)
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP, pp. 1532–1543 (2014)
Qin, C., Eichelberger, H., Schmid, K.: Enactment of adaptation in data stream processing with latency implications-a systematic literature review. Inf. Softw. Technol. 111, 1–21 (2019)
Radjenović, D., Heričko, M., Torkar, R., Živkovič, A.: Software fault prediction metrics: a systematic literature review. Inf. Softw. Technol. 55(8), 1397–1418 (2013)
Robertson, S.: Understanding inverse document frequency: on theoretical arguments for IDF. J. Doc. 60(5), 503–520 (2004)
Russo, D., Van Roy, B., Kazerouni, A., et al.: A tutorial on Thompson sampling. arXiv preprint arXiv:1707.02038 (2017)
Salton, G., Buckley, C.: Improving retrieval performance by relevance feedback. J. Am. Soc. Inf. Sci. 41(4), 288–297 (1990)
Scells, H., Zuccon, G.: Generating better queries for systematic reviews. In: ACM SIGIR, pp. 475–484 (2018)
Scells, H., Zuccon, G., Koopman, B.: Automatic Boolean query refinement for systematic review literature search. In: WWW, pp. 1646–1656 (2019)
Scells, H., Zuccon, G., Koopman, B.: A comparison of automatic Boolean query formulation for systematic reviews. Inf. Retrieval J. 24(1), 3–28 (2021)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
Tabebordbar, A., Beheshti, A., Benatallah, B., et al.: Feature-based and adaptive rule adaptation in dynamic environments. DSE 5(3), 207–223 (2020)
Teixeira, E.N., Aleixo, F.A., de Sousa Amâncio, F.D., Oliveira Jr., E., Kulesza, U., Werner, C.: Software process line as an approach to support software process reuse: a systematic literature review. Inf. Softw. Technol. 116, 106175 (2019)
Wahono, R.S.: A systematic literature review of software defect prediction. J. Softw. Eng. 1(1), 1–16 (2015)
Wallace, B.C., Small, K., Brodley, C.E., et al.: Who should label what? Instance allocation in multiple expert active learning. In: SDM, pp. 176–187. SIAM (2011)
Williams, J.J., Kim, J., Rafferty, A., et al.: AXIS: generating explanations at scale with learner sourcing and machine learning. In: L@Scale, pp. 379–388 (2016)
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Badami, M., Benatallah, B., Baez, M. (2022). Systematic Literature Review Search Query Refinement Pipeline: Incremental Enrichment and Adaptation. In: Franch, X., Poels, G., Gailly, F., Snoeck, M. (eds) Advanced Information Systems Engineering. CAiSE 2022. Lecture Notes in Computer Science, vol 13295. Springer, Cham. https://doi.org/10.1007/978-3-031-07472-1_8
DOI: https://doi.org/10.1007/978-3-031-07472-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07471-4
Online ISBN: 978-3-031-07472-1
eBook Packages: Computer Science, Computer Science (R0)