1 Introduction

In the past decade, the need for automation and intelligence has driven major advances in machine learning (ML) and artificial intelligence (AI) [1]. Despite substantial growth in AI-based systems, we continue to see significant challenges and project failures [2]. Weiner [3] reported that, according to recent data, 87% of AI projects never make it into production. One primary reason is that AI has disrupted traditional software development practices, which are typically deductive: requirements are explicitly defined and translated into code. In contrast, AI-based systems, i.e., systems that incorporate AI components [4], are developed inductively, as they learn and adapt from training data. This shift in approach makes it challenging to anticipate and understand the behavior of AI-based systems. Due to the critical role of AI-based systems, software engineering (SE) and the AI community must collaborate to develop new strategies to address these issues.

Requirements engineering (RE), the systematic handling of software requirements [5], has been impacted by the complexities of AI-based systems [6]. Existing RE approaches are challenging to apply to AI-based systems because of their probabilistic nature and the need for constant adaptation. To address these challenges, RE needs to evolve to be compatible with AI-based systems [7]. The roles and responsibilities related to RE have changed, with data scientists now being responsible for specifying high-level requirements in ML systems, which can lead to systems that prioritize data quality over stakeholder requirements [8]. Hence, software engineers and data scientists must work together to address issues arising from the combination of AI and SE [9]. Kondermann [10] claimed that despite its long history, RE has yet to be used extensively for AI, especially computer vision, and called for more research to integrate data selection approaches. Further, he emphasized that AI system requirements are complex to elicit, specify, and manage.

Recently, the number of studies addressing Requirements Engineering for Artificial Intelligence (RE4AI) has grown substantially. With this proliferation, there is a need to understand what has been achieved so far; such an overview helps practitioners identify suitable methods for their day-to-day work and helps researchers target open challenges. Therefore, we conducted a systematic mapping study to explore the potential of RE for contributing to AI-based systems. We chose a systematic mapping study (SMS) over a systematic literature review because our objective was to capture the landscape of research in our area of study, not just to answer specific questions but to understand the diversity and scope of research that has been conducted. An SMS allows us to achieve this by categorizing research works along various dimensions such as methodology, themes, outcomes, and geographical focus. This study aims to explore the current RE4AI research landscape regarding RE practices for AI-based systems, topics covered with respect to SWEBOK [11] areas, the maturity of the research area, challenges, and future directions.

The main contributions of this SMS are summarised as follows:

  • We provide a research overview regarding RE4AI, i.e., which topics have been explored according to SWEBOK [11] and the type of research conducted according to the classification of Wieringa et al. [12].

  • We identify which existing RE practices have been applied to AI-based systems and also present new RE practices proposed specifically for AI-based systems.

  • We identify the current trends and challenges in applying the existing RE approaches for AI-based systems.

  • Finally, we extract potential future research directions from the selected primary studies.

Article organization: The remainder of the article is organized as follows: Sect. 2 compares existing studies similar to ours. Section 3 describes the research design we followed in this study. Then, Sect. 4 presents the extractions, synthesis, and results from the selected primary studies. Section 5 identifies challenges and future directions, whereas Sect. 6 discusses our results by providing a combined analysis of RQ4 and RQ5, and Sect. 7 presents threats to validity and how we mitigated them. Finally, Sect. 8 concludes this paper.

2 Related work

With the recent growth of AI-based systems, the number of empirical studies on RE for AI is increasing, with an increasing variety of aspects being investigated. Several secondary studies have highlighted research in the field of SE for AI-based systems [4, 13,14,15,16]. However, these studies are not exclusively focused on the RE process but consider it only as a part of SE. In relation to RE, we identified three secondary studies on RE for AI/ML [17,18,19] that are closely related to our work. Ahmad et al. [17] conducted a systematic literature review and identified 27 primary studies. They identified the notations and modeling languages utilized in the development of AI systems, focusing specifically on the application domains of these AI systems. Villamizar et al. [18] conducted a mapping study incorporating findings from 35 distinct studies. They identified the contribution of RE aspects to the development of ML-based systems, focusing on RE topics. Further, they emphasized the quality characteristics that are considered during RE for ML-based systems. Yoshioka et al. [19] also conducted a systematic literature review. Their focus was mainly on current techniques and practices of RE for machine learning systems (MLS). They analyzed 32 studies and mapped a research landscape of RE techniques and practices while identifying research gaps.

Our research significantly updates and expands upon previous reviews by analyzing 126 primary studies, broadening the scope to capture developments up to and including July 2023. We contribute by organizing current RE practices within SWEBOK Knowledge Areas (KAs) and spotlighting newly introduced practices.

Our work synthesizes challenges and future research directions from these studies, providing an in-depth overview of the evolving trends in the field. Distinguishing our systematic mapping study, we align our categorization with SWEBOK standards and employ Wieringa’s [12] classification and Wohlin’s [20] evaluation strategy to assess research maturity. This methodology sets our study apart from others, such as the systematic literature reviews by Ahmad et al. [17] and Yoshioka et al. [19], which cover studies up to August 2021, and the study by Villamizar et al. [18], which concludes in December 2020.

Our inclusion of literature up to and including July 2023 and our broader examination of AI-based systems, encompassing the entirety of AI, allow for a comprehensive understanding of RE challenges and practices relevant to AI. By including an additional two years and around 100 more papers, our study reflects the significant shift in focus towards safe and responsible AI, as well as the evolving legal requirements of AI, particularly as AI impacts many more aspects of life following the rise of large language models (LLMs).

We critically assess and identify emerging RE practices tailored for AI, addressing the distinctive needs of AI systems compared to traditional approaches. This thorough analysis leads to a detailed compilation of challenges and directions for future research, significantly advancing the discussion on RE for AI.

We compare the related work to our study in Table 1.

Table 1 Comparison of related studies with our Systematic Mapping Study

3 Research design

We conducted this systematic mapping study by following the guidelines of Petersen et al. [21]. In the following, we outline our research questions (RQs) and their rationales, how we obtained articles, and how we addressed them systematically.

3.1 Research questions and rationales

To define the scope of our study, we formulated RQs that highlight the categorization of the literature in a way that can be interesting for scholars and practitioners, while also providing insights into how RE approaches have been used for AI-based system development.

RQ1: Where and when have RE practices for AI-based systems been published?

We aim to explore the recent trends in this research area in terms of the publication year of articles, and how active this area has been. For this purpose, we analyze the study distribution by year of publication and the most preferred venues for RE4AI.

RQ2: How are the different RE topics distributed within the literature on AI-based systems?

We intend to use the SWEBOK [11] classification scheme to assign RE categories to our primary studies. In the most recently published version, SWEBOK V3.0, the Software Requirements knowledge area comprises eight topics. Using this knowledge area and its topics, we analyze frequently studied RE topics, as well as topics that may require further attention.

RQ3: What is the maturity of the research in this area?

To evaluate the maturity of this area, we analyze the primary studies according to the criteria given below:

  • We use Wieringa et al.’s [12] classification scheme for RE publications.

  • We scrutinize whether the RE techniques discussed in the paper have been empirically evaluated [20].

  • If yes, we analyze whether the evaluation took place in an industrial environment.

RQ4: What RE practices have been proposed for or applied to AI-based systems?

This question aims to identify new RE practices that have been proposed in the context of AI-based systems and how existing RE practices have been used for AI-based systems. We intend to provide a holistic overview of the degree to which RE has contributed to AI. We answer the following sub-questions:

  • RQ4.1: Which conventional RE practices have been applied to AI-based systems?

  • RQ4.2: Which new RE practices have been proposed for AI-based systems?

    We classify practices into tools, techniques, processes, and models. To make a clear distinction, we define these terms below.

    Technique: A technique is a specific method applied during one part of a process, focusing on actions that are observable and measurable by practitioners [22, 23]. For example, a common technique in requirements gathering is the interview, where the interviewer follows a structured approach to elicit information from stakeholders. Another example is prototyping, a technique used to quickly create a working model of a system’s features to gather feedback.

    Process: A process is a set of activities to reach a certain goal; it specifies the concrete activity details and the sequence in which they are performed, whereas a technique is a specific way to perform one of those steps.

    Model: A model is an abstraction of a system that describes it from a certain perspective; a UML (Unified Modeling Language) use case diagram is a typical example. In the context of RE for AI, models can also include frameworks or theoretical constructs.

    Tool: Any software that has been used to support the RE process.

RQ5: What makes requirements engineering for AI-based systems challenging, and what are future directions to address these challenges?

To answer this question, we analyze AI-based systems and their key characteristics in the targeted studies. This analysis helps us to identify differences between AI-based systems and conventional systems, as well as RE challenges for AI-based systems. Furthermore, we analyze studies that propose future research directions. To address this RQ, we split it into two sub-research questions:

  • RQ5.1: What are the current challenges in the development of AI-based systems?

  • RQ5.2: What are future research directions?

Fig. 1 Search Process

3.2 Research protocol

To execute an impartial, objective investigation, a research protocol is required: it manages the flow of the research and maximizes the value of the study’s findings. We created a research protocol that describes each part of the study; it is depicted in Fig. 1, and its main steps are as follows.

  1. Search query formulation: We formulated our search query using the first two elements of the PICO criteria [21]. The first element is the Population ("P"), which indicates RE publications. The second element is the Intervention ("I"), which specifies AI, where ML and deep learning (DL) are considered part of AI. We excluded the Comparison ("C") and Outcome ("O") criteria to broaden the scope of the search, allowing us to capture a wider range of studies, which is especially useful for exploratory research or for obtaining a comprehensive field overview, as in an SMS. We constructed our query iteratively and restricted our search to article titles to achieve the best results.

    Initially, we evaluated four digital libraries for our search: ACM Digital Library, IEEE Xplore, ScienceDirect, and Springer. Our findings indicated that IEEE Xplore and ACM Digital Library were most effective in handling our search query. Concerns may arise that this selection overlooks relevant studies, particularly from Springer. To address this, we incorporated a snowballing technique, systematically reviewing the references of our initial findings to ensure no significant work was overlooked. Alongside these digital libraries, we expanded our search to the meta-search engines Google Scholar and Web of Science, again restricting the search to article titles. This combination of direct searches and snowballing was designed to ensure thorough coverage of the literature. The finalized search string is given below:

    ("requirement" OR "requirements")

    AND

    ("AI" OR "artificial intelligence" OR "ML" OR "machine learning" OR "DL" OR "deep learning")

    This query resulted in 123 articles from the ACM Digital Library and 260 from IEEE Xplore, while the meta-search engines returned 452 articles (Web of Science) and 955 articles (Google Scholar). In total, we obtained 1790 papers.

  2. Removal of duplicates: In this step, we removed duplicated articles, as we ran our query on two digital libraries and two meta-search engines. After the removal of duplicates, we were left with 795 papers.

  3. Inclusion/exclusion criteria: Our query yielded literature containing all keywords in the title. To filter the literature according to the scope of our study, we developed inclusion criteria (IC) and exclusion criteria (EC):

    1. EC1: Not relevant to the scope of the study (i.e., studies that do not focus on RE for AI)

    2. EC2: Published before 2010 or after July 2023

    3. EC3: Not written in English

    4. EC4: Secondary studies

    5. EC5: Not peer-reviewed / not a scientific paper

    6. EC6: Not accessible

    7. IC1: The primary focus of the paper is requirements engineering

    8. IC2: The paper targets AI-based systems

    During our study, we used Rayyan [24] to remove duplicates and to apply the inclusion/exclusion criteria to the resulting 795 unique papers. The criteria above were used to refine the set of articles to fit our study scope. EC3 and EC6 exclude studies that are not written in English or not accessible through any source, while EC2 restricts the publication years considered in the mapping study. To exclude secondary studies and grey literature, we applied EC4 and EC5. IC1, IC2, and EC1 required an in-depth reading of the articles to analyze whether they fit the scope of the study. The first three authors independently assessed the 795 studies using these criteria. Discrepancies in their selections were resolved through discussion and consensus voting. This rigorous process resulted in the selection of 93 articles.

  4. Data extraction: We extracted data from each primary study to answer the research questions described in Sect. 3.1 above. We defined extraction sheets to record the necessary information related to each publication; a predefined extraction sheet reduces the opportunity for researcher bias, since the researchers extract only data that answers the research questions. Before we started extracting data, a pilot extraction was conducted to develop a shared understanding and avoid confusion regarding the extraction process. This pilot ensured that each researcher clearly understood the research questions and the respective extraction sheets. For this purpose, we selected three initial studies, and each researcher independently extracted data into their sheet. Afterward, we discussed the extracted data and further improved the extraction sheets.

    We outlined the individual data cells according to each research question. Since each RQ has multiple fields, we maintained a separate spreadsheet for each RQ. RQ1 primarily concerns metadata, including the year of publication, the publishing venue, and the involved research community (note: we classified papers based on author affiliations as industry, academic, or collaborative). To identify which RE topics have been covered frequently within the literature (RQ2), we classified the primary studies using the SWEBOK [11] subcategories for RE. Moreover, to judge the maturity of this research area (RQ3), we classified the literature according to the RE publication types proposed by Wieringa et al. [12] as well as the empirical evaluation method [20]. We characterized RE practices along four dimensions: tools, techniques, models, and processes; RQ4 is designed to capture these details and to differentiate between new and existing practices. Finally, we extracted the challenges highlighted by different authors and synthesized the literature to outline possible research directions (RQ5).

    Eventually, we split up the extractions and assigned two researchers to each paper; a weekly synchronization meeting was held to discuss the extractions, as shown in Fig. 2. A separate consensus spreadsheet was maintained where all finalized entries were recorded. Further, to analyze inter-rater reliability, the agreement level was measured using Cohen’s kappa coefficient. For the inclusion/exclusion criteria, the researchers used the Rayyan [24] tool and independently performed the inclusion/exclusion of papers; no conflict was found during this process. For the thematic analysis, the researchers evaluated 5 papers with 4 questions each, making a total of 20 evaluations.

    • Total items evaluated (N): 5 papers x 4 questions = 20

    • Agreement on all 4 questions for 3 papers: 3 x 4 = 12 agreements

    • Agreement on 3 questions for 1 paper: 3 agreements

    • Agreement on 2 questions for 1 paper: 2 agreements

    • Total agreements (A): 12 + 3 + 2 = 17

    • Total disagreements (D): 20 - 17 = 3

    Cohen’s kappa \((\kappa )\) is calculated as follows: \(\kappa = \frac{P_o - P_e}{1 - P_e}\), where \(P_o\) is the observed agreement and \(P_e\) is the expected agreement by chance.

    1. Observed agreement \(P_o\): number of agreements / total items evaluated = \(17 / 20 = 0.85\)

    2. Expected agreement \(P_e\): assuming equal probability for agreement and disagreement (0.5 each), \(P_e = (0.5\times 0.5)+(0.5\times 0.5)=0.25+0.25=0.50\)

    3. Cohen’s kappa: \(\kappa = \frac{0.85 - 0.50}{1 - 0.50} = \frac{0.35}{0.50} = 0.70\)

    This kappa value indicates a substantial level of agreement between the researchers, supporting the reliability of the conclusions derived from their analyses (a short script reproducing this computation is sketched after this list).

  5. Snowballing: Following the first iteration of extractions, we applied forward and backward snowballing according to the guidelines by Wohlin [25]. Snowballing, which uses the references of identified papers to find additional relevant studies, is especially effective in fields that lack consistent terminology; it helped us capture important studies that might have been missed due to inconsistent keyword use in database searches. To ensure overall coverage, snowballing iterations were performed until no further studies were included. The first round on the start set of 93 articles yielded 15 additional papers. After extracting these 15 papers, a second round of snowballing resulted in 15 more articles, and a third iteration yielded 3 more. Snowballing on these 3 articles did not result in additional papers. After this process, we ended with a final set of 126 primary studies.

  6. Data synthesis: We began data analysis and synthesis once the extractions had been completed. To categorize the retrieved data, we used both quantitative and qualitative analysis; some extraction discrepancies and errors were detected throughout this process and resolved. The first author performed the synthesis and frequently presented the results to the other authors, leading to iterative refinements. To address RQ1, we conducted a frequency analysis of the bibliographical data. For RQ2 and RQ3, we undertook quantitative analyses: the analysis for RQ2 categorized the literature based on SWEBOK KAs, whereas RQ3 classified the literature according to Wieringa’s [12] framework; additionally, we identified and analyzed the evaluation methods used for RE practices. To answer RQ4, we employed a combination of methods, including qualitative analysis through thematic analysis, as recommended by Cruzes et al. [26]. As RQ5 is divided into two sub-questions, we conducted a qualitative analysis for both, again applying the thematic synthesis approach recommended by Cruzes et al. [26]: we extracted free text from the papers, labeled it, identified the most recurrent themes, and assigned the themes to the extracted text. In the following section, we present our data extraction results and their mapping to the respective research questions.
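As a concrete illustration of the inter-rater agreement computation in the data extraction step above, the following minimal Python sketch reproduces the reported numbers. It is our own illustration rather than part of the study’s tooling: the function name cohens_kappa is ours, and it hard-codes the equal-chance assumption \(P_e = 0.5\) used above instead of estimating \(P_e\) from the raters’ marginal label frequencies, as a full kappa computation would.

```python
def cohens_kappa(agreements: int, total: int, p_e: float = 0.5) -> float:
    """Cohen's kappa for two raters from raw agreement counts.

    p_e defaults to 0.5, matching the equal-chance assumption in the text
    (each rater is assumed to agree or disagree with probability 0.5).
    """
    p_o = agreements / total            # observed agreement P_o
    return (p_o - p_e) / (1.0 - p_e)    # kappa = (P_o - P_e) / (1 - P_e)

# Thematic-analysis pilot: 5 papers x 4 questions = 20 items, 17 rated identically.
print(f"kappa = {cohens_kappa(17, 20):.2f}")  # -> kappa = 0.70
```

With the equal-chance simplification, the result matches the \(\kappa = 0.70\) reported above; estimating \(P_e\) from the observed marginals would generally yield a different value.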

Fig. 2 Data extraction and synthesis process

4 Results

After data extraction, we move towards the data analysis phase. In this section, we summarise the results of our mapping study, presenting the results for each RQ in turn, starting with RQ1.

4.1 Bibliometrics (RQ1)

This RQ covers publication trends over the years. Although our search started from the year 2010, the earliest relevant primary study we found was published in 2017, indicating that the field emerged in that year. As expected, there is a growing trend of studies in the field of RE for AI after that, as shown in Fig. 3.

Fig. 3 Yearly Publications Distribution

Initially, the exploration of this area was predominantly undertaken by the academic community. However, with the recent advancements in AI, there has been a noticeable increase in industrial engagement up to 2021. This trend is evidenced by the rise in the number of publications from the industry during that period. Although there was a slight decrease in industrial publications in 2022 and 2023, the overall percentage of industrial papers has remained relatively steady since 2019. This indicates a sustained interest and collaboration between academia and industry in RE4AI, as depicted in Fig. 4. Notably, IBM USA has emerged as a significant contributor with the highest publication count, while Fujitsu Laboratories Ltd., located in Kawasaki, Japan, has also made notable contributions to this field.

Fig. 4 Research Community

The academic sector has consistently indicated substantial interest and research efforts in this field, with a gradual increase in publications over the years. In 2017 and 2018, the academic community published two and three papers, respectively. This momentum continued into 2019 with eight academic articles, three collaborative efforts, and two industry contributions.

In 2020, the output increased to 12 academic publications, six collaborative articles, and two industry articles. This growth continued in 2021, with 17 academic articles, nine collaborative projects, and four industry publications. The upward trend continued in 2022, reaching 27 academic articles, six collaborative efforts, and two industry contributions.

By the end of July 2023, early data indicates 19 academic papers, three collaborations, and one industry publication. This sustained increase in academic involvement highlights the ongoing growth and interest in this research domain.

The increasing number of publications, especially from academia, shows growing interest and involvement in this research area. The steady yearly growth and numerous collaborations indicate an active and expanding research community. This ongoing momentum is evident even in the partial data for 2023, highlighting the field’s importance and the key role of the academic community in driving innovations forward.

Considering the publishing venue, 70 out of 126 (\(55.6\%\)) papers were published at various conferences, whereas 28 (\(22.2\%\)) appeared at workshops and 28 (\(22.2\%\)) in journals. Around 2021, workshops became more popular as an RE4AI venue, but conferences still account for a higher proportion. We can also observe that 23 out of the 28 journal papers were contributed by the academic community; in contrast, the industry’s preferred venues are conferences. The most recurring conference in this domain is the International Requirements Engineering Conference (RE) with seven papers, followed by the International Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ) and the Conference on Human Factors in Computing Systems (CHI), with five papers each. In the workshop category, the most preferred venue is the International Requirements Engineering Conference Workshops (REW), followed by the Joint Proceedings of REFSQ-Workshops, with nine and four publications, respectively. Another main workshop is the Workshop on AI Engineering – Software Engineering for AI (WAIN), with two publications. Moreover, only 28 journal publications were found, of which three were published in the Requirements Engineering Journal and three in IEEE Computer.
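The venue shares above follow directly from the venue counts. As a quick sanity check (our own illustration, not part of the study’s analysis scripts), the percentages can be recomputed as follows:

```python
venue_counts = {"conferences": 70, "workshops": 28, "journals": 28}  # venue counts
total = sum(venue_counts.values())  # 126 primary studies
for venue, count in venue_counts.items():
    print(f"{venue}: {count}/{total} = {100 * count / total:.1f}%")
# conferences: 70/126 = 55.6%, workshops: 28/126 = 22.2%, journals: 28/126 = 22.2%
```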

In conclusion, we can see that the field of RE4AI has grown over the years, with the majority of papers published at conferences, while workshops and journals account for equal shares. Furthermore, the industry has shown significant interest and involvement in research within this field. Although less than half of the papers each year have industrial involvement, we believe that industrial interest in AI remains consistent, primarily due to the significant adoption of Large Language Models (LLMs) within the industry. This trend is not only catalyzing a shift from conventional software paradigms towards AI-based applications but is also likely to amplify the demand for specialized RE approaches tailored to AI within the industrial sector. Consequently, traditional RE methodologies must be adapted to meet the unique demands and complexities of AI-based systems.

4.2 Distribution of RE topics for AI-based systems (RQ2)

Systematic Mapping Studies are typically employed to present a classification scheme for research topics within a specific field of interest. Analyzing the distribution of publications across these topics can provide insights into the breadth and depth of research, indicating the field’s scope and its level of maturity.

Fig. 5 Breakdown of Topics for the Software Requirements KA [11]

To answer this research question, we used the well-established classification scheme of SWEBOK [11] for RE topics, as shown in Fig. 5. It allows us to observe where RE4AI research has been focused and which topics may still require attention. One paper can be classified under more than one topic, depending on which RE topics it addresses. In addition, we analyzed topics and their sub-categories, i.e., which sub-topics have been addressed and which remain unattended; this can help researchers identify further gaps in the current research landscape. Figure 6 visualizes our general findings for this RQ.

Fig. 6 Number of papers in each category

Within our analysis, we identified that 104 studies concentrated on requirements analysis, marking it as the predominant category in our classification. This category is notable for the introduction of 33 RE practices, detailed in Sect. 4.4.2. The research within this realm has primarily explored the integration of conceptual modeling [27], the classification of novel requirement types [28], and the assimilation of human-centric requirements into ML systems [29, 30].

Moreover, our review reveals that 87 studies were dedicated to requirements elicitation, representing the most substantial segment where established practices have been applied, as discussed in Sect. 4.4.1. This highlights the field’s ongoing efforts to refine and utilize traditional RE methodologies within the context of evolving technological frameworks. Furthermore, interviews [31,32,33], questionnaires [34], and scenarios [35,36,37] have been used as practices in this area.

Further, we found that 77 papers discussed topics related to requirements specification, with 30 of these studies introducing new practices for specifying requirements. It is observed that the recent literature leans towards proposing practices specifically tailored to meet ML-related requirements, emphasizing stakeholder needs [38, 39]. Notably, only three studies adhered to existing practices for requirements specification.

Shifting the focus to the requirements validation knowledge area, we identified 53 studies that delve into requirements validation, of which eight introduced novel practices, aiming to enhance validation processes in line with AI-based systems. Figure 6 shows 22 studies focused on practical considerations, of which 5 proposed new practices; these studies highlighted requirements attributes regarding explanatory capabilities, ethical guidelines, and quality characteristics.

16 studies covered RE processes and focused on RE processes tailored to ML-based systems. These processes aim to incorporate ML-specific needs and additional types of requirements. A few studies highlighted the different perspectives in the business context during the RE process. We can observe that 13 studies proposed new practices for the RE process, while 9 applied existing RE processes to AI-based systems, primarily focusing on Goal-Oriented Requirements Engineering (GORE) [40, 41].

The focus on requirements tools has been relatively limited, with only 4 studies in this domain proposing innovative tools to support the RE process for AI-based systems (Sect. 4.4.2). These tools are particularly aimed at streamlining tasks related to the elicitation and specification of requirements. In the last category, i.e., software requirements fundamentals, only 4 studies specifically addressed the topic of requirement definitions.

The trends we observed from these statistics suggest a field in transition, grappling with the unique challenges posed by AI and machine learning systems. The use of existing practices in elicitation and modeling points to a reliance on traditional RE strengths. However, the need for new practices, especially in analysis, specification, and the requirements process, suggests that AI-based systems introduce complexities and challenges that transcend the capabilities of traditional RE methods. These new practices likely address AI-specific concerns such as ethical considerations, data quality and sourcing, model transparency, and the dynamic nature of learning algorithms.

4.3 RE4AI research maturity (RQ3)

To address this question, we classified the papers according to the taxonomy by Wieringa et al. [12].

We aim to highlight the research methods used so far in RE4AI research and how the proposed practices have been evaluated.

Fig. 7 Distribution of papers according to Wieringa’s classification

Figure 7 shows that 74 studies fall into the proposal of solution category, i.e., papers proposing a solution and establishing its relevance; the proposed solution must either be novel or adapt an existing solution to a new domain. A new conceptual framework has been proposed in 21 studies, which we classify as philosophical papers: while some of them do not provide a direct solution, all of them offer a new way of understanding and categorizing requirements.

Papers that investigate the properties of a proposed solution that has not yet been implemented in RE practice are classified as validation research; 25 papers fall into this category, validating a solution proposed in the same paper or elsewhere. Papers that apply RE techniques in practice or investigate the usage of RE practices are classified as evaluation research; here, the novelty of the practice is not essential, but the knowledge claim of the paper should be novel. 18 of our primary studies describe the authors’ position, primarily to provoke discussions about RE4AI topics; these are categorized as opinion papers. Lastly, 11 studies reported personal experiences and were labeled as personal experience papers. A paper could be classified under more than one of these categories.

Fig. 8 Frequently Combined Research Methods

Our analysis (see Fig. 8) reveals a notable trend in research practices. Specifically, we observe that proposal of solution papers predominantly incorporate validation research, with 20 studies validating solutions and 22 engaging in evaluation research. Additionally, 5 papers combine proposals of solutions with opinion and philosophical discourse, while 3 include personal experiences. In validation research, a common pattern emerges where 20 studies both propose and validate solutions, highlighting a preference for self-validation. Philosophical and opinion papers often intertwine, sharing a focus on conceptual frameworks. Evaluation methods vary, with case studies (43) predominant, followed by surveys (15) and minimal use of mixed methods (1). This reflects a broader inclination towards practical validation in proposal of solution papers, while opinion and experience papers typically lack such evaluation.

Further, to assess the maturity of the research, we investigated which type of research is conducted in each SWEBOK KA. Figure 9 shows how the RE SWEBOK [11] topics are addressed according to Wieringa’s classification [12]. It should be noted that one paper can belong to more than one publication type, so the totals add up to more than 126. This analysis highlights that significant focus has been placed on requirements analysis, elicitation, specification, and validation. However, foundational aspects of requirements, such as their fundamental principles, processes, and practical considerations, have been overlooked. Additionally, there is a notable shortage of tools to support the development of AI-based systems. The data reveals that while the initial activities of RE receive considerable attention, there is still a deficiency in managing the overall RE process for AI-based systems effectively.

Fig. 9 Maturity of the research area by analyzing the prevalent research methods utilized in each KA

4.4 RE practices for AI-based systems (RQ4)

This question aims to highlight the use of existing RE practices and the direction in which new RE practices specific to AI have emerged. We extracted the practices according to the classification scheme provided in Sect. 3.1.

The bar chart in Fig. 10 shows the literature proposing new processes, techniques, models, and tools alongside the use of existing ones. Although more research focuses on techniques, novelty is concentrated in model-, process-, and tool-related research, while existing techniques are more frequently reused. One major takeaway is that existing RE techniques can be adapted for AI-based systems; however, standard RE processes and tools have not been adapted for the development of AI-based systems. In the subsequent sections, we elaborate on how existing practices have been used and what new practices have been proposed.

Fig. 10 Existing vs new practices

4.4.1 Usage of conventional RE practices for AI

We analyzed the suitability of current RE practices for AI by determining which RE practices have been used for such systems. The resulting mapping is shown in Table 2. We group the practices according to the RE topics in SWEBOK [11]. Since we found requirements modeling, which is not part of SWEBOK, to be an important topic among our papers, we included it as a distinct group. Existing RE practices were found in the requirements elicitation, requirements process, requirements validation, requirements analysis, and requirements specification KAs. Each paper may apply multiple RE practices and may thus fall into multiple software requirements KAs.

Table 2 Existing Practices Used

Requirements elicitation Out of the 126 papers examined, 40 employed various practices for eliciting requirements. Among these practices, interviews emerged as the predominant method, with 16 of the 40 papers utilizing them for gathering requirements. Notably, a significant number of researchers favored semi-structured interviews as a tool to initiate conversations around generative AI [33] and to foster a collaborative design process [29, 42]. Similarly, researchers [31, 32, 36, 42] also conducted semi-structured interviews for requirements elicitation. Additionally, several studies [47,48,49, 52] concentrated on requirements elicitation that prioritizes stakeholders’ perspectives and needs.

Other methods identified for requirements elicitation across the reviewed literature encompass surveys, scenario-based elicitation techniques [36, 37, 53,54,55,56], questionnaires [34, 45], think-aloud protocols [57], focus groups [32, 49, 59, 60], and controlled experiments, showcasing a diverse range of strategies for gathering and understanding project requirements.

Requirements modelling Twenty-six of the 126 papers mentioned an existing practice for requirements modeling. The most frequently used practice for this topic was conceptual modeling, addressed by 8 different articles. In [66], conceptual modeling is used for requirements elicitation, design, and development of ML solutions. Similarly, [27] incorporated conceptual modeling into a data science project and applied it to a healthcare application. The authors of [65] use conceptual modeling to illustrate the business view, analytics design view, and data preparation view; these perspectives are used to relate the corporate strategy to analytics algorithms and data preparation operations. The authors of [64] argued that conceptual modeling can support the application of ML within an organization while improving usability and optimizing the performance of ML algorithms. Additionally, in [30] the authors demonstrated that conceptual modeling can be used to map human mental models when modeling AI-based systems. Other frequently used modeling techniques are scenario-based design and the Unified Modeling Language (UML). Husen et al. [77] use UML for analyzing ML safety requirements top-down from higher-level business requirements, whereas [40] provides a comparison using UML diagrams and aims to propose effective design practices for planning problems, with a focus on the early requirements phase.

Furthermore, within the scope of the requirements process, Goal-Oriented Requirements Engineering (GORE) and Softgoal Interdependency Graphs (SIGs) have been employed in 8 and 1 studies, respectively, as detailed in Table 2. It is also worth noting that there has been less focus on utilizing existing practices for requirements validation, specification, and analysis, with only 3 instances identified in each of these areas. It can also be observed that researchers paid attention to using existing ISO standards, including ISO 26262 [92], ISO 25012 [90], and ISO 25000 [91].

4.4.2 New practices employed in RE4AI

For new practices, Table 3 provides a brief description of each practice and the type of practice, i.e., model, process, technique, or tool. We can observe that numerous studies have proposed new models, while a significant number also introduced new processes. Further, we will elaborate on how each KA has been addressed by researchers.

Table 3 New practices proposed

Requirements analysis This category deals with the comprehensive process of analyzing and managing stakeholders’ needs and constraints to ensure a clear understanding of, and agreement on, what the system or project must achieve. It includes activities such as conflict detection, prioritization, and scope definition. The main themes found in requirements analysis were explainability and human-centric requirements analysis. 33 papers proposed different practices in this area, with 5 focused on requirements analysis for explainability needs. Sheh and Monteath [28] categorized explainability requirements while considering the source, depth, and scope of the explanation, while [76] introduces an explainability framework that automatically recommends methods to improve a system design’s explainability and efficiency for developers, thereby reducing the conflict between explainability and usability. Köhl et al. [88] provided a conceptual analysis that unifies the different concepts of explainability and the corresponding explainability demands. Suresh et al. [38] provided a framework for identifying stakeholders for interpretability, and [96] used the human cognitive process to derive requirements for explainability. Hall et al. [95] outlined a systematic method to build an explainable artificial intelligence (XAI) system, which focuses on understanding specific explanation requirements and assessing existing explanation capabilities. Four of the 33 papers focused on human-centric requirements analysis, including human-centric design requirements [29] and RE frameworks that map the human mental model to ML models [97], while [100, 105] focused on ethical and legal requirements analysis. Other papers proposed requirements analysis for risk modeling and frameworks for requirements analysis.

Requirements process This part of the RE KA illustrates how the requirements process aligns with the overall SE process. In this category, we find a paper proposing a process model of evidence-driven RE to capture requirements specific to ML-based systems, i.e., uncertainty [83]. Ries et al. [80] tailored traditional RE to improve dataset requirements engineering, while Vogelsang and Borg [107] highlighted the need to integrate ML specifics into the RE process, along with new types of quality requirements such as explainability, freedom from discrimination, or specific legal requirements. Further, [112] outlines challenges in Explainable AI (XAI) and proposes a framework for using RE practices to address these challenges. Similarly, [69] proposed an artifact-based approach for the development of data-centric systems, while [113] introduced a data-driven engineering process featuring hierarchical RE. [108] provided an overview of how research in the RE discipline can support building effective ML systems. Other authors proposed a modeling framework for analytics algorithms and data preparation activities [56, 65] or an agile data mining framework [109] in the context of business objectives. Another notable work [110] proposed a methodology for developing and assessing legal, privacy, social, and ethical requirements.

Requirements specification New specification practices have been proposed for AI-based systems to capture domain-specific and component-level requirements. Czarnecki and Salay [146] proposed an approach to specify safety requirements based on uncertainty, whereas Rahimi [116] proposed an approach to specify requirements for ML components by explicitly specifying domain-related concepts. Furthermore, [114] and [115] focused on specifying requirements well suited to ML components and on testing these requirements [71]. Others focused on specifying requirements using user knowledge [39] and providing a framework with a more granular and composable vocabulary to formulate user needs [38]. Requirements documentation and evaluation has been highlighted by [123], which provides a list of shared requirements, while [62] details usage view activities/scenarios, top-level requirements, and detailed sub-requirements. In this context, [128] proposes a transparency playbook for developing AI systems that meet legal, regulatory, and user requirements. Furthermore, several studies have contributed to defining specific requirements. For instance, [117] extracts unique requirements from a case study, while [43] focuses on the necessities for AI documentation that bridge technical aspects with business processes. Berry and Daniel [118] highlight the importance of detailed evaluation measures and criteria in the specification process. Grüning [47] and Bartlett [120] offer insights into the requirements for AI-enabled medical devices, with the former providing a comprehensive overview and the latter specifying sensor accuracy for stability scoring. Elshan et al. [50] delve into what is needed for an AI-based team member, and Noda [72] uses user personas from AI medical interviews to specify usability and reliability needs.

Requirements elicitation Elicitation considers the origin of requirements and how they can be gathered. Among the 16 papers in this area, we identified six proposing different models for requirements elicitation, of which four focused on eliciting explainability requirements [34, 37, 53, 130, 132]. The authors of [129] proposed a method to identify requirements that ensure quality characteristics. Moreover, [58] and [60] provided guidance on how to elicit ethical requirements for AI-based systems. Further, important elicitation challenges related to data requirements are highlighted by [133] and [52].

Requirements validation While validating requirements is considered a crucial part of RE, we identified 8 studies proposing new practices in this area. Challa et al. [90] used a metamorphic testing approach to validate data quality requirements. Similarly, Banks and Ashmore [137] established that training data provides the functional requirements for AI-based systems; using traditional assurance concepts, they developed nine areas where confidence is required in training data. Barzamini et al. [139] and Pradhan et al. [140] presented frameworks to evaluate data quality. Finally, [61] suggested a model for examining the system’s output in scenarios that both align with and deviate from user expectations.

Practical considerations The requirements process spans the whole software life cycle. This KA aims to maintain stability in requirements and to ensure they accurately reflect the software to be built or that has been built. To support this, Sheh [142] presents traceability between explanations and the capabilities of the underlying AI techniques to help users and developers. The authors of [129] proposed a methodology to derive quality characteristics and measurement methods for MLS. Furthermore, a general methodological approach for quality modeling of ML has been proposed by [143]. Further, [144] proposed a method to deal with ethical requirements, and [86] addressed the interaction between RE and software architecture in the context of machine learning.

Software requirements tools In total, four tools have been developed to address distinct needs. One such tool [145] offers a multi-layered approach, enabling users to articulate their demands for explanations and facilitating a deeper understanding of AI systems. Another tool [122] focuses on implementing audit requirements, showcasing its utility with a mobility application to ensure compliance and functionality. Additionally, a tool [144] has been created to educate SE stakeholders on utilizing the Ethical Requirements Stack, aiming to integrate AI’s ethical requirements comprehensively and contribute to the advancement of AI ethics research. Lastly, there is tool support [67] dedicated to modeling requirements, which assists in the precise definition and management of system requirements, underscoring the importance of clear and structured requirement specifications in successful system development.

5 Open challenges and future research directions (RQ5)

This section highlights the prevailing challenges in the RE4AI literature and presents the future research directions outlined in the 126 primary studies. We used the thematic synthesis approach recommended by Cruzes et al. [26] to derive the challenges and future directions for RQ5.

5.1 RE challenges for AI-based systems

In this section, we underline the challenges in RE4AI. We identified 27 challenges classified into 9 categories as seen in Fig. 11. In the following subsections, we discuss them one by one.

5.1.1 Requirements specification

In requirements specification, we encountered the most challenges, categorized into five types:

Hard to specify requirements concretely The necessity for requirements engineers to adopt new methods to deal with data biases, and the challenge of developing requirements when the data is not yet available, highlight the difficulty of specifying requirements concretely for AI systems [90, 147]. Ensuring that legal regulations and ethical considerations are adequately addressed requires a shift in perspective towards a data and analytics viewpoint [107]. The complexity of specifying non-functional requirements (NFRs) on overall ML system performance and the difficulty of rigorously specifying requirements due to a lack of domain knowledge add to this challenge [107, 114]. The challenge of specifying explainability requirements and functionality that depends on input data underscores the difficulty of concretely specifying requirements in AI-based systems [71, 88]. The difficulty of specifying unambiguous requirements, such as for a pedestrian detector component, further illustrates this challenge [116].

Incomplete and incorrect knowledge Less tangible characteristics are hard to express meaningfully, leading to overlooked and misconstrued requirements [142]. Incomplete, incorrect, and inconsistent knowledge encompasses missing or insufficient entities, mislabeled entities, and differing labels for the same entity or merged entities, highlighting issues of knowledge integrity [27].

Emergent functionality hard to specify in advance The entanglement of requirements where even minor changes can dramatically alter other requirements illustrates the challenge of specifying emergent functionality in advance [82].

New type of quality requirements The explicit specification of explainability as a quality requirement presents a new challenge due to the lack of a systematic and overarching approach [88]. Standard requirements specification techniques become less applicable in AI-based systems where requirements are informed through training data, indicating a shift towards new types of quality requirements [137, 148].

Lack of suitable guidelines for AI documentation Königstorfer [43] and Treacy [110] underscore the issue of insufficient guidance on documenting AI, noting that many guidelines do not effectively connect principles with actionable requirements.

5.1.2 Explainability challenges

Many studies have identified explainability as a noteworthy challenge. We arranged these challenges into three major categories:

Explainability as a new requirement Ishikawa et al. [83] and Kuwajima et al. [91] highlighted explainability as an emerging requirement, aligning with the European Commission’s ethical guidelines for trustworthy AI, which advocate for fairness and explainability. This category underscores the recognition of explainability as a crucial aspect of ethical AI development.

Fig. 11 Challenges Identified in Literature

No consistent definition for explainability A significant challenge in the domain of explainability is the absence of a unified definition, making it difficult to pinpoint what "explainability" precisely entails [124]. This ambiguity is emphasized by studies like those of Köhl et al. [88] and Suresh et al. [38], who note that different stakeholders have varying interpretations of explainability. Furthermore, Jansen et al. [53] and Kim et al. [149] discuss the gap between stakeholders’ expectations of AI explanations and their understanding of AI system actions, illustrating the complexity of achieving a common understanding of explainability across diverse groups.

Lack of stakeholder-centric approaches for explainability The necessity for stakeholder-centric approaches in explainability is underscored by the challenges in ensuring AI-based systems are transparent enough to foster trust and accountability [141]. Suresh et al. [38] and Wang et al. [96] address the difficulties in creating AI-based systems that can effectively communicate their reasoning to users, particularly in critical situations. The literature suggests that existing model interpretability methods often fail to consider the end-user, typically being most comprehensible to those who develop them, such as ML researchers or developers [76]. This point is further elaborated by Dhanorkar [31], who argues for the need to extend beyond current explainability techniques to accommodate the diverse explanations required by different stakeholders in an AI system. Henin and Metayer [145] highlight the challenge of developing explanation methods that cater to various explainees with distinct interests, advocating for personalized approaches to explainability.

Collectively, these challenges indicate a growing awareness of the importance of explainability in AI, the need for a clearer definition and understanding of what explainability means to different stakeholders, and the importance of developing approaches that prioritize the perspectives and needs of those stakeholders.

5.1.3 New requirements engineering practices

The literature identifies critical areas where new Requirements Engineering (RE) practices are essential to address the unique challenges posed by AI-based systems. These areas are categorized into four key segments.

Integrating AI components into systems The integration of AI components into systems presents novel challenges for RE, necessitating new validation techniques beyond traditional inspection and static reading, especially where data quality is paramount [90]. [150] highlights the need for a revised RE process pipeline to effectively address and evaluate the requirements for these AI components, underscoring the importance of safety, reliability, and effectiveness in AI systems [108].

Data as a new source of requirements Data quality and its role as a source of requirements for AI-based systems emerge as significant concerns. The traditional principles and techniques of RE are found inadequate in addressing the unique requirements of ML-based systems, prompting a reevaluation of existing RE practices [83].

Integrating new concepts into existing practices The challenge extends to integrating new concepts into established RE practices. Existing RE frameworks must evolve to accommodate the distinct needs of AI-based systems, requiring a comprehensive approach that includes strategic planning, technology selection, system validation, and maintenance processes [69].

Lack of suitable RE concepts and methodologies for ML-based systems There is a conspicuous gap in RE concepts and methodologies tailored to ML-based systems. This deficiency points to a broader issue within the field, where RE practices fail to align with the legal and regulatory demands specific to ML systems. Ensuring compliance with relevant laws and regulations remains a primary concern for requirements engineers in this domain [107].

5.1.4 Human-centric requirements evaluation

Lack of human-centric approaches The challenges across papers [46, 47, 112, 121, 132] collectively highlight a significant shortfall in human-centric approaches within AI system development. Habiba et al. [112] outline issues such as the lack of a mediator role for effective communication among stakeholders, the absence of a unified explainability definition, and the shortfall in stakeholder-focused development methodologies, alongside a missing common language for all involved in ML projects. These issues underscore a widespread neglect of human-centered perspectives in AI’s technical evolution.

Ahmad et al. [132] point out the increasing reliance on AI in software solutions that unfortunately often overlook essential human-centered considerations in favor of technical priorities, indicating a misalignment between technological progress and human values. Similarly, Yu and Yong [46] expose a specific lack of engagement with the needs and perspectives of Korean stakeholders in AI for Health, revealing both a geographic and cultural oversight in stakeholder engagement. Grüning et al. [47] discuss how companies frequently miss integrating user requirements in the innovation of business models and the creation of new AI products, especially in healthcare, leaving uncertain how AI might shape future business models in this vital sector. Lastly, Wang [121] criticizes the dominant focus on technical strategies like model extraction for interpretability, which neglects user expectations, highlighting a critical gap in aligning AI system development with actual user needs.

Hard to evaluate requirements Habibullah et al. [44] underscore the importance of NFRs in maintaining ML system quality, noting that definitions and measurements of NFRs such as adaptability and maintainability differ between traditional systems and ML systems. The difficulty in measuring NFRs like fairness and explainability, due to their qualitative nature, is compounded in safety-critical situations where both human and machine judgment are crucial. Additional challenges in NFR measurement include gaps in knowledge or practices, the absence of measurement baselines, complex ecosystems, data quality issues, testing costs, bias in results, and domain dependencies. Bartlett [120] further points out the complexity of defining sensor accuracy requirements that ensure reliable algorithm outputs, indicating a lack of straightforward, well-defined processes. Similarly, Dey et al. [133] observe that while there is an emphasis on specifying ML-specific performance requirements, there is insufficient guidance on systematically engineering data requirements that involve diverse stakeholders.
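To make this measurement challenge concrete, consider fairness: although qualitative in origin, it can be partially operationalized as a measurable check. The following minimal sketch illustrates one such operationalization via the demographic parity gap; the predictions, the protected attribute, and the threshold are all invented for illustration and are not prescribed by the primary studies.

```python
import numpy as np

def demographic_parity_gap(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Absolute difference in positive-prediction rates between two groups."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

# Hypothetical model predictions and a hypothetical binary protected attribute.
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# The qualitative fairness NFR made measurable: the gap must stay below
# an assumed, stakeholder-agreed threshold.
THRESHOLD = 0.2
gap = demographic_parity_gap(y_pred, group)
print(f"demographic parity gap = {gap:.2f}")  # 0.00 for this toy data
assert gap <= THRESHOLD, "fairness requirement violated"
```

Such a check captures only one facet of fairness; the remaining difficulty, as the studies above note, is agreeing on which metric and threshold reflect stakeholder intent.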

5.1.5 Gap between ML engineers and end-users

This section focuses on the challenges arising from the gap between ML engineers and end-users. We categorized this gap into three distinct groups.

Lack of a collaborative approach to requirements and design Starting with the lack of collaboration, Vogelsang and Borg [107] underlined that it is challenging for data scientists to explain performance measures and their relevance to the client in an effective and understandable way; to ensure that customers understand these measures, data scientists also need skills in communication and customer education. Likewise, Shergadwala and El-Nasr [29] underscored the need to understand the shared mental model of design teams during human-AI collaboration, and Liao et al. [34] emphasized the need for explainability to make AI algorithms understandable to people. Nalchigar and Yu [65] pointed to the large conceptual distance between business strategies, decision processes, and organizational performance. Lastly, Brennen [32] stressed that defining a common terminology is essential when discussing XAI to enable meaningful, productive conversations that move the field forward; this could include establishing a shared vocabulary with clearly defined concepts and providing guidance on how to classify and rank models based on their explainability.

Lack of communication Regarding the lack of communication, a key challenge for software engineers developing ML systems is determining how to capture customer requirements effectively and how to design user interfaces that convey data to the user clearly [36]. Another significant challenge is overblown expectations arising from this lack of communication [37]. Bridging user needs and technical capabilities to develop explainable systems that are flexible, responsive, and resilient to changing conditions is also a challenge [34]. Qadadeh and Abdallah [109] stated that understanding the language and terminology used by data scientists and business users is a challenge in the context of data mining; they added that improving organizational communication between data miners and business analysts, and bridging the gap between theoretical data mining research results and realistic project goals, remain open challenges.

Knowledge gap The third challenge is building a shared understanding among stakeholders of the potential of ML technology. Most importantly, this involves addressing challenges in data collection and processing techniques, as well as implementing appropriate algorithms and models to ensure the overall effectiveness and reliability of AI-based systems [27, 66]. A related challenge in requirements elicitation for data analytics systems is determining how to translate business objectives into tangible and measurable analytics requirements. Additionally, there is a gap between non-technical stakeholders, who often have difficulty expressing their needs, and technical stakeholders, who need to understand and implement those requirements [65].

5.1.6 Uncertainty

Uncertainty in AI-based systems presents significant challenges to the Requirements Engineering (RE) process, categorized into three distinct areas: uncertain environments, changing requirements, and the uncertain nature of outcomes.

Uncertain environment The uncertain environment encompasses challenges stemming from reliance on large volumes of data, where the accuracy and reliability of this data cannot always be assured [148]. This situation is further complicated by unpredictable external conditions that might affect the system’s performance and decision-making capabilities [108], making it difficult to guarantee system behavior under varying conditions.

Changing requirements Changing requirements pose a persistent challenge, reflecting the dynamic nature of business and operational goals. As business requirements evolve, the technical side struggles to keep pace, especially in understanding and processing data effectively [66]. This fluidity can lead to discrepancies between expected and actual system capabilities, necessitating ongoing adjustments to the RE process.

Uncertain nature of the outcome The outcome’s uncertain nature is particularly pronounced in AI-based systems, where the behavior on unseen data can significantly differ from expected results. This unpredictability complicates the RE process, as it undermines the ability to predict the system’s performance accurately and, by extension, its development timeline, cost-effectiveness, and overall feasibility [83, 114, 143]. The inherent unpredictability of AI models demands a flexible and adaptive approach to requirements engineering, capable of accommodating unforeseen changes and outcomes.
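One pragmatic response to these uncertainty challenges, offered here as an illustrative sketch rather than a method from the primary studies, is to monitor for input drift at runtime so that requirements derived from training-time data can be flagged for revision when the environment changes. The feature data and significance threshold below are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=42)

# Hypothetical feature samples: distribution at training time vs. in
# production, where the operating environment has shifted.
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)
live_feature = rng.normal(loc=0.8, scale=1.0, size=1000)

# A two-sample Kolmogorov-Smirnov test flags the distribution change,
# signalling that requirements derived from the training data may no
# longer hold in the current environment.
result = ks_2samp(train_feature, live_feature)
if result.pvalue < 0.01:
    print(f"input drift detected (KS statistic = {result.statistic:.2f}); "
          "revisit data-derived requirements")
```

Detecting drift does not resolve the uncertainty, but it turns an implicit environmental assumption into an observable condition that the RE process can react to.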

5.1.7 Requirements analysis

We identified three primary areas of concern in requirements analysis; each category reflects specific issues encountered in the development of AI-based systems, as delineated by the primary studies.

Hard to classify The classification of requirements for AI-based systems into FRs and NFRs presents a significant challenge due to the inherent complexity of these systems. They leverage intricate algorithms and vast datasets, complicating predictions of system behavior in various scenarios or environments [82]. The interaction of AI systems with external elements further amplifies this complexity, rendering traditional requirements analysis methods less effective. Moreover, the predictive nature of AI learning models complicates defining system behavior in advance, underscoring the classification challenge [129].

Requirements management throughout the life cycle Effective requirements management across the lifecycle of an AI-based system is crucial yet challenging. [148] emphasizes the importance of understanding the impacts of ML algorithms not only during the design phase but also post-deployment, advocating for a broader consideration of non-functional requirements (NFRs) beyond the integration of ML solutions. The non-deterministic behavior at runtime, influenced by the learning algorithms, complicates the classification and management of requirements for AI/ML-intensive systems [69]. This dynamic behavior necessitates a flexible and adaptive approach to requirements management throughout the system’s lifecycle.

No clear perception "Clear perception" in the context of requirements analysis for ML-based systems refers to the precise and accurate understanding of how these systems perceive and interpret data from their environment. The lack of a clear perception during requirements analysis for ML-based systems poses a significant risk, potentially leading to the violation of other system requirements, such as data dependencies. This is particularly concerning for safety requirements in ML-intensive systems, where unclear or inaccurate perceptions can undermine the achievement of top-level safety goals [115]. The challenge lies in adequately capturing and specifying these requirements in a manner that accounts for the nuanced and often unpredictable nature of ML-based perception.
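As a concrete, admittedly simplified illustration (not drawn from [115]), a perception-related safety requirement can be stated as an explicit, checkable runtime condition, making the otherwise implicit perception assumption visible to requirements analysis. All names and the confidence threshold below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float

# Hypothetical perception-related safety requirement, stated checkably:
# "pedestrian detections below 90% confidence shall trigger a safe fallback".
MIN_CONFIDENCE = 0.90

def plan_action(detection: Detection) -> str:
    # The conservative fallback preserves the top-level safety goal even
    # when the ML-based perception is uncertain.
    if detection.label == "pedestrian" and detection.confidence < MIN_CONFIDENCE:
        return "reduce_speed"
    return "continue"

print(plan_action(Detection("pedestrian", 0.72)))  # -> reduce_speed
```

The open challenge identified by [115] is precisely how to derive such thresholds and fallbacks systematically from top-level safety goals rather than fixing them ad hoc, as done here.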

5.1.8 Contextual requirements

The concept of contextual requirements highlights the necessity of incorporating the specific environment or context in which an AI system operates into its design process. However, this presents two main challenges: accurately capturing and defining these contextual requirements, and integrating them into the design and development process. The variability and dynamic nature of real-world environments make it difficult to ensure that the AI system will perform optimally across different contexts. Traditional requirements engineering practices often fail to address these complexities, underscoring the need for new methods to effectively handle contextual requirements:

NFR attributes change in the ML context In the ML context, NFR attributes undergo significant transformations, necessitating a nuanced approach to their elicitation, specification, and validation. The complex nature of AI systems, coupled with the specific demands of their application domains, often results in a shift in the prioritization and characterization of NFRs. These shifts can be attributed to various factors, including emerging stakeholder expectations, evolving legal and ethical standards, and the technical requirements of integrating AI components. The challenge is compounded by a frequent lack of domain-specific knowledge, implicit stakeholder needs, and ill-defined problem scopes, making the accurate definition and management of these attributes particularly challenging [27, 147].

Context changes over time Contexts within which AI systems operate are not static; they evolve over time, affecting the relevance and accuracy of the initially defined requirements. This dynamic nature of contexts can lead to significant alterations in requirement attributes, necessitating ongoing adjustments to both functional and non-functional requirements to maintain system efficacy and compliance. The ability to anticipate and adapt to these changes is crucial for the long-term success of AI systems, highlighting the need for flexible and responsive requirement engineering processes [148].

5.1.9 Legal and ethical challenges

Challenges in data dependence and interpretability One significant challenge highlighted by Gabriel et al. [48] is the reliance on extensive datasets and the domain expertise required to develop models while adhering to regulatory and ethical standards. There is a particular emphasis on the necessity to encapsulate implicit knowledge, especially from employees, and to ensure that the AI system’s operations are interpretable to them. The lack of practical experience with AI applications in many companies further complicates these challenges. Additionally, Silva et al. [135] highlighted that, for ML systems, the inherent opacity poses a significant barrier to explainability, compounded by issues such as ensuring non-discrimination, navigating legal restrictions on data usage, and the complex task of specifying data requirements.

Fairness, regulation, and ethical accountability challenges Treacy [110] and Barclay [73] highlighted that current approaches lack mechanisms to extract protected attributes from legal requirements and to assist in defining and interpreting fairness in AI models, indicating a gap in developing fair AI systems. Further, Grüning [47] stated that companies aiming to offer AI solutions must navigate complex product requirements and regulatory landscapes, presenting significant operational challenges. Similarly, Cerqueira [58] emphasized that developers often lack adequate training in AI ethics, both in academic settings and within development projects. Furthermore, the absence of legal consequences for failing to implement ethical guidelines – often because these guidelines are non-binding – results in a lack of motivation or accountability among developers regarding AI ethics.

5.2 Future research directions

We propose future research directions for RE4AI, as outlined and summarized in Table 4. These directions are grounded in insights extracted from the primary studies.

Table 4 Future research directions

RD1: How to incorporate human knowledge in building AI-based systems? New, sophisticated, AI-enabled safety systems, such as automatic emergency braking (AEB), have dramatically transformed the relationship between human drivers and their cars: they free up mental resources, enhance driving quality, and affect other traffic participants and their conduct. While AI-powered driving assistance has evolved considerably in recent years, humans have remained largely unchanged over the previous millennia. So, while building such features, we must consider several crucial human factors, both limitations and capabilities. The fact that people may override or deactivate AEB, for example, has become a key constraint on its potential to make traffic safer. In this regard, examining to what extent human aspects must be included when determining the desired quality and needed functionality of the system and its components is a fruitful research opportunity [147]. How knowledge about human factors can be effectively incorporated into AI-intensive system development methodologies is another promising direction [29].

RD2: How can requirements modeling be used for understanding AI-based systems? Requirements modeling connects domain problem understanding with the technology solution, describing and justifying the step-by-step progression from problem to solution. Like conventional software, ML applications may benefit from well-known RE methodologies such as goal- and agent-oriented RE, ensuring that the final systems meet the goals and desires of end-users and other stakeholders [84]. Furthermore, conceptual modeling can be worthwhile for improving business understanding and enhancing system transparency [64]. Additionally, [115] and [147] highlighted efficient top-down requirements formulation and deriving contextual requirements from use cases as new research avenues.
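To hint at what goal-oriented modeling might look like when applied to an ML system, the toy sketch below decomposes a stakeholder goal into leaf goals that map to concrete, checkable requirements. The goal names, metrics, and thresholds are invented for illustration and are not taken from the cited studies.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional

@dataclass
class Goal:
    name: str
    subgoals: list["Goal"] = field(default_factory=list)
    requirement: Optional[str] = None  # leaf goals map to concrete requirements

# A toy goal decomposition in the spirit of goal-oriented RE for an ML system.
root = Goal("Safe automated braking", subgoals=[
    Goal("Detect obstacles reliably",
         requirement="recall >= 0.99 for obstacles within 50 m"),
    Goal("Explain braking decisions",
         requirement="log the top-3 input features behind each intervention"),
])

def leaf_requirements(goal: Goal) -> Iterator[tuple]:
    """Walk the goal tree and yield (goal name, concrete requirement) pairs."""
    if goal.requirement is not None:
        yield goal.name, goal.requirement
    for sub in goal.subgoals:
        yield from leaf_requirements(sub)

for name, req in leaf_requirements(root):
    print(f"{name}: {req}")
```

Even such a minimal model makes the problem-to-solution progression traceable: each ML-specific requirement is justified by the stakeholder goal it refines.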

RD3: How can existing RE practices be adapted for AI-based systems? One of the major causes of poor ML system quality is the lack of requirements specification [152]. The main reasons are the change in development paradigm and new types of requirements. [82] and [66] outlined that research should investigate how existing RE practices, e.g., GORE, data-driven and model-based design (MDM), can be adapted for AI-based systems. The same applies to explainability: Köhl et al. [88] emphasized that further research is needed to investigate how RE techniques can be applied to design explainable systems.

RD4: How to identify the need for new RE practices specific to AI-based systems? Several studies have shown that RE for ML systems differs because of the different ways these systems are developed; therefore, RE practices for these systems should also evolve. In this context, [107] outlined the questions that future research should address: Is RE for ML distinct? If yes, what distinguishes it? If not, what are the reasons and consequences? Further research is needed to determine how RE for ML can be integrated with the RE of traditional software systems [148].

RD5: How to address non-functional requirements? A rigorous RE approach is required to assure quality. NFRs are requirements placed on system quality and are articulated over many quality characteristics [153]. The authors further stated that our knowledge of NFRs from traditional systems is no longer directly applicable to AI-based systems due to their non-deterministic behavior and additional performance requirements. One of the most critical aspects is data and its representation in ML systems, since adequate mechanisms are needed to identify and manage the required quality and amount of data [147]. Future research should focus on which quality requirements are specific to ML systems and how they can be specified [129, 148], particularly data quality requirements [147], safety requirements [107], and compliance requirements [108].
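As a hedged illustration of this direction, data quality requirements could be written down as executable acceptance checks rather than prose alone. The dataset, column names, and thresholds below are hypothetical.

```python
import pandas as pd

# Hypothetical training data for an ML component.
df = pd.DataFrame({
    "age":   [34, 51, None, 29, 62, 45],
    "label": [0, 1, 0, 1, 1, 0],
})

# Data quality requirements expressed as executable acceptance checks.
MAX_MISSING_RATIO = 0.20  # completeness: at most 20% missing values per feature
MIN_CLASS_RATIO = 0.30    # balance: minority class holds at least 30% of labels

missing_ratio = df["age"].isna().mean()
minority_ratio = df["label"].value_counts(normalize=True).min()

assert missing_ratio <= MAX_MISSING_RATIO, "completeness requirement violated"
assert minority_ratio >= MIN_CLASS_RATIO, "class balance requirement violated"
print("data quality requirements satisfied")
```

The research question remains how to elicit and justify such thresholds systematically, since the numbers here are placeholders rather than stakeholder-derived values.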

RD6: How to validate ML requirements? Some studies have explored requirements validation for AI, but the area is still in its infancy. Possible future research directions include identifying appropriate performance metrics or key performance indicators (KPIs) for trained ML models in a particular context; defining and monitoring the performance of ML systems ensures the system stays within its intended behavior [147]. In the context of requirements validation, [145] pointed towards validating explanations as potential future work. Some frameworks specify quality objectives as constraints or criteria; nonetheless, they lack an evaluation of the relevance of such objectives, e.g., assessing stakeholders’ understanding, which is a promising future research direction.
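To make the KPI idea tangible, a performance requirement can be phrased as a repeatable acceptance test over a held-out set, as in the minimal sketch below; the synthetic data, model choice, and the 80% threshold are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical KPI taken from a requirements specification:
# "the classifier shall reach at least 80% accuracy on held-out data".
REQUIRED_ACCURACY = 0.80

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# The requirement validated as a repeatable acceptance test.
print(f"accuracy = {accuracy:.3f} (required: {REQUIRED_ACCURACY})")
assert accuracy >= REQUIRED_ACCURACY, "performance requirement not met"
```

Running such a test in a continuous pipeline would let requirement violations surface automatically after retraining, which is one way the monitoring envisioned in [147] could be realized.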

RD7: How to address new types of requirements? This research avenue should consider several questions: How can new types of requirements be specified [81, 154]? How can these requirements be incorporated into the current development scenario [80]? How can we formulate specifications for alterations that defy expression as transformations, such as a person altering their attire, or shadows obscuring a portion of an object, which are not captured by the preservation of structure [114]?

6 Discussion

This study systematically mapped the landscape of RE for AI-based systems, guided by five pivotal research questions. In this section, we explore how both existing and new RE practices pose challenges for RE4AI and discuss future research directions to address these challenges. Our investigation reveals a multifaceted view of the current state, challenges, and future directions of RE practices in the domain of AI.

RQ1 and RQ2: landscape and distribution of RE practices According to our findings, academia has remained dominant throughout the years. However, growing industry interest in RE4AI and the advent of industry contributions highlight the necessity of a specific RE process for developing AI-based systems. Collaborative and industry contributions started appearing in 2019. Conferences and seminars that foster the sharing of ideas and best practices might facilitate effective cooperation.

RQ3: Maturity of research Our analysis of RE publication types and evaluation methodologies found that most publications propose solutions, the majority of which require additional industry validation. This underscores the evaluation of AI-based system requirements as a significant opportunity for RE researchers. More research could be done to develop practices that help requirements engineers elicit the requirements of end-users, and new tool support could be built to aid the RE process for AI-based systems. Research related to RE4AI is still relatively immature and dominated by solution proposals; philosophical, opinion, and personal experience papers are fewer and lack evaluation. The field now needs studies that validate these findings: conducting experiments, case studies, and surveys can help researchers develop it further.

RQ4: Evolution of RE practices According to our findings, there has been a strong emphasis on the requirements analysis and elicitation phases in RE4AI, with many existing and new approaches being used in these areas. However, the requirements process phase appears less mature than the other phases, particularly when adapting the requirements process for AI-based systems. It is also notable that no existing RE tool has been used to support RE4AI.

Table 5 Linking future directions, challenges and practices

RQ5: Challenges and future directions We identified nine main challenges and their subcategories that are frequently highlighted in the literature. We also extracted future research directions from the selected primary studies during our analysis. The results indicate that there are still significant gaps (illustrated in Fig. 11) and opportunities (see Sect. 5.2) to be addressed in RE practice for developing AI-based systems. Overall, RE4AI is still a young, evolving field marked by the emergence of new types of AI-specific requirements. There is a strong need for specific RE processes for AI-based systems development, for applying existing RE practices to AI-based systems, and for addressing requirements specific to these systems.

Researchers, practitioners, and stakeholders in the field of RE for AI can benefit from our results, which allow them to understand the current state of research and identify areas where further work is needed. The study provides a valuable resource for researchers planning future studies in this field by pointing out the challenges and research directions that still need to be explored. Furthermore, it can serve as a reference for those planning to develop AI-based systems, as it provides an overview of the RE practices currently in use and those proposed for AI-based systems.

Detailed analysis of RQ4 and RQ5

In this section, we analyze RQ4 and RQ5, using Table 5 to link future research directions with identified challenges and relevant practices from Sects. 4.4.1 (existing practices) and 4.4.2 (new practices). This approach highlights how specific research directions address identified challenges and how they correlate with existing and new practices (see Tables 2 and 3).

The studies emphasize the need for human-centric and stakeholder-centric approaches in RE4AI. For instance, Ahmad et al. [132] and Yu and Yong [46] propose frameworks to collect human-centric requirements, addressing stakeholder engagement challenges. This aligns with new practices, such as those by Grüning [47] and Wang [121], that enhance explainability. The future direction of incorporating human knowledge in building AI systems addresses the challenge of lacking human-centric approaches: traditional structured methods for managing AI requirements can be adapted, and new practices enhancing explainability frameworks can be implemented.

Additionally, Ishikawa et al. [83] introduce evidence-driven RE principles to manage the dynamic nature of data, linking goal-oriented RE for ML operations and hypothesis modeling. This adaptation of existing practices highlights the integration of new concepts into established frameworks, as suggested by Chuprina [69]. The future direction of adapting existing RE practices for AI-based systems tackles the challenge of integrating new concepts into existing practices. Traditional goal-oriented RE methods can be combined with new practices like hypothesis modeling for continuous updates.

Furthermore, Heyn [147] and Maass [27] discuss the evolving nature of NFRs in ML systems, linking these challenges to traditional quality requirements and new practices focusing on transparency and fairness. The future direction of addressing non-functional requirements in AI systems is crucial due to the changing NFR attributes in the ML context. Traditional quality requirements must evolve to include new quality requirements for transparency and fairness.

This analysis underscores the importance of adapting RE practices to meet AI-specific challenges, providing a comprehensive overview of how current research directions can address identified gaps and advance the field of RE4AI.

7 Threats to validity

Like other secondary studies, our study is prone to threats to validity [155]. We report them in the following sections along with the actions taken to mitigate them. We use the guidelines provided by Petersen et al. [156] to assess the relevant threats. As we adopt a pragmatist worldview, our study is prone to two main types of threats: (i) threats to internal validity and (ii) threats to external validity.

7.1 Internal validity

Multiple factors can threaten internal validity; for our study, the selected meta-search engines and digital libraries, the choice of articles, and the screening of articles are of particular interest. Internal validity threats can occur while developing search queries, choosing appropriate search engines, and applying inclusion and exclusion criteria. We piloted multiple search queries with different search engines and selected the query with the most relevant results and the least noise; two researchers independently ran the query on different search engines for this purpose. For the screening of articles, each article was reviewed by three researchers independently to reduce bias, and potential conflicts were resolved in a synchronization meeting. To avoid subjective bias during data extraction, we piloted the data extraction form and reviewed potential data extraction concerns. To ensure that all researchers followed a similar data extraction process, the first 15 studies were independently extracted by the first three authors, and we then held weekly meetings with all researchers to discuss the extractions for five studies at a time. Further, the data presented can be reviewed by anyone who wishes to do so by accessing our replication package (see footnote 1), increasing the chance of identifying any errors in reporting. Finally, although the data analysis was conducted by a single researcher, the results were extensively discussed and refined by the research team.

7.2 External validity

Threats to external validity are conditions that can affect the generalizability of our results. We addressed external validity by formulating an encompassing search query and applying rigorous inclusion and exclusion criteria. Although there is a chance of having overlooked primary studies, we tried to minimize this possibility: we identified the majority of relevant papers and applied snowballing until saturation, which resulted in a diverse set of articles covering a significant part of the study topic and thus increased the generalizability of our findings.

8 Conclusion

In this systematic mapping study, we presented a comprehensive overview of the research on RE for AI-based systems. We extracted 126 studies using a hybrid search strategy with iterative backward and forward snowballing and rigorous inclusion and exclusion criteria. Based on our RQs, we outlined the current research landscape of these studies, including bibliometrics such as publication years, publishing venues, and author affiliations. We identified the contribution of the RE discipline towards AI and mapped the RE topics addressed within the RE4AI literature. Furthermore, the maturity of this research topic was evaluated using the publication classifications provided by Wieringa et al. [12]; we also highlighted how frequently these types occurred together and which evaluation methods were utilized. We further reported which existing RE practices have been used for AI-based systems and which new practices have been proposed. Moreover, we underscored nine significant challenges in RE4AI and proposed seven potential future research directions based on our analysis. Finally, the key contributions of this work are: (i) a mapping of the current state of research for RE4AI and (ii) the identification of challenges and prospective research avenues to further enhance RE4AI. Researchers and practitioners can use these results to familiarise themselves with the current state of this field.