1 Introduction

Strengthening the competitiveness of manufacturing companies is a constant driver of innovation and renewal. The currently emerging horizontal and vertical interconnection of virtually all production and managerial systems of companies is often referred to as the “Industrial Internet” or “Industry 4.0” [1]. We are in the midst of this next industrial revolution. Yet it is neither sufficiently understood how the human factors perspective relates to the increasing detail, complexity, and amount of information in companies, nor how human operators can be adequately supported in harnessing this rich source of information.

In this article we argue that Decision Support Systems (DSS) are a necessity to help human operators handle the flood of information and that these systems positively influence decision speed and accuracy. However, we also argue that these systems have detrimental consequences for decision speed and accuracy if they suggest wrong or inadequate decisions, as operators are easily deflected by these systems.

The remainder of this article is structured as follows: Sect. 2 outlines related work and the current state of the art of the human factors perspective on decision support systems. Section 3 describes our experimental approach. Section 4 presents the results of the experiment. Section 5 discusses the significance of this study and recommendations for action for the design and evaluation of decision support systems. Finally, Sect. 6 touches on the limitations of this study and Sect. 7 outlines the future research agenda.

2 Related Work

To frame the focus of this research, we first outline our understanding of a decision support system and the purpose such a system serves. We then elaborate on the human processing of tabular data to provide the quantitative background for our research hypotheses.

2.1 Decision Support System

The term decision support system (DSS) was coined in the 1970s [2, 3]. The early precursors of modern DSSs can be found in research on organizational decision-making and technical work in the 1950s and 1960s. The idea was to integrate the computational and storage power of computers into the complex decision-making tasks of workers. To do so, it was necessary to encode knowledge, models, and rules into a computable form. Decision support was provided through querying systems, reports, and visualizations.

DSSs aid in solving problems by automating the programmable part of a decision problem [3]. This part can be routine, repetitive, well-structured, or easily solved by a computer. The user can then find a suitable decision by integrating the output of the DSS into the non-programmable part of the problem, which is new, novel, ill-structured, or difficult to solve. As the reasons for avoiding full automation are manifold (e.g., responsibility as well as ethical, political, and organizational considerations), the user must make the final decision. Typically, a DSS provides the user with an indicator for the choice to be made or presents an argument for a specific solution.

Modern forms of DSSs are data warehouses [4], OLAP [5], data mining [6], and web-based DSSs. Data-centric DSSs gain importance as the amount of available data increases, while at the same time this amount becomes too overwhelming for any user to process. Modern DSSs implement methods from artificial intelligence to help process the large amounts of information and thus provide fuzzy output that requires further interpretation [7]. The success of DSSs has been shown in various fields, in particular in medicine [8] and defense [9].

The need to study the effect of human factors on the use of DSSs has been noted in medicine [8, 10] and stock trading [11]. In particular, user factors, such as gender [12], are likely to influence decision accuracy and thereby performance. Moreover, trust and perceived effectiveness play a decisive role in the use of DSSs: a possible imbalance between users’ evaluations and the actual performance of the decision support can lead to a detrimental neglect of such systems [13, 14].

An application of DSSs in Industry 4.0 settings would face similar challenges and should therefore be investigated in a similar fashion to understand the benefits and pitfalls of using a decision support system. But where to start?

One way of supporting the user is to provide indicators for tabular data in inventory control, the processing of which takes a significant toll on human cognition [15, 16]. These indicators can help, but they can also be deceptive in unclear or unexpected scenarios (i.e., black-swan events). In such cases, a DSS can be defective or dysfunctional, as it provides correct support for an incorrect scenario. Nevertheless, providing support in processing tabular data is in general considered helpful. Yet, the effects of having a defective DSS must be weighed against the benefits of a functional DSS. To quantify this effect, one must first understand the toll of processing tabular data without a DSS.

2.2 Processing of Tabular Data

Speicher et al. investigated the influence of the number of lines, the number of digits, and the concreteness of the given information on speed and accuracy in table reading tasks [15]. All factors influenced performance, whereas accuracy was not affected. The study also revealed that practice had a positive influence on both performance and accuracy. Furthermore, participants with higher perceptual speed were faster, but did not achieve a higher number of accurate results.

Mittelstädt et al. additionally considered task complexity and readability in the processing of tabular data. As in the previous work, all factors influenced performance [16]. The study also revealed that people with higher perceptual speed were able to compensate for the detrimental influence of lower readability on task performance.

How can we integrate these findings into the usage of a DSS? How do users perform in similar tasks when a DSS is present? Understanding the efficacy of a DSS is crucial when deciding where, when, and how a DSS should be used.

3 Method

To understand the efficacy of decision support systems for manufacturing companies, we conducted an experiment that measures the influence of different decision support systems on the efficiency and effectiveness of decisions, using a simplified business simulation game as its basis. The following sections describe (1) the underlying business simulation game, (2) the considered experimental variables, (3) the experimental procedure, and (4) the characteristics of the sample.

3.1 Business Simulation Game

The game is based on Forrester’s Beer Distribution Game [17] and our own Quality Intelligence Game [18]. Both resemble decision-making tasks in a supply chain environment. The former addresses communication across the supply chain, whereas the latter focuses on multi-criteria decision making, as it also includes investments in quality management. In contrast to the original turn-based games, we reduced the complexity of the underlying simulation to singular decision tasks for this baseline experiment. Hence, a wrong decision had no influence on subsequent decisions, whereas in the real games poor decisions at the start had to be compensated for later on.

The task of the participants is to read the tabulated stock levels and projected customer demands for a set of products and to decide whether enough stock is present or whether new products must be ordered. For reasons of simplicity, only binary answers are requested (i.e., procurement if at least one product is out of stock, no procurement if enough stock is present for every product).
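
In code, this decision rule amounts to a single comparison per product. The following minimal Python sketch illustrates it (the data structures and product names are hypothetical, not the game’s actual implementation):

```python
def needs_procurement(stock: dict, demand: dict) -> bool:
    """Return True if at least one product's stock falls short of its projected demand."""
    return any(stock[product] < demand[product] for product in demand)

# Example: "bolts" is under-stocked, so the correct answer is "procurement".
stock = {"bolts": 40, "nuts": 120, "washers": 75}
demand = {"bolts": 60, "nuts": 100, "washers": 50}
assert needs_procurement(stock, demand)
```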

3.2 Experimental Procedure and Investigated Variables

The experiment consists of three different sets (no DSS, correct DSS, defective DSS) with 30 decision situations within each set (150 trials in total) for each participant. For each situation the participants decide whether new materials need to be procured by pressing one of two distinctly marked buttons on a keyboard (procurement/no procurement), and time and accuracy are recorded for each decision. Feedback on the correctness is given after each trial and the participants take a short break between the sets. The overall experimental procedure is illustrated in Fig. 1.

Fig. 1. Schematic representation of the experiment.

Independent variables (between-set):

Decision Support System: Three different types of DSSs are presented: no DSS, a correct DSS, and a defective DSS. If a DSS is available, a traffic light next to the data table suggests a procurement decision. A red light indicates that insufficient stock is available and that new supplies must be ordered; a green light indicates that enough stock is present. If a DSS is present, it either works correctly or malfunctions. A malfunctioning system presents the wrong suggestion with a 50 % (sic!) chance. As feedback is given immediately after each table, the participants should notice the defective DSS quickly.
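
The per-trial suggestion logic can be sketched as follows (a hypothetical reconstruction in Python; the boolean coding and the function name are assumptions, as the original apparatus is not published):

```python
import random

def dss_suggestion(correct_decision: bool, dss_type: str):
    """Return the traffic-light suggestion (True = red light, i.e., procure),
    or None if no DSS is shown in this set."""
    if dss_type == "none":
        return None
    if dss_type == "correct":
        return correct_decision
    # Defective DSS: present the wrong suggestion with a 50 % chance.
    return correct_decision if random.random() < 0.5 else not correct_decision
```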

Independent variables (within-set):

Procurement: Half of the presented tables contain data that require a procurement decision, as the stock of exactly one good is lower than the demand. Otherwise, no procurement is required.

Data amount: The presented tables differ in length (2, 6, or 12 lines of data). The amount of information is varied because previous studies suggest that many flaws in user interfaces can be compensated for in easier tasks with less information, and that difficulties arise only in more complex or information-dense tasks [15, 16].
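
Taken together, these two within-set manipulations fully determine a trial. The following Python sketch shows how a stimulus table could be generated (the numeric ranges are assumptions; the published description does not specify them):

```python
import random

def make_table(n_lines: int, procurement: bool):
    """Return a list of (stock, demand) rows. In procurement trials exactly one
    product is under-stocked; otherwise every stock covers its demand."""
    rows = []
    for _ in range(n_lines):
        demand = random.randint(10, 99)
        rows.append((demand + random.randint(0, 50), demand))  # sufficient stock
    if procurement:
        i = random.randrange(n_lines)  # introduce exactly one shortage
        _, demand = rows[i]
        rows[i] = (demand - random.randint(1, 10), demand)
    return rows
```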

Explanatory variables:

Perceptual speed is measured by the Number Comparison Test [19] as a possible moderator variable. A recent study identified that people with higher perceptual speed can compensate for inferior user interfaces better than people with lower perceptual speed [16]. Thus, we assume that higher perceptual speed is also beneficial for compensating the influence of defective decision support systems.

Trust in Automation: This self-developed scale assesses an individual’s trust in automation with questions such as “I trust the system” or “The system will always create the same result for the same input”. We speculate that this measure influences the achieved accuracy in the sets with the malfunctioning DSS, as people with lower trust in automation might base their decisions on the actual data rather than on the DSS.

Dependent variables:

The reaction times (in ms) and the accuracy (correct/incorrect, in %) are recorded via log files for each trial. These are aggregated to median reaction times and mean accuracy scores for each of the considered factors (decision, length, DSS). Due to the simplified environment, each decision could be unambiguously classified as correct (i.e., no procurement as enough supplies are in stock, or procurement as at least one supply was not available in sufficient quantities) or incorrect (e.g., procurement of new supplies although enough are in stock).
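
The aggregation step could look as follows in pandas (the log format and column names are assumptions, as the original log files are not published):

```python
import pandas as pd

# Hypothetical trial log: one row per trial with the columns
# participant, dss, length, decision, rt_ms, correct (0/1).
log = pd.read_csv("trials.csv")

aggregated = (
    log.groupby(["participant", "dss", "length", "decision"], as_index=False)
       .agg(median_rt_ms=("rt_ms", "median"), accuracy=("correct", "mean"))
)
```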

The results are analyzed with parametric and non-parametric methods, using bivariate correlations, single and repeated uni- and multivariate analyses of variance (ANOVA/MANOVA), and multiple linear regressions. The type I error rate (level of significance) is set to α = .05 and findings with .05 < p < .1 are reported as marginally significant. Pillai’s trace is considered for the multivariate tests and effect sizes are reported as η2. If the assumption of sphericity is not met, Greenhouse-Geisser-corrected values are considered (but uncorrected dfs are reported for readability).
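
As one example of these analyses, a repeated-measures ANOVA on the DSS factor with sphericity check and Greenhouse-Geisser correction could be run as follows (a sketch using the pingouin package; the statistics software actually used is not stated in this article):

```python
import pandas as pd
import pingouin as pg

log = pd.read_csv("trials.csv")  # hypothetical log format, as above
per_dss = (log.groupby(["participant", "dss"], as_index=False)
              .agg(rt=("rt_ms", "median"), accuracy=("correct", "mean")))

# One-way repeated-measures ANOVA on median reaction times across DSS types;
# correction=True also reports the Greenhouse-Geisser-corrected p-value.
aov = pg.rm_anova(data=per_dss, dv="rt", within="dss",
                  subject="participant", correction=True, effsize="np2")
print(aov)
```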

3.3 Description of the Sample

Twenty people aged 22 to 54 years (29.6 ± 7.2 years on average) participated in the experiment (8 female, 12 male). The perceptual speed (PS) scores of the participants ranged from 18 to 39 points, with an arithmetic mean of 24.7 ± 5.0 points (higher values indicate a higher perceptual speed). The trust in automation (TA) of the sample ranged from 2.5 to 5.2 points (scale from 1 to 6, higher values indicating higher trust), with an arithmetic mean of 4.0 ± 0.9. The scale has an excellent internal reliability (Cronbach’s α = .906, 6 items). A correlation analysis reveals a strong significant relationship between sex and PS, with women having a higher perceptual speed than men [ρn=20,2 = .581, p < .01].
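
The reported reliability is the standard Cronbach’s alpha, which can be computed directly from the participants × items response matrix (a minimal sketch of the textbook formula; illustrative only):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: participants x items response matrix (here: the 6-item TA scale)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)
```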

4 Results

The results section is structured as follows: first, the findings of the baseline experiment (i.e., no decision support system) are presented; then, the influence of a correctly working DSS is shown, followed by the effect of a faulty DSS.

4.1 Baseline (No Decision Support System)

The average median reaction time without DSS is 5.186 ± 1.49 s with an accuracy of 92.6 ± 7.3 %. The fastest participant completed the set without DSS in 2.9 s/table, whereas the slowest person took 9.7 s/table. The lowest achieved accuracy score is 77 %, the highest 100 % (no errors). Overall, time and accuracy are not related [ρn=19,2 = .104, p = .673 > .05], which indicates that the participants took the task seriously and did not rush through the trials at the cost of accuracy.

The average median reaction times were influenced neither by the age [ρn=18,2 = −.172, p = .495 > .05], the sex [ρn=19,2 = .097, p = .692 > .05], the TA [ρn=19,2 = −.041, p = .869 > .05], nor the PS [ρn=19,2 = .066, p = .789 > .05] of the participants. The average accuracy is influenced neither by age [ρn=18,2 = .385, p = .114 > .05], TA [ρn=19,2 = .135, p = .582 > .05], nor PS [ρn=19,2 = −.023, p = .927 > .05], but by sex [ρn=19,2 = −.595, p = .007 < .01], with women achieving lower accuracy scores than men.

Despite the small sample size, all investigated within-subject factors and their interaction terms were significant, as presented in the following:

Factor Data amount: The number of lines in the tables significantly influences the reaction times and accuracy [V = .966, F4,15 = 100.482, p < .001, η2 = .966]. As expected, tables with 2 lines were completed significantly faster (M = 2.438 s) than tables with 6 lines (M = 4.683 s) or tables with 12 lines (M = 8.592 s) [F2,36 = 255.97, p < .001, η2 = .934]. The accuracy for short tables was slightly higher (95.8 %) than for medium (93.2 %) or long tables (88.9 %), though this difference is only marginally significant [F2,36 = 2.764, p = .092 < .1, η2 = .133]. Furthermore, reaction times for medium and long tables were strongly correlated [ρn=19,2 = .795, p < .001]. Likewise, accuracy for long and medium tables [ρn=19,2 = .795, p < .001], as well as for long and short tables [ρn=19,2 = .795, p < .001], was strongly correlated. Figure 2 (left) illustrates this effect.

Fig. 2. Effect of data amount and procurement decision on speed and accuracy.

Factor Procurement: The decision (procurement/no procurement) also significantly influences decision time and accuracy [V = .870, F2,17 = 56.883, p < .001, η2 = .870]. The time for a procurement decision is significantly lower than for a no-procurement decision (4.470 ± 2.06 s vs. 5.904 ± 1.73 s) [F1,18 = 22.059, p < .001, η2 = .551] and, surprisingly, procurement decisions are significantly less accurate (86.6 %) than no-procurement decisions (97.2 %) [F1,18 = 16.003, p < .001, η2 = .471] (see Fig. 2, right).

Interaction Procurement × Data Amount: As Fig. 3 illustrates, the procurement decision and the amount of data show a significant interaction [V = .635, F4,15 = 6.525, p = .003 < .05, η2 = .635]. The reaction times diverge depending on the procurement decision: with increasing data amount, procurement decisions are entered significantly faster than no-procurement decisions [F1,36 = 10.059, p < .001, η2 = .359]. However, the accuracy of procurement decisions decreases with increasing data amount, whereas the accuracy of no-procurement decisions seems unaffected [F2,36 = 2.866, p = .082 < .1, η2 = .137].

Fig. 3. Significant interaction between procurement and data amount on reaction time and accuracy.

4.2 Effect of a Decision Support System

The availability of a decision support system (DSS) has a significant influence on both speed and accuracy [V = .681, F4,15 = 8.006, p < .001, η2 = .681].

Specifically, the average median reaction times with the correct DSS were significantly lower (3.023 ± 1.405 s) than without DSS (4.715 ± 1.486 s). Even the erroneous system reduced the reaction times (4.101 ± 0.931 s) compared to the baseline.

The DSS also significantly influenced the attained accuracy. The correctly working DSS yielded higher accuracy scores (95.1 ± 9.3 %) than the baseline without DSS (92.6 ± 7.3 %). Yet, the defective system resulted in reduced accuracy (86.5 ± 9.3 %). Figure 4 and Table 1 illustrate the overall influence of a correctly working and a defective DSS on performance and accuracy.

Fig. 4. Influence of no, correct, and defective decision support systems on performance and accuracy.

Table 1. Mean median reaction times and accuracy for different decision support systems.

DSS type         Reaction time [s]   Accuracy [%]
No DSS           4.715 ± 1.486       92.6 ± 7.3
Correct DSS      3.023 ± 1.405       95.1 ± 9.3
Defective DSS    4.101 ± 0.931       86.5 ± 9.3

Interaction DSS × Data amount: There is a significant overall interaction between the DSS type and the length of the tables [V = .718, F2,17 = 3.495, p < .001, η2 = .718]. As Fig. 5 (upper row) illustrates, the correctly working DSS has a significant positive influence on the reaction times, especially for larger data tables. However, the reaction times for the defective DSS were only marginally lower than in the baseline experiment, even for the large data sets (left). On the other hand (right), the correctly working DSS had a positive influence on the overall accuracy of the decisions compared to the baseline experiment, again especially for larger tables. Strikingly, the accuracy for the defective DSS drops compared to the baseline experiment. This effect is observable for all table sizes, but it is particularly strong for large tables.

Fig. 5. Effect of no, correct, and defective decision support systems on performance and accuracy depending on data amount (upper row) and procurement decision (lower row).

Interaction DSS × Procurement: The study also reveals a significant overall interaction between the DSS and the decision on performance and accuracy [V = .597, F4,15 = 5.548, p < .001, η2 = .597]. As the lower row of Fig. 5 illustrates, the performance differences between procurement and no-procurement decisions are leveled out by a correctly working DSS, whereas the difference persists for the defective DSS (see Fig. 5, lower left). Likewise, the accuracies of the two decision alternatives converge for the correctly working DSS, whereas the absolute difference between them remains the same, but at a remarkably lower level, for the defective DSS (see Fig. 5, lower right).

5 Discussion

The presented experiment provides valuable insights for the design of Decision Support Systems (DSS) for inventory control, supply chain management, and probably a multitude of other managerial decision tasks.

Although the given task of comparing a set of numbers is exceptionally easy, the participants’ accuracy is affected by the amount of information (i.e., the length of the tables) and, surprisingly, also by the type of decision (procurement vs. no procurement). The finding that reaction times increase with increasing amounts of information is in line with previous research [15]. Yet, the current study found that the accuracy of the decision decreases when more data has to be considered, which was not found in the previous work.

The evaluation of the influence of the decision support systems on effectiveness and efficiency shows that providing decision support to workers in material disposition is possible and that performance and accuracy increase with correctly working systems. Hence, it is advisable for developers of software for production systems to carefully design such support systems, as they reduce the error rate, increase performance, and probably also increase the job satisfaction of the workers.

However, if the decision support systems work incorrectly, they seem to deflect operators from performing their tasks correctly. The participants of the experiment received feedback on the correctness of their actions immediately after each decision; hence, they must have noticed the defectiveness of the system. This is also confirmed by the achieved performance for the defective DSS, which is decreased compared to the correct DSS and on par with the baseline. Yet, the lower achieved accuracy (compared to the baseline) indicates that the participants followed the DSS’s suggestions despite knowing about its defectiveness.

The diabolical finding of this article is that these detrimental effects are concealed in the simpler tasks of the experiment and emerge clearly only for the longest tables with the highest information density. This underlines that during the design and development of any interactive system, the complexity of the later actual tasks should never be underestimated. Otherwise, the interactive system might work perfectly well in a clean development setup with simplistic dummy data, but fail horribly once people start using it for actual work with real data. Especially in the context of manufacturing companies, this might not only lead to a decrease in employee satisfaction, but also to competitive disadvantages through reduced effectiveness, as shown in the experiment above.

In conclusion, the article highlights the rising importance of including human factors and human-centered design in the development of managerial systems.

6 Limitations

Obviously, the prototypical task of this experiment is easy to automate, and implementing a perfectly working DSS for this task is trivial. Nevertheless, this study exemplifies that the performance and accuracy of human operators in managerial tasks are influenced by the amount of information and by possible support systems.

The current experiment focused on an easy number comparison task. We suspect that the positive influence of a correctly working DSS and the negative influence of a defective system are even stronger for more complicated tasks. Correspondingly, a previous publication suggests that task performance is affected by task complexity and interface usability: poor interface usability can be compensated for in simple tasks (i.e., similar performance), but poor usability in combination with complex tasks yields significantly lower performance [16]. Furthermore, the presented data was static; how will the findings shift if the underlying data is unclear or tainted with uncertainty? Consequently, a follow-up study must address the influence of different DSSs, complexity, and uncertainty on speed, accuracy, and the participants’ perceptions.

The presented findings are based on a sample of 20 participants. Yet, the low p-values and the strong effect sizes suggest that this within-subjects experiment identified various crucial aspects that must be considered when implementing decision support systems. Contrary to previous studies and contrary to our expectations, the considered user-diversity factors (i.e., age, the well-validated number comparison score [19], and trust in automation) did not significantly influence accuracy and performance. We suspect that this is caused by the rather small sample and its homogeneity (university students or graduates) and suggest a validation of this study with a larger and more diverse sample.

Within the limited space of this article, we did not delve into post-hoc tests for the factors with more than two levels, such as the amount of data.

7 Outlook

The present article showed that business simulation games and game-based learning environments are a suitable method for understanding human factors and individual decision-making abilities in simulated yet complex environments. We also showed that these environments can be used to study the benefits and pitfalls of decision support systems in their respective contexts.

The findings presented above are based on a small and rather young sample. The current shift in the demographic structure of many developed societies demands that future studies address the interrelationship between age and the ability to understand and control complex socio-technical production systems. Future studies must therefore investigate how age and the associated decline in sensorimotor ability and decelerated cognitive control influence efficiency and effectiveness in inventory control tasks [20]. Although age is often considered a barrier to interacting with technology, it can also be speculated that the vaster experience and increased serenity of older employees may yield less exaggerated decision behavior. This, in turn, may yield reduced variance across the whole supply chain (cf. Forrester’s bullwhip effect [21]).

The current findings are based on the simplified supply chain game. To ensure ecological validity, further studies will need to quantify the influence of different types of decision support systems in more complex and realistic decision-making tasks. A companion study on the aforementioned Quality Intelligence Game already identified that properly designed user interfaces in combination with a DSS can significantly increase the overall performance and satisfaction of players [22, 23]. Yet, the influence of defective systems in complex environments has not yet been investigated.