Introduction

Context

Thesis writing is a challenging task for students. No less challenging is the task of instructing and tutoring thesis writing for supervisors and institutions. Increasing student numbers and stagnating resources pose management problems as well as a threat to the quality of thesis supervision. At the School of Management and Law at the author’s higher education (HE) institution in Switzerland, for instance, nearly 1000 undergraduates annually face the task of writing an undergraduate thesis, and a similar number of first-year students are required to learn the necessary skills. Due to the high cost of Swiss labor, providing just one hour of supervision to 1000 students costs around US$ 100,000. An additional challenge is the reduced number of study years since the introduction of the Bologna Process in Europe, which demands that a first academic thesis be produced after three years of study, compared to four or five years previously, depending on university type. Academic writing centers are rare and writing programs are still the exception at Swiss universities (Kruse et al. 2016). Thus, by the time they choose a topic for an undergraduate thesis, students are still inexperienced writers with little understanding of the genres, procedures, or conventions of academic writing.

This situation led to the decision to address these deficits by developing an electronic instruction and support system to scale the instruction of thesis writing. The resulting product was Thesis Writer (TW), a cloud-based tutoring and support system that scaffolds student learning in a two-step approach, from the first idea to the research proposal and finally to the completed structure of a thesis.

Given that TW is still at an early stage of development with only small group pilot testing carried out so far, it was decided to run a large-scale test with first-year business administration students enrolled in a course called “Skills” (core competencies), which includes an introduction to academic writing and methodology. First-year students were selected for this study, rather than those writing their BA thesis, for several reasons: Firstly, at the business school in question, the majority of instruction in academic writing for BA students is provided in the first year. Secondly, the majority of functions currently used in TW support the development and finalization of a research proposal; there is still limited additional functionality or instructional support for moving from a research proposal towards a full thesis.

The course is taught each semester to around 300 students in five or six parallel classes and was consequently considered a good fit to assess the scalability of tools for learning and instruction. As part of their assignment, students have to submit a research proposal based on an extended version of the IMRD model (Introduction, Methods, Results, and Discussion) (Swales 1990). In order to reduce the number of proposals to be graded by the instructors, the assignment is designed for groups of four students.

The following two sections discuss relevant literature from the field of writing research, locate TW in a model of writing, and discuss instructional strategies aimed at developing student writing skills. This is followed by an examination of existing electronic solutions aimed at supporting the instructional strategies. Subsequently, it will be argued that technology acceptance models (TAM) provide a useful starting point for framing this research on the use, usability, and usefulness of TW, which might serve as a basis for assessing TW’s potential for scaling writing instruction.

Academic Writing, Its Instruction, and the Current State of Electronic Support

Writing research (e.g., Jakobs and Perrin 2014; MacArthur et al. 2006; Torrance et al. 2012) provides analytic and empirical tools for studying writing processes, collaborative writing, writing practices, and genre usage. From initially dominant process approaches focusing on writing strategies and cognitive aspects (Bereiter 1980; Bereiter and Scardamalia 1987; Flower and Hayes 1980, 1981; Hayes and Flower 1980), research has opened up to socio-cultural theories (e.g., Kent 1999), disciplinary approaches such as the WID (Writing in the Disciplines) movement (Russell 2002), and, more recently, literacy approaches (Lillis et al. 2015), which include not only reading but also media-related aspects of writing in the digital age such as “new literacy” (Coiro et al. 2008) and the use of multiple languages such as in “multi-literacies” as suggested by the New London Group (1996). For this research, the process approach and the WID approach are the most relevant; both are discussed below.

Process approach: Research in this tradition has yielded numerous insights into the connections between writing and thinking, as well as into the sequential organization of the writing process. Emig (1971) demonstrated that text develops in several steps of revision, which are organized recursively, not linearly. Writers often return to text written earlier and enhance the content, thus slowly developing the overall text. What students want to express with their text is not fixed beforehand but developed during the writing process. This has been picked up by cognitive writing research, most clearly by Hayes and Flower (1980) and Hayes (2012), who isolated several interrelated “processes” which seemed responsible for the recursive method of text production; their research was also reviewed by Ruhmann and Kruse (2014). Organizing the writing and research process and helping students understand how to navigate through it is a main intention of TW.

Social and disciplinary context: WID (Writing in the Disciplines) approaches originated in the US. Numerous studies discuss the influence of disciplinary contexts on academic writing and the close interconnections between writing practices and disciplinary teaching (Russell 2002). WID approaches demonstrate the impact of disciplinary affiliation on how writing is valued, used, and taught (Langer and Applebee 2007; Poe et al. 2010; Thaiss and Zawacki 2006; Walvoord and McCarthy 1990). All these studies demonstrate that writing is grounded in the epistemic assumptions of the respective disciplines and is used to introduce students to various forms of disciplinary thinking and argument; more recent European adaptations come from Deane and O’Neill (2011) and Doleschal et al. (2013). WID approaches view writing in close connection to disciplinary epistemologies and critical thinking (Bean 2011; Kruse 2010, 2011, 2013). Against this background, thesis writing may therefore also be seen as training in disciplinary thinking.

Graham (2018) developed a community model of writing that synthesizes the two strands discussed (writing process, WID) and strives to overcome shortcomings of previous models of writing, which tended to concentrate on one or the other of these aspects. TW offers functionality that addresses both aspects. Graham’s model, which conceptualizes multiple facets of academic writing, is therefore a promising frame for locating TW within the two-strand approach discussed above.

Writing instruction: In an extensive meta-analysis of the effects of instructional practices on the quality of adolescent student writing, Graham and Perin (2007) listed five classes of instruction: (1) “Process writing approach”; (2) “Explicit teaching of skills, processes, or knowledge (grammar, sentence combining, strategy instruction, summarization, text structure)”; (3) “Scaffolding of students’ writing (prewriting, inquiry, procedural facilitation, peer assistance, study of models, product goals, feedback)”; (4) “Using word processors for producing texts”; and (5) “Extra writing (additional practice)” (p. 449). Kellogg and Raulerson (2007) identified similar strategies for developing academic writing skills, namely practice and feedback. Based on their findings, Graham and Perin (2007) provided ten recommendations, ordered according to the average weighted effect size, with the following deemed most relevant to TW: (1) “Teach adolescents strategies for planning, revising, and editing their compositions (strategy instruction)”; (3) “Develop instructional arrangements in which adolescents work together to plan, draft, revise, and edit their compositions. Such collaborative activities have strong impact on the quality of what students write”; (4) “Set clear and specific goals for what adolescents are to accomplish with their writing product”; and (5) “Make students use word processors” (2007, p. 449).

Electronic solutions for writing instruction: In an extensive, up-to-date review of computer-based writing instruction, Allen et al. (2015) concluded that effective writing instruction requires significant (time) resources, which are often unavailable. They reported that the majority of research consequently focuses on automated essay scoring (AES). Such systems use natural language processing, artificial intelligence, and latent semantic analysis for analyzing essays (for discussion, see Shermis and Burstein 2013).

Automated writing evaluation (AWE) systems extend the functionality of AES systems by providing formative as well as summative feedback on student papers; in other words, they go beyond the simple scoring of essays (Allen et al. 2015). A common feature of AWE tools is the provision of an editor for students’ text writing and feedback, which aims to increase opportunities for text revision (Stevenson and Phakiti 2014). Systems vary in the type of feedback provided, ranging from feedback on “global writing skills and language use” to feedback on language use and information on content knowledge. In some tools, “model essays, scoring rubrics, graphic organizers, and dictionaries and thesauri” are incorporated (Stevenson and Phakiti 2014, p. 52). Findings on the effectiveness of such systems, as analyzed in a critical review by Stevenson and Phakiti (2014), are, however, not yet encouraging, with evidence of supportiveness only considered to be “modest”. Their study also criticizes a lack of clarity about what the “more general improvements in writing proficiency” (p. 62) might be. Allen et al. (2015), in contrast, came to a slightly more optimistic assessment, stressing, among other things, the advantage of AWE systems giving timely feedback, including on text fragments. However, they point out that, so far, “little research has been conducted to examine the usability and most effective forms of automated feedback” (p. 321).

A third class of tools is intelligent tutoring systems (ITS). Steenbergen-Hu and Cooper (2014) described them as “computer-assisted learning environments. They are highly adaptive, interactive, and learner-paced.… ITS are adaptive in that they adjust and respond to learners with tasks or steps suited to the learners’ individual characteristics, needs, or pace of learning” (p. 331). In their meta-analysis, Steenbergen-Hu and Cooper (2014) assessed the effectiveness of various ITS on students’ academic learning. According to the authors, while there are various ITS in the area of mathematically grounded subjects, few relate to reading and writing. Writing Pal (W-Pal; for an overview and assessment, see Roscoe et al. 2014) is one of the few ITS supporting writing instruction. While its usability has been demonstrated (Roscoe et al. 2014), W-Pal appears to be designed to develop the essay-writing skills needed in secondary rather than higher education.

An approach not reviewed by Allen et al. (2015) is the use of trigger questions for critical reviews, as in G-Asks (Liu et al. 2012). Providing a comprehensive literature review is a daunting task faced by all students (Liu et al. 2014), a fact also underlined by a recent comprehensive introduction to literature review writing by Onwuegbuzie and Frels (2016). Trigger questions can assist students in improving the quality of their literature reviews (Liu et al. 2014). Due to the limitations of posing generic questions, research focus has shifted towards automated question generation (AQG). There are two technical approaches to AQG: (1) key phrases are used for generating questions; and (2) citations are used for generating the questions (Liu et al. 2014). However, as Liu et al. (2014) pointed out for the second approach, while yielding better results than generic questions in some cases, it is still affected by various (technical) problems. For example, the quality of the questions generated depends in part on the quality of the text provided. Still, it is a promising approach, despite being tailored to a single, albeit specifically challenging, part of writing instruction.

In relation to the current study, it is important to stress that AES systems, and the research relating to them, focus primarily on the genre (Swales 1990) of the essay. TW aims to support instruction and practice of the genre of the research report following the IMRD model, which is the classical structure used for reporting research results in articles or theses (Swales 2004). While genre-based approaches are predominant in academic writing instruction (for an extensive discussion focusing on the European context, see Kruse (2013)), they are not beyond criticism (Bhatia 2016; Freadman 2016).

In short, TW has a different pedagogical aim and is tailored to a different genre in comparison to the aforementioned AES systems.

AWE systems are primarily tailored to providing (formative) essay feedback, and their effectiveness has yet to be proven. Moreover, AWE systems assume that the text requiring feedback has already been written. One major aim of TW is to introduce students to the IMRD structure and its conventions, and considerable instruction is required before writing can actually start. TW is therefore deemed unique in that it aims to support not essay writing instruction but HE research proposals and reports, offering different functionality from AWE tools, which primarily focus on providing feedback on preexisting text. On the pedagogical side, TW aims at strategy instruction, scaffolding, and linguistic support, whereas AES and AWE systems rely primarily on feedback.

Technology Adoption Models

Considering that implementing and maintaining information systems is costly and that acceptance, usage, and the resultant benefits often lag behind expectations (Yi and Hwang 2003), technology acceptance has become a significant field of research. A range of theories for technology adoption and use (Oliveira and Martins 2011) have been developed: the technology acceptance model (TAM) (Davis 1989), the most prominent; the Theory of Planned Behavior (TPB) (Ajzen 1985); Diffusion of Innovation (DOI) (Rogers 1995); and Technology, Organization, and Environment (TOE) (Tornatzky and Fleischer 1990). Venkatesh et al. (2003) consolidated the field by discussing eight existing theories of user acceptance before suggesting the Unified Theory of User Acceptance and Use of Technology (UTAUT) as a synthesis. Williams et al. (2015) reviewed 174 research papers using UTAUT that were published between 2004 and 2011, and provided an extensive overview of the various determinants that user acceptance models utilize. Within the technology adoption literature, adoption has been examined at both the individual and the organizational level. Furthermore, the nature of how technology is used is also a key variable for investigation (Venkatesh et al. 2003, p. 427). By “understand[ing] usage as a dependent variable”, the authors added a third focus of research. Learning Management System (LMS) usage in higher education has generated similar questions concerning implementation, acceptance, usage, and results (Yi and Hwang 2003). Studies have focused on the adoption and use of e-learning systems by students (Park 2009; Yu and Yu 2010; Tarhini et al. 2013) and faculty (Fathema et al. 2015). Yi and Hwang (2003) worked on adapting technology acceptance models for the HE context, including self-efficacy, enjoyment, and learning goal orientation as mediators for predicting the use of web-based information systems.
McGill and Klobas (2009) argued that a focus on LMS adoption alone is deficient, suggesting that “research on the factors that influence the impacts on…student learning is needed” (p. 496). Drawing on Goodhue and Thompson’s (1995) technology-to-performance chain (TPC), McGill and Klobas utilized the resulting concept of Task-Technology Fit (TTF) for assessing LMS success (McGill and Klobas 2009). Goodhue and Thompson (1995) defined TTF as “the degree to which a technology assists an individual in performing his or her portfolio of tasks” (p. 216).

Study Aim and Research Questions

Given the early stage of development, the focus of this study is on the use, usability, and usefulness of TW from a student’s perspective within the context of the thesis proposal writing task. The practical goal is to gain first insights into TW’s potential to support the scaling of academic writing instruction at the researchers’ institution. The study focuses on German-language users; German is the native language (L1) of almost all of the study’s participants. Student outcomes and the impact of TW’s use on student performance are beyond the scope of the current study given that, by faculty decision, the use of TW is optional and excluding its use is not allowed. Instructional and assessment conditions have to be the same for all students, making a control group study design infeasible.

The paper uses a technology adoption model to frame the study. Although TW is a unique system for the support of academic writing instruction, it has characteristics of both an information system and a learning system. As a result, two streams of theories guided the formulation of the current study’s research questions. Following Venkatesh et al. (2003), the usage of TW was studied as a dependent variable. As a prerequisite of task-technology fit, it is important to assess if, and to what extent, the task poses problems to the user. This led to (R1): “If and to what extent did the different parts of the proposal pose problems to the students?” Usage and individual acceptance (as a dependent variable) are addressed in (R2): “Do students use Thesis Writer, and if so, which parts? Where they do not use TW, what are the reasons?” Task-technology fit is examined in more detail by (R3): “Four aids are provided in the proposal editor (tutorial, phrase bank, examples, and linguistic support). To what extent are the aids provided by the proposal editor judged supportive by users?” Overall user satisfaction and usability were assessed by (R4): “How do the students perceive the usability of Thesis Writer?”

Thesis Writer: Description, Functions, and Technical Implementation

TW guides students through the whole thesis writing process, allowing for interaction with instructors and peers. TW’s dual-language interface (German, English) provides support for native and non-native German speakers working in either German or English on their thesis proposal task.

TW offers different functions relevant for scaling learning and instruction in the area of thesis writing to its three main user groups, namely students, their instructors, and the institution in charge of setting the regulatory framework for thesis writing and its assessment. To students, TW offers a cloud-based (Software-as-a-Service) writing environment, similar to services such as Google Docs, enriched by a wide range of instructional tools, content and linguistic support. TW guides students through the process of thesis writing and allows for collaborative working on texts (e.g., with fellow students or the supervisor) and therefore also supports group assignments. Instructors are assisted in routine tasks by TW’s instructional functions (e.g., tutorials on the different sections of a proposal and support on how to formulate them) and can provide direct student feedback through TW. Furthermore, TW has a portal function (integrated content management system) that allows for one-to-many communication, acting as a single point of information (e.g., for disseminating institutional regulations). Finally, a forum has been integrated allowing the building of a community of practice (Wenger 1999) among students, with the potential to decrease feedback needed by instructors. This combination of different functions distinguishes TW from other systems supporting academic writing and its instruction (AES/AWE as discussed above), making it unique.

TW supports students with: (1) orientation, planning, and focusing; (2) proposal writing; (3) text production; and (4) collaboration and coordination between student and institution (tutors, instructors, and study program directors). Each of these aspects is described in further detail below. The respective underlying pedagogical strategies for effective writing instruction, as summarized by Graham and Perin (2007), are referred to in describing the functions provided by TW.

(1) TW aids thesis writing through varied tools and tutorials, guiding students through the writing process from initial idea to completed thesis (see Fig. 1). The flexible structure provided, based on the common IMRD model (Swales 1990), fits most academic and scholarly papers and is usable across almost all disciplines. Following Kruse (2016), the IMRD model was further differentiated, resulting in the research cycle structure (see Fig. 2), which forms the basic units for instruction (both in-class and for TW) and the assignment for the example course. A basic project management functionality has been implemented, allowing students to monitor their progress in each section of the research cycle. Here, the underlying strategies for effective writing instruction (Graham and Perin 2007) in TW are: explicit teaching of skills, processes, and knowledge (in particular, strategy instruction and text structure), scaffolding writing (in particular, procedural facilitation), and alternative modes of composing text (in particular, using an online word processor).

Fig. 1 Thesis writing process

Fig. 2 Research cycle

(2) A proposal wizard guides students through the proposal writing process step by step, following the research cycle. The wizard aims to help students create their first proposal in just 45–60 min, adhering to a predefined structure. At this stage, brief tutorials state expectations for each section (e.g., the research question). To avoid unnecessary user distraction, no additional support functions have been implemented for this stage. Here, the underlying strategies for effective writing instruction (Graham and Perin 2007) are: a process writing approach, scaffolding writing (in particular, product goals and procedural facilitation), and alternative modes of composing text (in particular, using an online word processor).

(3) After completing the draft proposal, users can elaborate on it using the proposal editor (see Fig. 3), into which the text generated in the previous step is automatically imported. Compared to the proposal wizard, much more instructional and linguistic support is offered at this point, accessible via four buttons that are specific to each step of the research cycle: (a) comprehensive tutorials, now complemented by new linguistic tools: (b) a phrase bank from which the user can choose from around ten typical phrases derived from a corpus to start off the respective section (e.g., “This study contributes to research in…” for the “Topic” section); (c) examples (e.g., an example of how a gap in the state of the art is argued); and (d) linguistic support from a large integrated open-source, discipline-specific corpus analysis tool. So far, the corpus search only displays collocations: when users are unsure how a specific word (e.g., “shareholder”) is used in their discipline, they can highlight it, click the linguistic support button, and review collocations from the discipline-specific business analysis corpus. The potential of linguistic corpora for supporting academic writing was discussed by Chitez et al. (2015a, b). Users are free to decide in which order to work on each section of their draft proposal, and which of the four support functions provided in each section to use and in what order. Here, the underlying strategies for effective writing instruction (Graham and Perin 2007) are: a process writing approach, explicit teaching of skills, processes, and knowledge (in particular, text structure), scaffolding writing (in particular, procedural facilitation, study of models, and product goals), and alternative modes of composing text (in particular, using an online word processor).

Fig. 3 Proposal-editor in TW

(4) As the example in this study (a student group assignment) shows, TW allows for real-time collaborative writing of proposals or texts. Users can invite fellow students or supervisors to their personal writing space for collaboration or to elicit comments. Here, the underlying strategies for effective writing instruction (Graham and Perin 2007) are: scaffolding writing (in particular, peer assistance when writing, and feedback). Building communities of practice (Wenger 1999) is another potential option. Furthermore, TW allows for one-(institution)-to-many (students) communication.
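The collocation lookup described in point (3) above is delegated to a real corpus analysis tool in TW. Purely as an illustration of the underlying idea, the following self-contained Python sketch (the function name `collocations` and the two-sentence toy corpus are hypothetical, not TW's actual API or data) counts the words that co-occur within a small window around a highlighted term:

```python
from collections import Counter

def collocations(corpus_sentences, target, window=2, top_n=5):
    """Count words co-occurring with `target` within +/- `window` tokens.

    A toy stand-in for a real corpus query; no lemmatization,
    stop-word filtering, or statistical association scoring.
    """
    counts = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts.most_common(top_n)

# Hypothetical two-sentence "business corpus":
toy_corpus = [
    "shareholder value remains a contested management concept",
    "maximizing shareholder value drives executive compensation",
]
top_collocates = collocations(toy_corpus, "shareholder", window=2, top_n=3)
# "value" co-occurs with "shareholder" in both sentences
```

A production system would instead rank collocates by an association measure (e.g., log-likelihood or mutual information) computed over a large discipline-specific corpus, rather than by raw window counts.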

Rapp et al. (2015) discussed the technology of TW in detail. In summary (see Fig. 4): TW runs on a LAMP server (Linux, Apache, MySQL, PHP). The PHP-based framework yii follows strict design patterns for object-oriented programming, utilizing the model-view-controller principle. On the client side, jQuery, Ajax, and Twitter Bootstrap facilitate a fast, interactive user experience. The CMS-like backend allows non-technical staff to change the data displayed in the front-end.

Fig. 4 Overview of technology used for Thesis Writer

The writing module is based on CKEditor, a popular open-source text editor used, for example, in WordPress. TW is not intended as a substitute for a word processor (e.g., MS Word), but users can export their manuscript to word processing software for final editing and formatting. CKEditor was selected because of its flexible plugin concept. The linguistic support function employs the open-source IMS Corpus Workbench tool via a user interface in TW.

Methods

A mixed method explanatory sequential design (Creswell and Plano Clark 2011; Plano Clark and Ivankova 2016) was employed in order to answer the four research questions (see Fig. 5). In the first stage, primarily quantitative and some qualitative data (QUAN+qual) were collected. While the quantitative data showed, for example, how many students used the system and which functions they deemed valuable, such data cannot answer the “why” questions. Consequently, the quantitative results formed the basis of the subsequent second stage of qualitative data (QUAL) collection.

Fig. 5 Flowchart of methods

Setting and Participants

TW was applied in a compulsory 14-week “Skills” (key competencies) course during the fall semester of 2015. Two hundred ninety-six students participated in the course, which was taught in classes of around sixty students each. Academic working/writing was taught over five consecutive weeks, each with four sessions lasting 45 min. First-year business administration students were introduced to basic research-based learning and academic writing, including referencing, literature searches, and critical thinking. For a pass/fail three-part assignment, groups of four students from each class were given a scientific business administration article and instructed to: (1) summarize the article; (2) perform a literature search in given databases or library catalogues on the assigned article’s topic, documenting the results; and (3) write a research proposal on a topic related to the article.

TW was introduced by the instructors of the parallel classes of the Skills course, and students were encouraged to use TW. Students were told that research papers generally follow the IMRD structure (Swales 1990) and that TW was built around an extended version of it (see “Research Cycle” in Fig. 2) (Kruse 2016). The proposal structure to be submitted by students for the assignment had to match the structure of the research cycle. Students were encouraged to use the proposal wizard to draft their basic idea proposal, and then finalize it using the proposal editor.

Data Collection and Sampling

For the first data collection phase, a questionnaire was used, consisting mainly of closed questions (n = 8) and a few open-ended questions (n = 3). In addition to data related to the research questions, demographic data were also collected. Closed questions consisted of binary (yes/no) answers, plus Likert-type scales ranging from 1 to 4 (1 = strongly disagree, 2 = disagree, 3 = agree, 4 = strongly agree) and from 1 to 9 (1 = no difficulties, 3 = rather easy, 5 = okay, 7 = rather difficult, 9 = many difficulties; the in-between values of 2, 4, 6, and 8 had no wording assigned). A questionnaire pretest was conducted with a convenience sample of eight students, leading to minor changes to the initial questionnaire. Reliability of the questionnaire was supported by the pretest, with a common understanding of the scales among study participants fostered by using not only scale numbers but also their verbalized forms. The survey was sent to the whole population (all students enrolled in the course) using LimeSurvey running on a university server. Several reminders were issued. A total of 102 participants (61 male, 41 female) completed the survey. The participant mean (M) age was 21.52 years (Mdn 21 years, 1st quartile 20 years, 3rd quartile 22 years, min 18 years, max 29 years).

As the assignment was completed in groups of four students (in order to reduce the amount of grading and feedback work), the researchers opted to obtain further data (QUAL) in a second stage by conducting focus groups (Barbour 2007; Krueger and Casey 2015). The focus groups facilitated in-depth discussion of the use, usability, and usefulness of TW, guided by the results of the quantitative data collection. Additionally, the use of TW in collaborative assignments was also discussed. Five open-ended questions guided the focus group discussions: (1) Did you use TW? If you didn’t, why not?; (2) What was the biggest challenge when producing the proposal?; (3) What did you benefit from most in TW?; (4) What was missing the most from TW?; and (5) What bothered you the most about TW? The focus groups were conducted after the groups had received oral feedback on their assignment. This ensured that questions were asked in the proper context, with students able to consider their graded research proposal. In addition, the researchers wanted to avoid the potential bias of seeking opinions before grades had been issued, which might have resulted in students giving an unduly complimentary assessment of TW.

The researchers decided that ten focus groups would provide adequate data for saturation to occur. For two reasons, only the groups of one of the parallel classes were interviewed: firstly, this class already consisted of 15 groups, which was more than planned; and secondly, the classes were taught at vastly different times, with schedules varying considerably, so choosing a single class for the focus groups minimized coordination costs. The class consisted of 62 participants, with 15 assignment groups of four participants and one of two. Groups were offered feedback on their graded assignments. Feedback took place within the weekly classes over a period of three weeks (i.e., 3–4 groups were given feedback at the end of each week’s class). Questions were asked after the feedback was given. Feedback took roughly 10–20 min per group, and the discussion of TW around 10–15 min. Of the 15 groups, 12 asked for feedback, while the other groups either did not want feedback or were not present (there was no obligation to attend classes). In one case, a group participant was not present while feedback was given to the group; this participant obtained the feedback individually the following week and was asked the questions separately. Therefore, a total of 12 groups was interviewed, plus one individual student, resulting in 13 data sets. For each interview, the guiding questions were printed, and notes were taken manually by the interviewer only when the data related to the research questions. In some cases, in-vivo coding (Saldaña 2013) was applied to literal quotes for later use. No data were collected directly through TW.

Data Analysis

Data analysis used MS Excel (2011) and R (Version 3.3.0), together with RStudio (Version 0.99.902). Results are reported in terms of sample mean (M), median (Mdn), standard deviation (SD), and size where numeric values are available, and as absolute numbers where only categorical data were available. Statistical significance was assumed if p < 0.05. Since the data were not normally distributed (scales, limits), significance testing was performed using a Kruskal-Wallis test (as implemented in R) where several groups had to be compared. Pairwise (post-hoc) comparison between groups was done using Wilcoxon tests with error inflation correction following Holm (as implemented in R). Results of pairwise comparisons were encoded in grouping letters, such that two groups that do not share a letter are significantly different (inflation-corrected alpha = 5%). Table 2 illustrates this procedure: Answers to Q1 (letter c) differ from answers to all other questions. Answers to Q2 (letter a) do not differ from answers to Q3–4, since these share a letter with Q2. Answers to Q4 (letters ab) do not differ from answers to Q2–3 and Q5–7 (which contain either letter a or b), but only from answers to Q1. For details of the statistical methods used, see Field et al. (2012).
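The testing procedure described above can be sketched in code. The study used R; the following is an equivalent sketch in Python using scipy, with invented rating data for three hypothetical questions (the actual study data are not reproduced here):

```python
# Sketch of the procedure described above: an omnibus Kruskal-Wallis test,
# then pairwise Wilcoxon rank-sum (Mann-Whitney U) tests with Holm's
# step-down correction for error inflation. The study itself used R
# (kruskal.test, pairwise.wilcox.test); the rating data below are invented.
import itertools
from scipy import stats

# Hypothetical 9-point difficulty ratings for three survey questions
ratings = {
    "Q1": [2, 3, 4, 3, 2, 4, 3, 5, 2, 3],
    "Q2": [5, 6, 5, 7, 6, 5, 6, 4, 7, 6],
    "Q3": [5, 5, 6, 6, 7, 5, 6, 6, 5, 7],
}

# Omnibus test: do any two questions differ in their rating distributions?
h_stat, p_omnibus = stats.kruskal(*ratings.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_omnibus:.4f}")

# Post-hoc pairwise comparisons with Holm correction applied manually:
# sort raw p-values ascending, multiply the i-th smallest by (m - i),
# and enforce monotonicity of the adjusted values.
pairs = list(itertools.combinations(ratings, 2))
raw = [(a, b, stats.mannwhitneyu(ratings[a], ratings[b],
                                 alternative="two-sided").pvalue)
       for a, b in pairs]
raw.sort(key=lambda t: t[2])
m = len(raw)
holm = []
running_max = 0.0
for i, (a, b, p) in enumerate(raw):
    p_adj = min(1.0, (m - i) * p)
    running_max = max(running_max, p_adj)
    holm.append((a, b, running_max))
for a, b, p_adj in holm:
    verdict = "different" if p_adj < 0.05 else "not different"
    print(f"{a} vs {b}: Holm-adjusted p = {p_adj:.4f} -> {verdict}")
```

Pairs whose Holm-adjusted p-value exceeds 5% would share a grouping letter in the tables, while pairs below 5% would not.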

Qualitative data analysis followed the principal procedures of Creswell (2013), Miles et al. (2014), and Flick (2014). Data analysis commenced immediately after data collection. In a first cycle, data were coded manually on paper. The aim of this analysis was not to reveal structures of deeper meaning for generating theories, as is the intention of grounded theory, but to support the evaluation of TW. Therefore, primarily structural, descriptive, and in-vivo coding (Saldaña 2013) was utilized. Results were then transcribed to MS Word, followed by a second coding cycle (Saldaña 2013), which utilized pattern and focused coding to condense the data reported per guiding question into more abstract categories (highlighted in italics in the Results section). The two cycles of coding were conducted across all data sets, section by section, resulting in codes and emerging themes. After all data per section had been analyzed, emerging themes were compared and categorized according to frequency. Finally, data were analyzed across all sections for emerging overall themes. During the data collection, it was found that the group nature of the assignment posed organizational and technical problems. In some cases, conflicts also arose within the groups of students (e.g., because of free-riding or not meeting deadlines). These more emotional aspects also became evident in some focus groups. It was decided not to analyze or report these data within the current study, but to concentrate on technical aspects related to teamwork and TW (e.g., to what extent the system supported distributed work). Although noteworthy, a separate small study of the group dynamics developing within the assignment groups was deemed out of scope for the current study.

Results

Results of the data obtained via questionnaire and focus groups are provided for each research question.

R1: If and to what extent did the different parts of the proposal pose problems to the students?

Most of the 102 survey respondents rated the difficulty of the different parts of the proposal on a scale from 1-rather easy to 9-many difficulties.

The difference between the questions was significant overall (p < 0.001, Kruskal-Wallis test), with post-hoc pairwise comparisons localizing the source of the effect (column “grouping”) (Table 1).

Table 1 Results of research question R1

Corresponding focus group data were obtained via the guiding question, “What was the biggest challenge when producing the proposal?”:

Seven groups stated that finding a topic and/or research question was the most difficult task. One group added that the difficulty lay not only in finding one but in revising it until it was suitable.

Five groups reported problems with various aspects of dealing with literature: Finding literature in general and finding data on companies [comment: the students study business administration and therefore frequently seek company data] were each mentioned once. Integrating the literature found into the proposal was mentioned twice. Citing correctly was mentioned once.

Four groups stated problems related to the sections of the research cycle. The problem was to ensure proper matching, but also demarcation, between different research cycle sections, such as between topic and research question. However, this finding could in part be due to the character of the group assignment: it encourages division of research cycle sections among group members, which poses problems in ensuring consistency between the parts. In that respect, one group stated that TW helped because team members could see what the others wrote and take that into consideration. One group stated they held an extra meeting at which the relationships between parts of the research cycle were discussed and aligned.

Coordination of teamwork and finding a suitable time schedule were each mentioned three times. These data again reflect the nature of the team assignment and the fact that students had full workloads that rarely allowed meetings at university.

Formulating text was also mentioned three times as a major problem. This was qualified in that the groups had never written academic texts before and that this assignment differed greatly from their school/work experience.

Three further aspects were each mentioned by two groups: First, two groups mentioned problems getting started with the assignment or the writing. Second, two groups stated difficulties in understanding the assignment and uncertainty about when it was good enough or complete. Finally, two groups stated that the original scientific article assigned to them was difficult.

R2: Do students use Thesis Writer, and if so, which parts? Where they do not use TW, what are the reasons?

Sixty-six of the 102 participants indicated that they used TW, with 36 indicating they did not. Of the 66 participant users of TW, 38 read tutorials, 20 watched videos, 32 used the proposal wizard, and 44 used the proposal editor.

Participants who did not use TW were asked for their reasons (on a scale from 1-strongly disagree to 4-strongly agree).

The difference between the questions was significant overall (p < 0.001, Kruskal-Wallis test), with post-hoc pairwise comparisons localizing the source of the effect (column “grouping”) (Table 2).

Table 2 Results of research question R2

In the questionnaire, there were 11 responses to the open question, “Was there another reason why you did not use Thesis Writer? Please explain.” Responses clustered around six topics: (1) Five respondents stated that they did not see a necessity to use TW and/or that the assignment was rather small and therefore no tool was required. One group used “Switch-Drive” (a file-sharing service based on ownCloud) instead. Two respondents stated they would consider using TW in the future; (2) Two respondents stated that they used TW as a group and that one person was in charge of it, so the reporting person (not in charge) did not use TW themselves; (3) One respondent stated an unawareness of TW; (4) One respondent started using TW and then stopped, giving no reason for doing so; (5) One respondent deemed the system too laborious to get used to and therefore did not use it, although s/he stated s/he could imagine using it in the future; and (6) One respondent stated initial technical problems with TW that discouraged its use.

Corresponding focus group data were obtained via the guiding question, “Did you use TW? If not – why?” Eight groups stated they used TW. One group used it only partially, drawing on phrases from the phrase bank but using no other function of TW. Four groups did not use TW: Two stated being under severe time pressure (one group started their assignment late; the other stated being overloaded with work). Three other reasons were given for not using TW: (1) One group did not expect extra value/benefit from TW, but stated they had not tested it; (2) one group thought they could manage the assignment without TW; and (3) one group copied the assignment into MS Word and distributed the work, deciding not to use TW.

R3: Four aids are provided in the proposal editor (tutorial, phrase bank, examples, and linguistic support). To what extent are the aids provided in the proposal editor judged supportive by users?

Students were asked to rate how supportive each respective tool was (on a scale from 1-strongly disagree to 4-strongly agree). Sixty-six participants answered the question: (1) Tutorial: yes = 44, no = 22; (2) Phrase bank: yes = 53, no = 13; (3) Examples: yes = 53, no = 13; (4) Linguistic support: yes = 23, no = 41, n/a = 2.

The difference between the questions was significant overall (p = 0.044, Kruskal-Wallis test); however, post-hoc pairwise comparisons were not able to localize the source of the effect (column “grouping”) (Table 3).

Table 3 Results of research question R3


Corresponding focus group data were obtained via the guiding question, “From what did you benefit most from TW?”

The two aspects of TW most often mentioned (seven times each) as most beneficial were: (1) The structure provided by the research cycle, which the sections of TW mirror. Students reported the task to be so complex that this structure provided focus and clarity (e.g., they were able to concentrate on the research question and received support for it, without having to worry about other sections at that time); (2) TW supports collaborative writing. The latter has to be seen against the background of the assignment (teamwork) and the situation of the students, who, although often on campus, frequently cannot meet due to different time schedules. The aspect of teamwork support was further qualified by two groups, who appreciated that because of TW no e-mails with Word attachments needed to be sent around; in the past this had often led to confusion about which version was current and to problems merging different files.

The next most beneficial aspect of TW, mentioned five times, was the phrase bank. This was further qualified by groups stating that the phrases helped them start writing; two of these groups had mentioned getting started as the most severe problem. Furthermore, two groups mentioned that the phrase bank helped a lot, as they had to write “scientifically” for the first time. This also matches the most severe problems stated above.

R4: How do the students perceive the usability of Thesis Writer?

Participants were asked to assess various items related to the usability of TW (on a scale from 1-strongly disagree, to 4-strongly agree).

The difference between the questions was significant overall (p < 0.001, Kruskal-Wallis test), with post-hoc pairwise comparisons localizing the source of the effect (column “grouping”). Questions 4 and 5 were formulated in terms of non-usability, which may be the source of the significant effect (Table 4).

Table 4 Results of research question R4


Seven respondents answered the open question, “Do you have further comments concerning usability?” Their answers clustered around two topics:

  • Topic 1 - Technical aspects: Three comments addressed technical issues. Two addressed specific teamwork problems when collaborating in real time on the text (in the proposal editor, the text written by other group members was not always displayed immediately across all devices). The underlying technical issue is how often (at what time interval) data are sent to the server and, consequently, how much traffic is generated. One respondent reported problems in amending an existing project.

  • Topic 2 - Feature/content requests: One user asked for a feature to allow storage of references in the proposal wizard. Another user asked for a chat/messaging window to facilitate group collaboration. One respondent found the phrases in the bank too generic, requesting that the phrase bank be tailored in the different sections to different use cases (e.g., when the research topic relates to companies or consumers, etc.).
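The display-delay issue noted under Topic 1 reflects a generic trade-off in collaborative editors: the shorter the interval at which edits are sent to the server, the sooner other group members see them, but the more traffic is generated. The following minimal interval-based throttle is a hypothetical illustration of that trade-off, not TW’s actual implementation:

```python
# Hypothetical client-side sync throttle illustrating the trade-off noted
# above: a shorter minimum interval means faster propagation of a group
# member's text, but more requests to the server. Not based on TW's code.
class SyncThrottle:
    def __init__(self, min_interval_s: float):
        self.min_interval_s = min_interval_s
        self.last_sync = float("-inf")
        self.pending = None   # latest local edit not yet transmitted
        self.sent = []        # payloads actually transmitted to the server

    def edit(self, text: str, now: float) -> None:
        """Record a local edit; transmit only if the interval has elapsed."""
        self.pending = text
        if now - self.last_sync >= self.min_interval_s:
            self.flush(now)

    def flush(self, now: float) -> None:
        """Transmit the pending edit, if any."""
        if self.pending is not None:
            self.sent.append(self.pending)
            self.pending = None
            self.last_sync = now

# Ten keystrokes over one second: with no throttling every edit is sent,
# while a 0.5 s interval sends far fewer requests (but delays visibility).
fast = SyncThrottle(min_interval_s=0.0)
slow = SyncThrottle(min_interval_s=0.5)
for i in range(10):
    t = i * 0.1
    fast.edit(f"draft v{i}", t)
    slow.edit(f"draft v{i}", t)
```

In a real editor the pending edit would also be flushed by a timer, so the last keystroke is never lost; the sketch only shows why a longer interval reduces traffic at the cost of display delay.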

Corresponding focus group data were obtained via the guiding question, “What bothered you the most about TW?” Three groups mentioned that in some cases entries made in TW were not stored and were subsequently lost. Three circumstances were mentioned regarding loss of data: (1) use of the new MS Edge browser; (2) the laptop was closed and reopened; and (3) the network connection was lost. Two groups explicitly stated that nothing bothered them regarding TW. One group stated a problem with the export function: Text formatted as a numbered list in TW was exported to MS Word as plain text (without numbering).

Further data were obtained via the guiding question, “What was missing the most from TW?” What people missed most in TW revolved around two major topics: functions supporting teamwork, and formal aspects. One group mentioned they would like to see when another person is typing; this has since been implemented as a feature in TW. Another group said they sometimes experienced a short delay between text being typed and its on-screen appearance for other group members, which could explain the first aspect mentioned. One group stated they would like pop-ups showing when other people in the group are working in TW. The following was mentioned in relation to formal aspects and editing: Two groups wanted a spell-checker; two groups requested formatting functions similar to MS Word; one group asked for a literature list template; and two groups stated explicitly that nothing was missing from TW.

Discussion

The following presents discussion of both quantitative and qualitative results for each research question.

R1: If and to what extent did the different parts of the proposal pose problems to the students?

The overall mean ratings (M) regarding R1 range from 4 to 5.11, indicating that students judged themselves able to handle the difficulties. Significant differences arose only between “finding a topic” and both “developing a formulation of the research question” and “showing the knowledge gap”; the latter two were judged more difficult than the first. Interestingly, the qualitative data reveal more serious problems than the quantitative: Finding a topic or research question was identified as the most difficult task, while dealing with various aspects of literature was reported as another major difficulty. Understanding the structure of the research cycle (i.e., an extension of IMRD) and mastering each of its components was reported as the third major difficulty. These findings are striking, considering that most existing systems for supporting writing instruction focus on automated grading and/or feedback and consequently do not support the areas mentioned. This is discouraging, as the explicit teaching of text structure and the scaffolding of students’ writing are deemed particularly beneficial to improving writing skills (Graham and Perin 2007).

Two further aspects were revealed in the qualitative analysis regarding R1: Firstly, coordination of teamwork among the assignment groups was reported as difficult. Teamwork, and in this instance team assignments, seems desirable not only for scaling reasons (in this case, 75% fewer proposals would require grading), but can also be pedagogically valuable (Graham and Perin 2007). TW supports teamwork, as the study confirmed; however, improvements were also suggested. Secondly, writing in a scientific style was reported as challenging. This supports the strategy pursued with TW of also providing linguistic support via examples, a phrase bank, and real-time linguistic support via corpus queries.

R2: Do students use Thesis Writer, and if so, which parts? Where they do not use TW, what are the reasons?

Almost two-thirds of the participants used TW. Usage of the different parts, in descending order, was: proposal editor, tutorials, proposal wizard, and videos. With the proposal editor being the main place of work for a proposal, this result was expected. However, it was surprising that 22 participants stated they did not use it, preferring other parts of TW. Qualitative data obtained from the open question under R2 may help explain this finding; however, only two respondents addressed it, stating that as tasks were shared, not everyone was involved in writing within TW. This may also explain why only 32 of 66 users reported having used the proposal wizard.

Instruction in TW is mainly provided via text-based tutorials and, to a lesser extent, via two videos. This part of the analysis considered usage of the tutorials that provide a general overview of thesis writing. Tutorial usage was reported by 38 of the 66 respondents who used TW, indicating that the text-based instruction was well used and accepted. Surprisingly, considering the user group (mean [M] age 21.52), usage of the two videos provided was relatively low (20 of 66); however, these mainly explained the usage of TW rather than the research cycle itself. Users may simply have preferred to ‘try out’ the system rather than view a guided video tour. This interpretation is backed by the high values regarding usability of TW (R4.1-4), which confirm TW as a mostly self-explanatory system.

Regarding the non-users of TW, the most significant finding was that no attempt had been made to enter the system. Consequently, technical problems or an aversion to electronic tools had less significance for the non-usage of TW (i.e., non-users did not specifically object to TW as a system). An important result was that non-users did not see any use for TW and, to a lesser degree, did not prefer electronic tools in general. This is in line with technology acceptance findings that propose perceived usefulness and perceived ease of use as two major factors explaining acceptance or rejection of information technologies (Davis 1989).

Qualitative data obtained from the 11 responses to the open question and from the focus groups support these findings: The most frequently stated reason was that no necessity was seen and/or that the assignment was too small, so no tool was required. A related finding was that, for some, TW was deemed too laborious to get used to. Both reasons are in line with expectations from a technology acceptance model (TAM) perspective, as they relate directly to perceived usefulness and perceived ease of use.

While the study gained a good overview of TW usage/non-usage (quantitative data) and the respective reasons (qualitative data), there has so far been no exact tracking of TW user actions within the system. This functionality has now been implemented, logging each interaction with a time stamp for detailed analysis. Another open question for research is why some users start using TW but later drop out.

R3: Four aids are provided in the proposal editor (tutorial, phrase bank, examples, and linguistic support). To what extent are the aids provided in the proposal editor judged supportive by users?

Instruction provided via the tutorials was well used and rated as supportive or highly supportive. There were more users of these context-specific tutorials than of the general tutorials, suggesting that task-specific tutorials are more attractive. This interpretation is in line with Goodhue and Thompson (1995) concerning the positive impact of task-technology fit (TTF), defined as “the degree to which a technology assists an individual in performing his or her portfolio of tasks” (p. 216). Regarding instructional systems, McGill and Klobas (2009) confirmed the positive impact of task-technology fit on the acceptance of learning management systems. Yu and Yu (2010) further modeled the factors that affect individuals’ utilization of online learning systems, noting that “actual utilization depends not only on perceived learner-technology fitness, but also on how well the individual need tool functionality matches the needs of the task at hand” (p. 1014), which also seems applicable to the findings of this current study. These findings seem particularly salient given that Graham and Perin (2007) pointed to the positive effect of explicitly teaching text structures, suggesting that the TW tutorials positively affect writing instruction.

Of the three linguistic support functions provided, the phrase bank and examples were reported to have equal usage frequency and nearly equal results with regard to usefulness; these support functions seem to have been appreciated by users. Linguistic support from the integrated discipline-specific corpus was used less, but was still judged as supportive. The relatively low usage was expected, as the feature is not yet fully functional, providing only a concordance function that highlights collocations preceding and following a single word.

Qualitative data obtained regarding TW’s phrase bank were surprising: Users reported that the phrase bank helped them get started – one of the barriers to writing. One focus group reported that students did not know how to start writing; they chose a phrase within the research cycle and then simply completed the sentence with regard to their proposal idea. The surprise for the students was that, after the first sentence was written, the next one followed almost automatically, and so on. In terms of writing didactics, this can be classified as procedural facilitation, a means of scaffolding (Graham and Perin 2007). Clearly this needs further investigation, as it would present another advantage over systems concentrating solely on feedback. Less surprising was that students reported the phrase bank supported the use of correct phrases, as they were not used to the genre of research reports and, in particular, its conventions. Scientific style was also one of the biggest problems reported by students (R1). Again, this supports the claim that electronic systems for writing instruction should also provide explicit instruction (Graham and Perin 2007). Another surprise was that many focus group students reported the TW structure (i.e., the research cycle; Kruse 2016) to be very helpful, as it focused their attention given the many tasks and challenges of the assignment. This finding is in line with cognitive theories of writing (Flower et al. 1989), and it could therefore be interpreted that the structure and corresponding instruction reduce cognitive load.

The four support functions provided in the proposal editor were well used and seem beneficial to writers in fulfilling their respective tasks. Furthermore, the results of this study show that TW functions align with the findings for effective writing instruction, as summarized by Graham and Perin (2007), and that students value the respective functions of TW.

R4: How do the students perceive the usability of Thesis Writer?

Concerning usability, four major aspects were studied: (1) General usability of TW (questions R4.1-3); (2) more support requested of TW (questions R4.4-5); (3) teamwork on TW (as the assignment was for groups of four) (questions R4.6-7); and (4) overall satisfaction with TW (questions R4.8-10).

  1. Usability data indicate that TW is easy to understand, not confusing, and has a clear design. Taking into account that TW is relatively complex, this confirms the appropriateness of the chosen navigation strategy. This result is supported by the fact that most users did not request additional technical support and that non-users did not state technical reasons for their non-usage. Therefore, it seems justifiable to conclude that TW offers a good user experience with sufficient usability.

  2. Concerning the desire for more support in terms of content (i.e., more background on different aspects of proposal/thesis writing), users either disagreed or strongly disagreed. However, this result is ambiguous: R1 (problems posed by the assignment) revealed that the assignment task posed some problems to the students, and instructors reported problems and failures in the assignments the students submitted; one would therefore expect users to request further content support. One potential interpretation is that students were unable to self-assess their performance; a plausible idea, as for almost all participants this was their first experience with academic writing, and the data were collected before the results of their assignments were published. This issue could be studied in future research by means of interviews or by collecting data after assignment results are posted.

  3. TW supports teamwork to the extent that users can easily invite other users (via the LDAP university directory) to work collaboratively on documents in real time (i.e., joint text production), albeit with some limitations: Users cannot work on the same section at the same time in the proposal editor, and there may be a short delay until other users’ entries become visible (as reported by two users under R4). Users rated teamwork in TW as easy and considered it supportive. Currently, none of the more sophisticated features (e.g., Google Docs’ suggestion mode, commenting mode, or chat) are available in TW, and only two users asked for additional features in the open questions (chat and reference management). This can be interpreted in line with task-technology fit (TTF), meaning that users are satisfied if the functions provided sufficiently support the task at hand (here, teamwork in TW). The qualitative data obtained mostly support this interpretation; however, some smaller features were requested to further support teamwork. As teamwork not only allows for scaling instruction (fewer proposals to be graded) but can also support writing instruction (Graham and Perin 2007), it is planned to further develop teamwork support in TW.

  4. Results for the overall evaluation of TW are encouraging. Participants felt to a high degree that TW supported them in fulfilling the task (assignment) and agreed they would like to use TW for their undergraduate thesis. The question “Would you recommend Thesis Writer?” was used as a proxy for users’ overall satisfaction with TW. Considering that this study reports only the first large-scale test run of TW, and that some minor technical problems were encountered, a mean value of 3.28 (SD = 0.79) on a scale of 1 to 4 is certainly encouraging.

Conclusion

Thesis Writer differs from existing systems for supporting writing instruction in two ways: Firstly, it targets a different genre (research report; IMRD) and, secondly, more than most existing systems (AES, AWE) (Allen et al. 2015; Stevenson and Phakiti 2014), it focuses on scaffolding rather than feedback provision. In that respect, TW appears to be unique. The functions offered by TW are firmly rooted in proven effective strategies of writing instruction (Graham and Perin 2007). TW is still at an early stage of development, yet it has been used by around 1000 students. Without downplaying some minor problems experienced and the open questions raised in response to this study, one contribution of this study is that it gives first evidence that TW, developed for the IMRD genre and utilizing primarily different means of scaffolding academic writing instruction, is viable, functional, and beneficial. Therefore, potential has been identified for improving existing systems by incorporating functionality proven useful in the context of TW. The same holds true the other way around: With focus group students asking for spell-checkers and similar functionality, TW would clearly benefit from providing automated feedback, as AWE systems do. For the English-language version of TW, this should be relatively easy to integrate given that, unlike for the German language, many elaborated systems and approaches already exist.

A second contribution of this study relates to scaling academic writing instruction in group settings: TW was designed for scaling writing instruction, given both the high number of students at the author’s faculty and Switzerland’s high labor costs. Therefore, the practical goal of this study was to assess whether or not TW supports scaling academic writing instruction. The group assignments reduced the number of proposals to be graded by 75%. Graham and Perin (2007) showed that learning can actually be supported by group assignments rather than hindered, i.e., scaling does not have to lead to reduced learning outcomes. This study showed that TW effectively supports group work, although improvements are still possible. A second intention regarding scaling was to reduce instructors’ routine work by providing tutorials on the genre of research proposals. The current study revealed that the tutorials were well used and rated as beneficial, but it cannot yet be inferred that they reduced the number of questions received by instructors. This remains an area of potential future study.

Potential limitations of this current study are: (1) Only one cohort of students was studied, with a survey response rate of 34.5% (102 of 296), so the data may be biased by (unknown) confounders specific to this group. The study was a first attempt at gathering user data on TW; currently, more data are being collected with different groups; (2) although a pretest was applied and questions were designed to be specific, short, and unambiguous, there remain some threats to validity (e.g., interpretability of the scales, misunderstanding of questions, or answering in a slightly different context than assumed by the researchers); (3) as this was a pilot study, alternative data sources such as log files or screen recordings were not considered, although these could potentially have significantly improved knowledge of actual system usage; assignment results (submitted proposals) were also not used as a data source on the usage of TW; (4) it cannot be excluded that technical problems that occurred during the first test run distorted some of the reported data; (5) as the assignments were completed in groups of four, it is unknown which tasks each student performed, and therefore patterns of individual student usage remain undefined; and (6) the study was carried out with first-year students, as most academic writing instruction is provided to our BA students in the first year and, to date, not all functions supporting the writing of a BA thesis are implemented. Although it remains to be proven, it is assumed that many of the results of this study will also be applicable to non-first-year BA students.

Concerning the further development of TW, a tracking function has now been implemented in TW that records time-stamped user interactions. The aim is to build a “replay function” to visualize interaction on the screen, similar to a screen recording but circumventing the associated problems (Tang et al. 2006). This would allow simultaneous analysis of user interaction and the writing process: for example, when a user opens the phrase bank, the tracking function would show which text is subsequently typed, or the impact of a tutorial that has been read could be assessed based on the text entered afterwards. This would enable researchers to study writing processes and the interaction of users with TW at a much higher level of precision. Furthermore, as an initial test has shown, a great amount of data is generated across the many users, resulting in genuinely big data, which in itself opens up new opportunities in writing research.
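The tracking and replay functionality described above can be pictured as an append-only log of time-stamped interaction events that is later replayed in chronological order. The following is a minimal hypothetical sketch; the event names and record structure are invented for illustration and do not reflect TW’s actual code:

```python
# Minimal sketch of time-stamped interaction logging with a replay
# function, as described above. Event names and record structure are
# hypothetical; TW's actual tracking implementation may differ.
import json
import time
from dataclasses import dataclass, field


@dataclass
class InteractionLog:
    events: list = field(default_factory=list)

    def record(self, user: str, action: str, payload: str = "") -> None:
        """Append one time-stamped user interaction."""
        self.events.append({
            "ts": time.time(),
            "user": user,
            "action": action,
            "payload": payload,
        })

    def replay(self) -> list:
        """Return events in chronological order, e.g. to drive an
        on-screen visualization similar to a screen recording."""
        return sorted(self.events, key=lambda e: e["ts"])

    def dump(self) -> str:
        """Serialize the log, e.g. for later large-scale analysis."""
        return json.dumps(self.events)


# Hypothetical session: a student opens the phrase bank, inserts a
# phrase, and a second group member edits another section.
log = InteractionLog()
log.record("student_a", "open_phrase_bank")
log.record("student_a", "insert_phrase", "The aim of this study is to ...")
log.record("student_b", "edit_section", "research question")
history = [e["action"] for e in log.replay()]
```

Replaying such a log in order would, for instance, show which text follows the opening of the phrase bank, supporting the kind of impact analysis outlined above.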