DOI: 10.1145/3640543.3645169
research-article

Enhancing Peer Review with AI-Powered Suggestion Generation Assistance: Investigating the Design Dynamics

Published: 05 April 2024

ABSTRACT

While writing peer reviews is an important task in science, education, and large organizations, providing fruitful suggestions to peers is not straightforward, as different user interaction designs of text suggestion interfaces can have diverse effects on user behavior when writing the review text. Generative language models might be able to support humans in formulating reviews with textual suggestions. Previous systems use two designs for providing text suggestions, inline and list of suggestions, but do not empirically evaluate them. To investigate the effects of embedding NLP text generation models in the two designs, we collected user requirements to implement Hamta as an example of an assistant that provides reviewers with text suggestions. Our experiment comparing the two designs with 31 participants indicates that people using the inline interface wrote longer reviews on average, while participants using the list of suggestions experienced greater ease of use with our tool. The results shed light on important design findings for embedding text generation models in user-centered assistants.


      • Published in

        IUI '24: Proceedings of the 29th International Conference on Intelligent User Interfaces
        March 2024
        955 pages
        ISBN: 9798400705083
        DOI: 10.1145/3640543

        Copyright © 2024 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 746 of 2,811 submissions, 27%