Abstract
Solid research depends on systematic, verifiable, and repeatable scientometric analysis. However, such analysis is difficult in the current research landscape, which is characterized by a growing number of publications per year, intersections between research domains, and a diversity of stakeholders involved in research projects. To address this problem, we propose SciCrowd, a hybrid human–AI mixed-initiative system that supports collaboration between Artificial Intelligence services and crowdsourcing services. This work discusses the design and evaluation of SciCrowd; the evaluation focuses on attitudes, concerns, and intentions towards use. The study contributes a nuanced understanding of the interplay between algorithmic and human tasks in the process of conducting scientometric analysis.
Notes
The questionnaire is available at: https://forms.gle/hQaPLMo1PDZWqCU46.
Acknowledgements
This research was mainly performed during an internship of António Correia at Microsoft Research, Cambridge, UK. The work was supported in part by the Portuguese Foundation for Science and Technology (FCT), through national funding under the individual research Grant SFRH/BD/136211/2018. The authors would like to thank Siân Lindley from Microsoft Research for playing an important role in understanding and modifying the human–AI scientometric workflow that supports the SciCrowd system, as well as Jorge Santos for help in building the necessary infrastructure. Our thanks extend to Hugo Paredes for the helpful discussions and valuable insights in the early stages of this work.
Author information
Contributions
Conceptualization, investigation, methodology, formal analysis & writing–original draft and revised version, A.C.; writing–review and editing, A.G., P.A., S.J., and D.S.; P.A. prepared Figs. 2, 3, and 5; supervision & validation, A.G. and B.F. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors of this manuscript have no conflicts of interest or competing interests to declare.
About this article
Cite this article
Correia, A., Grover, A., Jameel, S. et al. A hybrid human–AI tool for scientometric analysis. Artif Intell Rev 56 (Suppl 1), 983–1010 (2023). https://doi.org/10.1007/s10462-023-10548-7
DOI: https://doi.org/10.1007/s10462-023-10548-7