A hybrid human–AI tool for scientometric analysis

Correia, António; Grover, Andrea; Jameel, Shoaib; Schneider, Daniel; Antunes, Pedro; Fonseca, Benjamim

doi:10.1007/s10462-023-10548-7

Published: 12 July 2023

Volume 56, pages 983–1010 (2023)

Artificial Intelligence Review

António Correia¹,², Andrea Grover², Shoaib Jameel³, Daniel Schneider⁴, Pedro Antunes⁵ & Benjamim Fonseca¹

Abstract

Solid research depends on systematic, verifiable and repeatable scientometric analysis. However, scientometric analysis is difficult in the current research landscape characterized by the increasing number of publications per year, intersections between research domains, and the diversity of stakeholders involved in research projects. To address this problem, we propose SciCrowd, a hybrid human–AI mixed-initiative system, which supports the collaboration between Artificial Intelligence services and crowdsourcing services. This work discusses the design and evaluation of SciCrowd. The evaluation is focused on attitudes, concerns and intentions towards use. This study contributes a nuanced understanding of the interplay between algorithmic and human tasks in the process of conducting scientometric analysis.

Notes

https://github.com/trrproject/SciCrowd.
The questionnaire is available at: https://forms.gle/hQaPLMo1PDZWqCU46.

Acknowledgements

This research was mainly performed during an internship of António Correia at Microsoft Research, Cambridge, UK. The work was supported in part by the Portuguese Foundation for Science and Technology (FCT) through national funding under the individual research grant SFRH/BD/136211/2018. The authors would like to thank Siân Lindley from Microsoft Research for playing an important role in understanding and modifying the human–AI scientometric workflow that supports the SciCrowd system, as well as Jorge Santos for helping to build the necessary infrastructure. Our thanks extend to Hugo Paredes for the helpful discussions and valuable insights in the early stages of this work.

Author information

Authors and Affiliations

INESC TEC and University of Trás-os-Montes e Alto Douro, UTAD, Quinta de Prados, Apartado 1013, Vila Real, Portugal
António Correia & Benjamim Fonseca
College of Information Science & Technology, University of Nebraska at Omaha, Omaha, NE, 68182, USA
António Correia & Andrea Grover
University of Southampton, Southampton, SO17 1BJ, UK
Shoaib Jameel
Tércio Pacitti Institute of Computer Applications and Research (NCE), Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Daniel Schneider
LASIGE and University of Lisbon, 1749-016, Lisbon, Portugal
Pedro Antunes

Contributions

Conceptualization, investigation, methodology, formal analysis & writing–original draft and revised version, A.C.; writing–review and editing, A.G., P.A., S.J., and D.S.; P.A. prepared Figs. 2, 3, and 5; supervision & validation, A.G. and B.F. All authors reviewed the manuscript.

Corresponding author

Correspondence to António Correia.

Ethics declarations

Conflict of interest

The authors of this manuscript have no conflicts of interest or competing interests to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Correia, A., Grover, A., Jameel, S. et al. A hybrid human–AI tool for scientometric analysis. Artif Intell Rev 56 (Suppl 1), 983–1010 (2023). https://doi.org/10.1007/s10462-023-10548-7

Accepted: 01 July 2023
Published: 12 July 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10462-023-10548-7
