A hybrid human–AI tool for scientometric analysis

Artificial Intelligence Review

Abstract

Solid research depends on systematic, verifiable, and repeatable scientometric analysis. However, such analysis is difficult in the current research landscape, which is characterized by a growing number of publications per year, intersections between research domains, and the diversity of stakeholders involved in research projects. To address this problem, we propose SciCrowd, a hybrid human–AI mixed-initiative system that supports collaboration between Artificial Intelligence services and crowdsourcing services. This work discusses the design and evaluation of SciCrowd, with the evaluation focused on attitudes, concerns, and intentions toward use. This study contributes a nuanced understanding of the interplay between algorithmic and human tasks in the process of conducting scientometric analysis.


Notes

  1. https://github.com/trrproject/SciCrowd.

  2. The questionnaire is available at: https://forms.gle/hQaPLMo1PDZWqCU46.



Acknowledgements

This research was mainly performed during an internship of António Correia at Microsoft Research, Cambridge, UK. The work was supported in part by the Portuguese Foundation for Science and Technology (FCT) through national funding under the individual research Grant SFRH/BD/136211/2018. The authors thank Siân Lindley from Microsoft Research for the important role played in understanding and modifying the human–AI scientometric workflow that supports the SciCrowd system, and Jorge Santos for help in building the necessary infrastructure. Our thanks also extend to Hugo Paredes for helpful discussions and valuable insights in the early stages of this work.

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization, investigation, methodology, formal analysis & writing–original draft and revised version, A.C.; writing–review and editing, A.G., P.A., S.J., and D.S.; P.A. prepared Figs. 2, 3, and 5; supervision & validation, A.G. and B.F. All authors reviewed the manuscript.

Corresponding author

Correspondence to António Correia.

Ethics declarations

Conflict of interest

The authors of this manuscript have no conflicts of interest or competing interests to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Correia, A., Grover, A., Jameel, S. et al. A hybrid human–AI tool for scientometric analysis. Artif Intell Rev 56 (Suppl 1), 983–1010 (2023). https://doi.org/10.1007/s10462-023-10548-7

