Abstract
There has been considerable optimistic speculation about how well ChatGPT-4 would perform in a Turing Test. However, no minimally serious implementation of the test has been reported. This brief note documents the results of subjecting ChatGPT-4 to 10 Turing Tests, with different interrogators and participants. The outcome is tremendously disappointing for the optimists. Despite ChatGPT reportedly outperforming 99.9% of humans in a Verbal IQ test, it falls short of passing the Turing Test. In 9 out of the 10 tests conducted, the interrogators correctly identified which participant was ChatGPT-4 and which was the human. The probability of obtaining this result from a process in which the interrogator is really no better than chance at correct identification is calculated to be less than 1%. An additional question was posed to the interrogators at the end of each test: what led them to distinguish between the human and the machine? The interrogators, who effectively prevented ChatGPT-4 from passing the Turing Test for intelligence, stated that they could identify the machine because it, in effect, responded more intelligently than the human. Subsequently, ChatGPT-4 was tasked with differentiating syntax from semantics, and it self-corrected after falling for the fallacy of equivocation. The curious result is that passing the Turing Test for intelligence remains a challenge that ChatGPT-4 has yet to overcome precisely because, according to the interrogators, its intellectual abilities surpass those of individual humans.
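The chance-level probability mentioned in the abstract can be illustrated with a short binomial calculation. The sketch below is not the author's own computation: it assumes each of the 10 tests is an independent trial in which an interrogator at chance identifies the machine correctly with probability 1/2, and the function and constant names are hypothetical. It reports both the probability of exactly 9 correct identifications and the probability of 9 or more, since the abstract does not say which quantity was calculated.

```python
from fractions import Fraction
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_upper_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


# Assumptions (not stated in the abstract): independent tests and a
# chance-level interrogator who is correct with probability 1/2.
N_TESTS = 10
N_CORRECT = 9
P_CHANCE = Fraction(1, 2)

p_exact = binom_pmf(N_CORRECT, N_TESTS, P_CHANCE)
p_tail = binom_upper_tail(N_CORRECT, N_TESTS, P_CHANCE)

print(f"P(exactly {N_CORRECT} of {N_TESTS}) = {p_exact} = {float(p_exact):.4%}")
print(f"P({N_CORRECT} or more of {N_TESTS}) = {p_tail} = {float(p_tail):.4%}")
```

Under these assumptions the probability of exactly 9 correct identifications is 10/1024, just under 1%, while the probability of 9 or more is 11/1024, just over 1%. Using `Fraction` keeps the arithmetic exact, so the comparison with the 1% threshold does not depend on floating-point rounding.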
Notes
Many thanks to an anonymous reviewer for making this argument.
A reviewer has objected that the Turing Test provides “At most. evidence for behavioral indiscernibility, not thought.” However, as previously noted, valid scientific tests warrant abductive inferences about the causes of their results. Further, it is worth examining the way Turing presented his ideas. He titled his 1950 paper “Computing machinery and intelligence”, not “Computing machinery and behaviour”, and “propose(d) to consider the question, ‘Can machines think?’” (p. 433). It is true that he then asserted that the question was “too meaningless” and should be “replaced” with the question of whether there are digital computers which would do well in the imitation game (p. 442). However, he then reaffirmed that “We cannot altogether abandon the original form of the problem”, i.e. “Can machines think?” (p. 442). Throughout his paper he frames his inquiry in terms of the possibility of machines having “intellectual capacities” (p. 434), “the power of thinking” (p. 444), “intellect” (p. 445), and “thinking” (p. 453), and in terms of ways of making “sure that a machine thinks” (p. 446). He also holds that the standard of his test is similar to our grounds against “the solipsist point of view” (p. 446), which rest on the behavioural evidence of others and which also serve for discerning whether someone “really understands” something or has “learned it parrot fashion” (p. 446), among others. Consequently, there are grounds to think that Turing’s aim, albeit with some tension, was to design a test that warrants the inference from behaviour to thought in the case of machines.
References
Arendt, H. (1963). Eichmann in Jerusalem: A report on the banality of evil. Viking.
Bayne, T., & Williams, I. (2023). The Turing test is not a good benchmark for thought in LLMs. Nature Human Behaviour, 7, 1806–1807. https://www.nature.com/articles/s41562-023-01710-w
Biever, C. (2023). ChatGPT broke the Turing Test-the race is on for new ways to assess AI. Nature, 619, 686–689. https://doi.org/10.1038/d41586-023-02361-7
Block, N. (1981). Psychologism and behaviorism. Philosophical Review, 90(1), 5–43. https://doi.org/10.2307/2184371
Bunge, M. (2010). Matter and mind. Boston Studies in the Philosophy of Science. Springer. https://doi.org/10.1007/978-90-481-9225-0
Carrim, S. (2017). The legacy of Hannah Arendt’s Banality of Evil. Review of Human Rights, 3(1), 65–86. https://doi.org/10.35994/rhr.v3i1.83
Copeland, J. (2000). The Turing Test. Minds and Machines, 10, 519–539. https://doi.org/10.1023/A:1011285919106
Copeland, J. (2004). The essential Turing. Oxford University Press.
Dietrich, E. (Ed.). (2014). Thinking computers and virtual persons. Academic.
Dretske, F. (1988). Explaining behavior: Reasons in a world of causes. MIT Press.
Dreyfus, H. (1992). What computers still can’t do: A critique of artificial reason. MIT Press.
Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30, 681–694. https://doi.org/10.1007/s11023-020-09548-1
Gonçalves, B. (2023). The Turing Test is a thought experiment. Minds and Machines, 33, 1–31. https://doi.org/10.1007/s11023-022-09616-8
Jannai, D., Meron, A., Lenz, B., Levine, Y., & Shoham, Y. (2023). Human or not? A gamified approach to the Turing Test. https://arxiv.org/abs/2305.20010
Jones, C. R., & Bergen, B. K. (2024). Does GPT-4 pass the Turing Test? Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1, 5183–5210. https://aclanthology.org/2024.naacl-long.290.pdf
Kearns, M., & Roth, A. (2019). The ethical algorithm: The science of socially aware algorithm design. Oxford University Press.
Kuhn, T. (1977 [1964]). A function for thought experiments. In T. Kuhn (Ed.), The essential tension: Selected studies in scientific tradition and change (pp. 240–265). University of Chicago Press.
Lacker, K. (2020). Giving GPT-3 a Turing Test. Blog. Retrieved from https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html
Macdonald, C., & Macdonald, G. (2010). Emergence in mind. Oxford University Press.
Mach, E. (1976 [1897]). On thought experiments. In E. N. Hiebert (Ed.), Knowledge and error: Sketches on the psychology of enquiry (pp. 134–147). D. Reidel. https://doi.org/10.1007/978-94-010-1428-1
Milgram, S. (1974). Obedience to authority: An experimental view. Tavistock.
Moor, J. (2001). The status and future of the Turing Test. Minds and Machines, 11, 77–93. https://doi.org/10.1023/A:1011218925467
OpenAI (2023). GPT-4 technical report. OpenAI. https://arxiv.org/abs/2303.08774
Oppy, G., & Dowe, D. (2021). The Turing Test. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/archives/win2021/entries/turing-test/
Penrose, R. (2016). The emperor’s new mind: Concerning computers, minds, and the laws of physics. Oxford University Press.
Piccinini, G. (2000). Turing’s rules for the imitation game. Minds and Machines, 10, 573–585. https://doi.org/10.1023/A:1011246220923
Popper, K. (2002 [1959]). The logic of scientific discovery. Routledge. English edition translated and extended from the 1935 German edition Logik der Forschung: zur Erkenntnistheorie der Modernen Naturwissenschaft. Springer.
Restrepo, R. (2012a). Two myths of psychophysical reductionism. Open Journal of Philosophy, 2(2), 75–83. https://doi.org/10.4236/ojpp.2012.22011
Restrepo, R. (2012b). Computers, persons, and the Chinese room. Part 1: The human computer. The Journal of Mind and Behavior, 33(1/2), 27–47. http://www.jstor.org/stable/43854322
Restrepo, R. (2012c). Computers, persons, and the Chinese room. Part 2: The man who understood. The Journal of Mind and Behavior, 33(3/4), 123–139. http://www.jstor.org/stable/43854338
Restrepo Echavarría, R. (2009). Russell’s structuralism and the supposed death of cognitive science. Minds and Machines, 19, 181–197. https://doi.org/10.1007/s11023-009-9155-5
Roivainen, E. (2023). AI's IQ: ChatGPT aced a test but showed that intelligence cannot be measured by IQ alone. Scientific American, 329(1), 7. https://doi.org/10.1038/scientificamerican0723-7
Searle, J. (1980). Minds, brains and programs. Behavioral and Brain Sciences, 3, 417–457. https://doi.org/10.1017/S0140525X00005756
Shannon, C. E., & McCarthy, J. (1956). Automata studies. Princeton University Press.
Tegmark, M. (2017). Life 3.0: Being human in the age of artificial intelligence. Knopf.
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460. https://doi.org/10.1093/mind/LIX.236.433
Wallach, W., & Allen, C. (2008). Moral machines: Teaching robots right from wrong. Oxford University Press.
About this article
Cite this article
Restrepo Echavarría, R. ChatGPT-4 in the Turing Test. Minds & Machines 35, 8 (2025). https://doi.org/10.1007/s11023-025-09711-6
DOI: https://doi.org/10.1007/s11023-025-09711-6