Abstract
Humans display remarkable long-term visual memory (LTVM) processes. Even though images may be intrinsically memorable, the fidelity of their visual representations, and consequently the likelihood of successfully retrieving them, hinges on their similarity to other images concurrently held in LTVM. In this debate, it is still unclear whether the intrinsic features of images (perceptual and semantic) are mediated by mechanisms of interference generated at encoding or during retrieval, and how these factors impinge on recognition processes. In the current study, participants (N = 32) studied a stream of 120 natural scenes from 8 semantic categories, which varied in frequency (4, 8, 16 or 32 exemplars per category) to generate different levels of category interference, in preparation for a recognition test. They were then asked to indicate which of two images, presented side by side (i.e. a two-alternative forced choice), they remembered. The two images belonged to the same semantic category but varied in their perceptual similarity (similar or dissimilar). Participants also expressed their confidence (sure/not sure) in their recognition response, enabling us to tap into their metacognitive efficacy (meta-d’). Additionally, we extracted the activation of perceptual and semantic features in images (i.e. their informational richness) through deep neural network modelling and examined their impact on recognition processes. Corroborating previous literature, we found that category interference and perceptual similarity negatively affect recognition accuracy, as well as response times and metacognitive efficacy. Moreover, semantically rich images were less likely to be remembered, an effect that trumped a positive memorability boost conferred by perceptual information. Critically, we did not observe any significant interaction between the intrinsic features of images and the interference generated either at encoding or during retrieval.
All in all, our study calls for a more integrative understanding of the representational dynamics at encoding and recognition that enable us to form, maintain and access visual information.
Data and code availability
The data and R script to replicate all results presented in this study are deposited in the Open Science Framework (https://osf.io/b3snj/).
Notes
We adopted the terminology “exemplar condition” to be faithful to the study by Konkle et al. (2010b), where this manipulation was first introduced; theoretically, this is what we refer to as category interference.
Differently from Huebner and Gegenfurtner (2012), we maintained the same level of conceptual/category similarity (i.e. two kitchen scenes) and only manipulated perceptual similarity.
Note, some studies refer to all DNN features as perceptual regardless of the layer they are extracted from (e.g. Hovhannisyan et al. 2021; Heinen et al. 2023). Here, instead, we operationally distinguish between early convolutional layers, which more closely represent purely low-level perceptual features, and late convolutional layers, which are deeper in the network and nearer to the output, and so better reflect the grouping properties typical of higher-level semantic features.
Here we use the term exemplar condition to describe the manipulation and category interference to describe the theoretical mechanism behind the manipulation. Previously, we used the term semantic interference to describe this manipulation (Mikhailova et al. 2021), but in the context of other manipulations (perceptual similarity), we opted for “exemplar condition”, which is in line with the terminology of the original study (Konkle et al. 2010b).
In Appendix 4, we report corroborating results where exemplar condition was modelled as a categorical rather than continuous variable corresponding to the initial experiment manipulation.
We also evaluated the relationship of perceptual and semantic image features with the perceptual similarity effect at recognition, which showed no interaction of the perceptual similarity manipulation with perceptual features (z = − 0.05, p = 0.96) or semantic features (z = − 1.45, p = 0.15). Considering that adding the perceptual similarity factor to the main analysis would have led to uninterpretable four-way interactions, and that this manipulation operates at a different memory phase, we did not report these findings in the present paper.
References
Anderson MC, Neely JH (1996) Interference and inhibition in memory retrieval. Memory. Academic Press, Cambridge, pp 237–313
Anwyl-Irvine AL, Massonnié J, Flitton A, Kirkham N, Evershed JK (2020) Gorilla in our midst: an online behavioral experiment builder. Behav Res Methods 52(1):388–407
Baddeley AD, Dale HC (1966) The effect of semantic similarity on retroactive interference in long-and short-term memory. J Verbal Learn Verbal Behav 5(5):417–420
Bainbridge WA, Isola P, Oliva A (2013) The intrinsic memorability of face photographs. J Exp Psychol Gen 142(4):1323–1334
Bates D, Mächler M, Bolker B, Walker S (2015) Fitting Linear Mixed-Effects Models Using lme4. J Stat Softw 67(1):1–48
Brady TF, Konkle T, Alvarez GA, Oliva A (2008) Visual long-term memory has a massive storage capacity for object details. Proc Natl Acad Sci 105(38):14325–14329
Brady TF, Konkle T, Alvarez GA (2011) A review of visual memory capacity: beyond individual items and toward structured representations. J Vis 11(5):1–34
Bylinskii Z, Isola P, Bainbridge C, Torralba A, Oliva A (2015) Intrinsic and extrinsic effects on image memorability. Vision Res 116:165–178
Cadieu CF, Hong H, Yamins DL, Pinto N, Ardila D, Solomon EA, DiCarlo JJ (2014) Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comput Biol 10(12):e1003963
Castelhano MS, Krzyś K (2020) Rethinking space: a review of perception, attention, and memory in scene processing. Annu Rev Vision Sci 6(1):563–586
Chandler CC (1994) Studying related pictures can reduce accuracy, but increase confidence, in a modified recognition test. Mem Cognit 22(3):273–280
Cichy RM, Kaiser D (2019) Deep neural networks as scientific models. Trends Cogn Sci 23(4):305–317
Constant M, Liesefeld HR (2021) Massive effects of saliency on information processing in visual working memory. Psychol Sci 32(5):682–691
Craig M, Dewar M, Della Sala S (2015) Retroactive interference. International encyclopedia of the social & behavioral sciences. Elsevier, Amsterdam, pp 613–620
Damiano C, Walther DB (2019) Distinct roles of eye movements during memory encoding and retrieval. Cognition 184:119–129
Drascher ML, Kuhl BA (2022) Long-term memory interference is resolved via repulsion and precision along diagnostic memory dimensions. Psychon Bull Rev 29(5):1–15
Egan JP (1975) Signal detection theory and ROC-analysis. Academic Press
Eriksen BA, Eriksen CW (1974) Effects of noise letters upon the identification of a target letter in a nonsearch task. Percept Psychophys 16(1):143–149
Fleming SM (2017) HMeta-d: hierarchical Bayesian estimation of metacognitive efficiency from confidence ratings. Neurosci Conscious 2017(1):1–14
Fleming SM, Lau HC (2014) How to measure metacognition. Front Hum Neurosci 8:1–9
Gauthier I, James TW, Curby KM, Tarr MJ (2003) The influence of conceptual knowledge on visual discrimination. Cogn Neuropsychol 20(3–6):507–523
Goetschalckx L, Moors P, Wagemans J (2018) Image memorability across longer time intervals. Memory 26(5):581–588
Goetschalckx L, Andonian A, Wagemans J (2021) Generative adversarial networks unlock new methods for cognitive science. Trends Cogn Sci 25(9):788–801
Green DM, Swets JA (1966) Signal detection theory and psychophysics. Wiley, New York
Greene MR, Oliva A (2009) Recognition of natural scenes from global properties: seeing the forest without representing the trees. Cogn Psychol 58(2):137–176
Groen II, Greene MR, Baldassano C, Fei-Fei L, Beck DM, Baker CI (2018) Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior. Elife 7:e32962
Hanczakowski M, Butowska E, Philip Beaman C, Jones DM, Zawadzka K (2021) The dissociations of confidence from accuracy in forced-choice recognition judgments. J Mem Lang 117:104189
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hebart MN, Zheng CY, Pereira F, Baker CI (2020) Revealing the multidimensional mental representations of natural objects underlying human similarity judgements. Nat Hum Behav 4(11):1173–1185
Heinen R, Bierbrauer A, Wolf OT, Axmacher N (2023) Representational formats of human memory traces. Brain Struct Funct. https://doi.org/10.1007/s00429-023-02636-9
Hollingworth A, Henderson JM (2000) Semantic informativeness mediates the detection of changes in natural scenes. Vis Cogn 7(1–3):213–235
Hovhannisyan M, Clarke A, Geib BR, Cicchinelli R, Monge Z, Worth T, Davis SW (2021) The visual and semantic features that predict object memory: concept property norms for 1,000 object images. Mem Cognit 49(4):712–731
Hu S, Liu D, Song F, Wang Y, Zhao J (2020) The influence of object similarity on real object-based attention: the disassociation of perceptual and semantic similarity. Acta Psychol (Amst) 205:103046
Huebner GM, Gegenfurtner KR (2012) Conceptual and visual features contribute to visual memory for natural images. PLoS ONE 7(6):e37575
Isola P, Xiao J, Torralba A, Oliva A (2011) What makes an image memorable? J Vis 11(11):1282
Jaegle A, Mehrpour V, Mohsenzadeh Y, Meyer T, Oliva A, Rust N (2019) Population response magnitude variation in inferotemporal cortex predicts image memorability. Elife 8:e47596
Ko Y, Lau H (2012) A detection theoretic explanation of blindsight suggests a link between conscious perception and metacognition. Philos Trans R Soc B Biol Sci 367(1594):1401–1411
Koch GE, Akpan E, Coutanche MN (2020) Image memorability is predicted by discriminability and similarity in different stages of a convolutional neural network. Learn Mem 27(12):503–509
Konkle T, Brady TF, Alvarez GA, Oliva A (2010a) Conceptual distinctiveness supports detailed visual long-term memory for real-world objects. J Exp Psychol Gen 139(3):558–578
Konkle T, Brady TF, Alvarez GA, Oliva A (2010b) Scene memory is more detailed than you think: the role of categories in visual long-term memory. Psychol Sci 21(11):1551–1556
Kriegeskorte N (2015) Deep neural networks: a new framework for modeling biological vision and brain information processing. Annu Rev Vision Sci 1(1):417–446
Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
Kyle-Davidson C, Bors AG, Evans KK (2022) Modulating human memory for complex scenes with artificially generated images. Sci Rep 12(1):1–15
Lau H, Rosenthal D (2011) Empirical support for higher-order theories of conscious awareness. Trends Cogn Sci 15(8):365–373
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Lindsay GW (2021) Convolutional neural networks as a model of the visual system: past, present, and future. J Cogn Neurosci 33(10):2017–2031
Lukavský J, Děchtěrenko F (2017) Visual properties and memorising scenes: effects of image-space sparseness and uniformity. Atten Percept Psychophys 79(7):2044–2054
Mandler JM, Ritchey GH (1977) Long-term memory for pictures. J Exp Psychol Hum Learn Mem 3(4):386–396
Maniscalco B, Lau H (2012) A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings. Conscious Cognit 21(1):422–430
Maniscalco B, Lau H (2014) Signal detection theory analysis of type 1 and type 2 data meta-d’, response-specific meta-d’, and the unequal variance SDT model. The cognitive neuroscience of metacognition. Springer, Berlin, pp 25–66
Mikhailova A, Raposo A, Della Sala S, Coco MI (2021) Eye-movements reveal semantic interference effects during the encoding of naturalistic scenes in long-term memory. Psychon Bull Rev 28(5):1601–1614. https://doi.org/10.3758/s13423-021-01920-1
Mikhailova A, Santos-Victor J, Coco MI (2022) Contribution of low, mid and high-level image features of indoor scenes in predicting human similarity judgements. Pattern recognition and image analysis. Springer, Cham, pp 505–514
Nairne JS (2006) Modeling distinctiveness: Implications for general memory theory. Distinctiveness and memory. Oxford University Press, New York, pp 27–46
Needell CD, Bainbridge WA (2022) Embracing new techniques in deep learning for estimating image memorability. Comput Brain Behav 5(2):168–184
Neumann D, Gegenfurtner KR (2006) Image retrieval and perceptual similarity. ACM Trans Appl Percept 3(1):31–47
Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vision 42(3):145–175
Olsson H, Poom L (2005) Visual memory needs categories. Proc Natl Acad Sci USA 102(24):8776–8780
Paivio A (1991) Dual coding theory: retrospect and current status. Can J Psychol/Revue Canadienne De Psychologie 45(3):255
Ratcliff R, Gronlund SD (1992) Testing global memory models using ROC curves. Psychol Rev 99(3):518–535
Ridderinkhof KR, Band GPH, Logan GD (1999) A study of adaptive behavior: effects of age and irrelevant information on the ability to inhibit one’s actions. Acta Psychol (Amst) 101(2–3):315–337
Robertson IH, Manly T, Andrade J, Baddeley BT, Yiend J (1997) “Oops!”: performance correlates of everyday attentional failures in traumatic brain injured and normal subjects. Neuropsychologia 35(6):747–758
Santangelo V (2015) Forced to remember: when memory is biased by salient information. Behav Brain Res 283:1–10
Satterthwaite FE (1946) An approximate distribution of estimates of variance components. Biom Bullet 2(6):110–114
Schurgin MW (2018) Visual memory, the long and the short of it: a review of visual working memory and long-term memory. Atten Percept Psychophys 80(5):1035–1056
Scott RB, Dienes Z, Barrett AB, Bor D, Seth AK (2014) Blind insight: metacognitive discrimination despite chance task performance. Psychol Sci 25(12):2199–2208
Shepard RN (1967) Recognition memory for words, sentences, and pictures. J Verbal Learn Verbal Behav 6:156–163
Son G, Walther DB, Mack ML (2022) Scene wheels: measuring perception and memory of real-world scenes with a continuous stimulus space. Behav Res Methods 54(1):444–456
Standing L (1973) Learning 10,000 pictures. Q J Exp Psychol 25(2):207–222
Standing L, Conezio J, Haber RN (1970) Perception and memory for pictures: single-trial learning of 2500 visual stimuli. Psychon Sci 19(2):73–74
Underwood BJ (1957) Interference and forgetting. Psychol Rev 64(1):49
Valentine T, Lewis MB, Hills PJ (2016) Face-space: a unifying concept in face recognition research. Q J Exp Psychol 69(10):1996–2019
Võ MLH (2021) The meaning and structure of scenes. Vision Res 181:10–20
Vogt S, Magnussen S (2007) Long-term memory for 400 pictures on a common theme. Exp Psychol 54(4):298–303
Watkins M, Watkins OC (1976) Cue-overload theory and the method of interpolated attributes. Bull Psychon Soc 7(3):289–291
Wickham H (2016) ggplot2: elegant graphics for data analysis. Springer, Cham
Wiseman S, Neisser U (1974) Perceptual organization as a determinant of visual recognition memory. Am J Psychol 87(4):675–681
Wixted JT (2021) The role of retroactive interference and consolidation in everyday forgetting. Current issues in memory. Routledge, New York, pp 117–143
Xiao J, Hays J, Ehinger KA, Torralba A (2010) SUN database: large-scale scene recognition from abbey to zoo. In: 2010 IEEE conference on computer vision and pattern recognition (CVPR), 3485–3492
Acknowledgements
We additionally thank Wilma A. Bainbridge's lab at the University of Chicago for providing A.M. with the skills and expertise to execute image feature modelling presented in this work.
Funding
This work was funded by Fundação para a Ciência e Tecnologia with a PhD scholarship to A.M. (SFRH/BD/144453/2019) and a grant awarded to M.I.C. (PTDC/PSI-ESP/30958/2017).
Author information
Authors and Affiliations
Corresponding authors
Ethics declarations
Conflict of interest
The authors have no financial or non-financial interests that are relevant to this article.
Consent to participate
All participants included in the study signed the informed consent before data collection.
Ethics approval
The study was conducted in accordance with the 1964 Declaration of Helsinki, the British Psychological Society’s Code of Ethics and Conduct (2018) and the UEL Code of Practice for Research Ethics (2015–16), and was approved by the Psychology Ethics Committee of the University of East London preceding data collection.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Editors: Daniele Nardi (Ball State University), Irene León (International University of La Rioja); Reviewers: Karolina Krzys (Queen's University) and a second researcher who prefers to remain anonymous.
Appendices
Appendix 1
Since the experiment was executed online, quality control of the data obtained was particularly important. Besides enforcing full-screen mode to prevent participants from switching between tabs, and ensuring that the experimental session duration of all participants fell within the same time frame (31 ± 6.36 min), we also evaluated participants’ performance as a measure of their attention to the task. The results of the binomial test of recognition accuracy are reported in the main text under the section Participants, since according to its results one participant was excluded from further analysis. Here we additionally report participants’ performance on the Flanker task. All participants performed above chance (overall, 98% of responses were correct, with a maximum error rate of 10% by only one participant). Based on these results, we did not exclude any participants from the analysis.
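As a minimal illustration of this screening step, the sketch below applies a one-sided binomial test against the 0.5 chance level of a two-alternative forced-choice test. Note that the published analyses were run in R; this Python sketch uses hypothetical trial counts and accuracies purely for illustration.

```python
from math import comb

def binom_sf(n_correct, n_trials, p=0.5):
    """P(X >= n_correct) under Binomial(n_trials, p): one-sided binomial test."""
    return sum(comb(n_trials, k) * p**k * (1 - p)**(n_trials - k)
               for k in range(n_correct, n_trials + 1))

def above_chance(n_correct, n_trials=120, alpha=0.05):
    """Flag a participant as above chance if accuracy reliably exceeds 0.5."""
    return binom_sf(n_correct, n_trials) < alpha

# Hypothetical participants: 55% vs 70% correct over 120 trials
print(above_chance(66))  # 55% correct: not reliably above chance
print(above_chance(84))  # 70% correct: above chance
```

A participant failing this test would be excluded, mirroring the single exclusion reported in the main text.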
Appendix 2
The sustained attention task (SAT, Robertson et al. 1997) and Flanker task (Eriksen and Eriksen 1974, implemented after Ridderinkhof et al. 1999) were utilised to fill the consolidation time between the memorisation and recognition phase and assess whether scores at these tasks, which tap into attentional capacity and inhibitory control mechanisms, would speak about effects of perceptual similarity of target and foil images and exemplar conditions on recognition memory.
In SAT trials (N = 10), participants were presented with a stream of 19 letters and were told to press the spacebar every time they saw the letter X (3 instances in each trial). At the end of each trial, they received written feedback about their performance. The average accuracy across participants in this task was 43% (SD = 5%), which did not significantly correlate with recognition accuracy (r = 0.09, p = 0.64) or with the slope of the exemplar condition (r = − 0.03, p = 0.85) in the LTVM task.
In Flanker trials (N = 60), participants were presented with five arrows in a horizontal line, pointing either left or right, and were instructed to indicate the direction of the central arrow using the arrow keys. Participants received a practice round of 4 trials and were given feedback regarding their overall performance score at the end of the task. For the reaction time (RT) analysis of the Flanker task, we excluded all incorrect trials (about 1 trial per participant out of 60). As expected, incongruent trials led to significantly slower RTs than congruent trials (t = 6.02, p < 0.001). The RT difference between congruent and incongruent trials was positively correlated with the exemplar condition slope (r = 0.4, p = 0.02) but did not correlate with memory recognition performance (r = 0.25, p = 0.16).
Appendix 3
In this appendix, we provide a linear regression analysis of d’ as a function of continuous exemplar and perceptual similarity (PS) conditions (Fig. 6 and Table 3). We corroborate the significant main effect of PS, such that similar target and foil images were associated with a lower d’; the significant main effect of exemplar condition, such that d’ decreases for increasing levels of category interference; and no significant interaction between these two factors.
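For readers wishing to reconstruct the shape of this analysis, the sketch below computes d’ from two-alternative forced-choice accuracy (d’ = √2 · z(p_correct); Green and Swets 1966) and fits a least-squares slope over the continuous exemplar predictor. It is written in Python rather than the R used for the published analyses, and the accuracies are made-up illustrative numbers, not the observed data.

```python
from math import sqrt, log2
from statistics import NormalDist

def dprime_2afc(p_correct):
    """Sensitivity in a two-alternative forced-choice design:
    d' = sqrt(2) * z(proportion correct)."""
    return sqrt(2) * NormalDist().inv_cdf(p_correct)

# Hypothetical accuracies for the four exemplar conditions (made-up values)
conditions = [4, 8, 16, 32]
accuracies = [0.92, 0.88, 0.84, 0.78]
dprimes = [dprime_2afc(a) for a in accuracies]

# Least-squares slope of d' on log2(exemplar condition), mirroring the
# continuous coding of category interference used in the regression
x = [log2(c) for c in conditions]
mx = sum(x) / len(x)
my = sum(dprimes) / len(dprimes)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, dprimes)) \
        / sum((xi - mx) ** 2 for xi in x)
print(f"d' per condition: {[round(d, 2) for d in dprimes]}")
print(f"slope over log2(exemplars): {slope:.2f}")
```

With accuracies decreasing across exemplar conditions, the fitted slope is negative, matching the direction of the category interference effect reported above.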
Appendix 4
We repeated all our analyses for replication purposes, but now with exemplar condition expressed as a categorical (i.e. 4, 8, 16, 32, with 4 as the reference level) rather than a continuous variable (see the original study by Konkle et al. 2010b). All analyses largely replicate the patterns reported in the main text (Fig. 7 and Table 4), even though we no longer observe significant differences across all possible exemplar condition contrasts, which reinforces the methodological soundness of treating category interference as a continuous rather than a categorical predictor.
Appendix 5
See Table 5 for descriptive statistics of recognition accuracy, memory reaction time and meta-d' values across perceptual similarity and exemplar conditions.
Rights and permissions
About this article
Cite this article
Mikhailova, A., Lightfoot, S., Santos-Victor, J. et al. Differential effects of intrinsic properties of natural scenes and interference mechanisms on recognition processes in long-term visual memory. Cogn Process 25, 173–187 (2024). https://doi.org/10.1007/s10339-023-01164-y