
Benchmarking Scientific Image Forgery Detectors

  • Original Research/Scholarship
  • Published in Science and Engineering Ethics

Abstract

The field of scientific image integrity presents a challenging research bottleneck given the lack of available datasets to design and evaluate forensic techniques. The sensitivity of the data also creates a legal hurdle that restricts the use of real-world cases to build any accessible forensic benchmark. In light of this, there is no comprehensive understanding of the limitations and capabilities of automatic image analysis tools for scientific images, which might create a false sense of data integrity. To mitigate this issue, we present an extendable open-source algorithm library that reproduces the most common image forgery operations reported by the research integrity community: duplication, retouching, and cleaning. Using this library and realistic scientific images, we create a large scientific image forgery benchmark (39,423 images) with enriched ground truth. All figures within the benchmark are synthetically doctored using images collected from Creative Commons sources; while collecting the source images, we ensured that they did not present any suspicious integrity problems. Because of the high number of papers retracted due to image duplication, this work evaluates state-of-the-art copy-move detection methods on the proposed dataset, using a new metric that asserts consistent match detection between the source and the copied region. All evaluated methods performed poorly on this dataset, indicating that scientific images might need a specialized copy-move detector. The dataset and source code are available at https://github.com/phillipecardenuto/rsiil.
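To make the duplication operation concrete, the following is a minimal sketch of how a copy-move forgery and its enriched ground truth could be generated; it is an assumption-level illustration, not the rsiil library's actual API, and every name in it is hypothetical.

```python
# Minimal sketch of a copy-move (duplication) forgery generator.
# NOTE: hypothetical illustration only -- this is not the rsiil API.
import numpy as np

def copy_move(image: np.ndarray, src_box, dst_xy, flip=False):
    """Copy a rectangular region of `image` and paste it elsewhere.

    src_box: (row, col, height, width) of the source region.
    dst_xy:  (row, col) top-left corner where the copy is pasted.
    Returns the forged image and a ground-truth map in which
    0 = pristine, 1 = source region, 2 = pasted copy.
    """
    forged = image.copy()
    gt = np.zeros(image.shape[:2], dtype=np.uint8)
    r, c, h, w = src_box
    patch = image[r:r + h, c:c + w].copy()
    if flip:  # horizontal flip, a common transform applied to the copy
        patch = patch[:, ::-1]
    dr, dc = dst_xy
    forged[dr:dr + h, dc:dc + w] = patch
    gt[r:r + h, c:c + w] = 1       # source of the duplication
    gt[dr:dr + h, dc:dc + w] = 2   # duplicated (copied) region
    return forged, gt
```

Keeping distinct IDs for the source and the copy, rather than a single binary mask, is what allows a metric to check that a detector matches the copied region back to its true source.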


Notes

  1. A biomedical experimental photograph resulting from a process to detect proteins in a cell or tissue.

  2. https://ori.hhs.gov/advanced-forensic-actions (Last access June 2022)

  3. https://imagetwin.ai (Last access June 2022)

  4. https://www.darpa.mil/program/media-forensics (Last access June 2022)

  5. http://celltracking.bio.nyu.edu (Last access June 2022)

  6. https://bbbc.broadinstitute.org (Last access June 2022)

  7. https://imagej.net/Adiposoft (Last access June 2022)

  8. https://idr.openmicroscopy.org (Last access June 2022)

  9. http://retractiondatabase.org (Last access June 2022)

  10. A non-profit organization affiliated with the Center for Scientific Integrity, dedicated to reporting and discussing cases of retracted papers and related issues.

  11. https://headt.eu/Image-Integrity-Database (Last access June 2022)

  12. https://www.stm-assoc.org/standards-technology/working-group-on-image-alterations-and-duplications (Last access June 2022)

  13. Code available at https://github.com/igorcmoura/inpaint-object-remover (Last access June 2022)

  14. https://creativecommons.org/publicdomain/zero/1.0 (Last access June 2022)

  15. https://creativecommons.org/licenses/by/4.0 (Last access June 2022)

  16. https://bbbc.broadinstitute.org (Last access June 2022)

  17. https://www.ncbi.nlm.nih.gov/pmc (Last access June 2022)

  18. https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist (Last access June 2022)

References

  • Al-Qershi, O. M., & Khoo, B. E. (2018). Evaluation of copy-move forgery detection: Datasets and evaluation metrics. Multimedia Tools and Applications, 77(24), 31807–31833. https://doi.org/10.1007/s11042-018-6201-4.

  • Amerini, I., Ballan, L., Caldelli, R., Bimbo, A. D., & Serra, G. (2011). A SIFT-based forensic method for copy–move attack detection and transformation recovery. IEEE Transactions on Information Forensics and Security, 6(3), 1099–1110. https://doi.org/10.1109/tifs.2011.2129512

  • Anderson, C. (1994). Easy-to-alter digital images raise fears of tampering. Science, 263(5145), 317–318. https://doi.org/10.1126/science.8278802

  • Andrade, R.d.O. (2021). Elisabeth Bik: On the trail of scientific fraud. https://revistapesquisa.fapesp.br/en/elisabeth-bik-on-the-trail-of-scientific-fraud/

  • Azoulay, P., Bonatti, A., & Krieger, J. L. (2017). The career effects of scandal: Evidence from scientific retractions. Research Policy, 46(9), 1552–1569.

  • Barnes, C., Shechtman, E., Finkelstein, A., & Goldman, D. B. (2009). PatchMatch: A randomized correspondence algorithm for structural image editing. In ACM Transactions on Graphics (TOG) (Vol. 28, p. 24).

  • Bik, E., Casadevall, A., & Fang, F. (2016). The prevalence of inappropriate image duplication in biomedical research publications. MBio, 7(3), e00809.

  • Bo, X., Junwen, W., Guangjie, L., & Yuewei, D. (2010). Image copy-move forgery detection based on SURF. In 2010 International conference on multimedia information networking and security. IEEE. https://doi.org/10.1109/mines.2010.189.

  • Bucci, E. (2018). Automatic detection of image manipulations in the biomedical literature. Nature Cell Death & Disease, 9(3), 400.

  • Christlein, V., Riess, C., Jordan, J., Riess, C., & Angelopoulou, E. (2012). An evaluation of popular copy-move forgery detection approaches. IEEE Transactions on Information Forensics and Security, 7(6), 1841–1854. https://doi.org/10.1109/tifs.2012.2218597

  • Christopher, J. (2018). Systematic fabrication of scientific images revealed. FEBS Letters, 592, 3027–3029.

  • Cozzolino, D., Poggi, G., & Verdoliva, L. (2015). Efficient dense-field copy-move forgery detection. IEEE Transactions on Information Forensics and Security, 10(11), 2284–2297.

  • Criminisi, A., Pérez, P., & Toyama, K. (2004). Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on Image Processing, 13(9), 1200–1212.

  • Cromey, D. (2010). Avoiding twisted pixels: Ethical guidelines for the appropriate use and manipulation of scientific digital images. Science and Engineering Ethics, 16(4), 639–667.

  • Ehret, T. (2018). Automatic detection of internal copy-move forgeries in images. Image Processing On Line, 8, 167–191. https://doi.org/10.5201/ipol.2018.213

  • Guan, H., Kozak, M., Robertson, E., Lee, Y., Yates, A.N., Delgado, A., Zhou, D., Kheyrkhah, T., Smith, J., & Fiscus, J. (2019) MFC datasets: Large-scale benchmark datasets for media forensic challenge evaluation. In 2019 IEEE winter applications of computer vision workshops (WACVW) (pp. 63–72). https://doi.org/10.1109/WACVW.2019.00018

  • Koker, T.E., Chintapalli, S.S., Wang, S., Talbot, B.A., Wainstock, D., Cicconet, M., & Walsh, M.C. (2021). On identification and retrieval of near-duplicate biological images: A new dataset and protocol. In International conference on pattern recognition (ICPR). IEEE. https://ailb-web.ing.unimore.it/icpr/author/3517

  • Krueger, J. (2002). Forensic examination of questioned scientific images. Accountability in Research, 9(2), 105–125. https://doi.org/10.1080/08989620212970

  • Li, Y., & Zhou, J. (2019). Fast and effective image copy-move forgery detection via hierarchical feature point matching. IEEE Transactions on Information Forensics and Security, 14(5), 1307–1322. https://doi.org/10.1109/tifs.2018.2876837

  • Marcus, A. (2019). Pitt researchers sue journal for defamation following retraction. https://retractionwatch.com/2019/12/02/pitt-researchers-sue-journal-for-defamation-following-retraction/

  • Mongeon, P., & Larivière, V. (2013). The collective consequences of scientific fraud: An analysis of biomedical research. In Proceedings of ISSI 2013, proceedings of the international conference on scientometrics and informetrics (pp. 1897–1899). Austrian Institute of Technology.

  • Moreira, D., Bharati, A., Brogan, J., Pinto, A., Parowski, M., Bowyer, K., Flynn, P., Rocha, A., & Scheirer, W. (2018). Image provenance analysis at scale. IEEE Transactions on Image Processing, 27(12), 6109–6123.

  • Naylor, P., Lae, M., Reyal, F., & Walter, T. (2017). Nuclei segmentation in histopathology images using deep neural networks. In 2017 IEEE 14th international symposium on biomedical imaging (ISBI 2017). IEEE. https://doi.org/10.1109/isbi.2017.7950669.

  • Noorden, R. V. (2015). The image detective who roots out manuscript flaws. Nature. https://doi.org/10.1038/nature.2015.17749

  • Parrish, D., & Noonan, B. (2009). Image manipulation as research misconduct. Science and Engineering Ethics, 15(2), 161–167. https://doi.org/10.1007/s11948-008-9108-z

  • Pun, C. M., Yuan, X. C., & Bi, X. L. (2015). Image forgery detection using adaptive oversegmentation and feature point matching. IEEE Transactions on Information Forensics and Security, 10(8), 1705–1716. https://doi.org/10.1109/tifs.2015.2423261

  • Qi, C., Zhang, J., & Luo, P. (2020). Emerging concern of scientific fraud: Deep learning and image manipulation. bioRxiv.

  • Rossner, M. (2008). A false sense of security. Journal of Cell Biology, 183(4), 573–574. https://doi.org/10.1083/jcb.200810172

  • Rossner, M., & Yamada, K. (2004). What’s in a picture? The temptation of image manipulation. The Journal of Cell Biology, 166(1), 11–15.

  • Taubes, G. (1994). Technology for turning seeing into believing. Science, 263(5145), 318. https://doi.org/10.1126/science.8278803.

  • Wjst, M. (2021). Scientific integrity is threatened by image duplications. American Journal of Respiratory Cell and Molecular Biology, 64(2), 271–272. https://doi.org/10.1165/rcmb.2020-0419le

  • Wu, Y., AbdAlmageed, W., & Natarajan, P. (2018). Busternet: Detecting image copy-move forgery with source/target localization. In European conference on computer vision (ECCV). Springer.

  • Xiang, Z., & Acuna, D. (2020). Scientific image tampering detection based on noise inconsistencies: A method and datasets. arXiv preprint arXiv:2001.07799

  • Zhou, P., Han, X., Morariu, V.I., & Davis, L.S. (2018). Learning rich features for image manipulation detection. In 2018 IEEE/CVF conference on computer vision and pattern recognition. IEEE. https://doi.org/10.1109/cvpr.2018.00116.


Funding

This research was supported by the São Paulo Research Foundation (FAPESP) under the thematic project DéjàVu, Grants 2017/12646-3 and 2020/02211-2.

Author information

Corresponding author

Correspondence to João P. Cardenuto.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Computer Science and Digital Forensics Glossary

  1. Algorithm Library: A collection of algorithms and routines that software invokes to perform common tasks.

  2. Benchmark: The process of assessing the performance of a method using a standard dataset and metric.

  3. Brute-Force Algorithm: Solving a problem by exhaustively trying every possible solution.

  4. Color Histogram: The frequency representation of each pixel color intensity in an image.

  5. Computer Vision: An interdisciplinary area of Computer Science dedicated to inferring, understanding, and producing digital images and videos using computational intelligence.

  6. Dataset: In the scope of this work, a collection of images.

  7. Deep Fakes: Synthetic yet realistic images generated by artificial intelligence.

  8. Detection Map: A map informing, for each pixel of an input image, whether the pixel is doctored or not according to a tampering detection method.

  9. Detection Probability Map: A map, related to an image, that indicates the probability of each pixel being doctored.

  10. False Negative: In the tampering detection scope, the result of predicting forged data as pristine.

  11. False Positive: In the tampering detection scope, the result of predicting pristine data as doctored.

  12. Ground Truth: The annotated information from data. In this work, we also use this term for the annotated map that pinpoints the location of each doctored pixel in an image.

  13. True Positive: In the tampering detection scope, the result of correctly predicting a forgery.

  14. True Negative: In the tampering detection scope, the result of correctly predicting pristine data. (A scoring sketch built from these four outcomes follows this glossary.)

  15. JavaScript Object Notation (JSON): A file format, common in Computer Science, for sharing object data as human-readable text.

  16. Metadata: A description of data that provides information about its storage, content, and creation.

  17. Model Fine-Tuning: The process of making small adjustments to a pre-trained model to improve its performance on a specific task.

  18. Model Training: The phase of an Artificial Intelligence algorithm in which a model learns how to accomplish a task.

  19. Object Mask or Object Map: A mask/map that pinpoints the location of each foreground object within an image.
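As referenced in the True Negative entry, these four outcomes combine into the pixel-wise F1-scores reported in the evaluation. The sketch below, assuming boolean NumPy arrays for the detection map and the ground truth, shows a plain pixel-wise F1; it is not the paper's consistent-match \(\text{F1-score}_{CTP}\) variant, and the function name is hypothetical.

```python
# Hedged sketch: pixel-wise F1 from a binary detection map and a binary
# ground-truth map, using the TP/FP/FN definitions from the glossary.
import numpy as np

def pixel_f1(detection: np.ndarray, ground_truth: np.ndarray) -> float:
    """F1 = 2TP / (2TP + FP + FN), computed over pixels."""
    det = detection.astype(bool)
    gt = ground_truth.astype(bool)
    tp = np.count_nonzero(det & gt)    # forged pixels correctly flagged
    fp = np.count_nonzero(det & ~gt)   # pristine pixels flagged as forged
    fn = np.count_nonzero(~det & gt)   # forged pixels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # both maps empty: agreement
```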

Evaluation Plots and Output Results

This section presents all scores from the copy-move forgery detection benchmark, organized in radar plots. We also present output samples from all methods for each evaluated modality.

Fig. 16

a CMFD Simple Figure Evaluation. Busternet has the best average performance for Simple Figures because its polygon contains almost all the others (only OVERSEG is not contained by Busternet's polygon). The shrinking polygon areas from \(\text{F1-score}_{TP}\) to \(\text{F1-score}_{CTP}\) indicate that all methods show inconsistencies in their detection maps. All metrics in percentage. b CMFD Inter-Panel Figure Evaluation on different DVS. In this plot, each method fares differently in each modality. The polygons of SIFT-NN and SURF-NN have a larger area than those of the other methods, indicating robustness to a wider range of operations. The shrinking polygon areas from \(\text{F1-score}_{CTP}\) (DVS-1) to \(\text{F1-score}_{CTP}\) (DVS-3) indicate that the higher the degree of visual signs, the lower the method's effectiveness. All metrics in percentage. c CMFD Intra-Panel Figure Evaluation on different DVS. All methods show low performance and concentrate at the center of the radar, which indicates this is a challenging modality. The best method in this modality is OVERSEG, with scores lower than 3.0 for all evaluated forgery types. All metrics in percentage. Evaluation baseline results: within the parentheses of each copy-move modality is the transformation used during the copy-move forgery. All F1-scores in this figure are percentages. The best result for each duplication modality is indicated with the color of the respective detector (e.g., in the left plot of a, the value 16.83 for Copy-move (Random) is colored with Busternet's color, orange, indicating that Busternet is the best method in this modality). a Results for Simple Figure Evaluation using \(\text{F1-score}_{TP}\) and \(\text{F1-score}_{CTP}\). b Results for Inter-Panel Figure Evaluation using \(\text{F1-score}_{CTP}\) across all degrees of visual signs, indicated by the number in the subtitle (i.e., DVS1 for a degree of visual signs equal to one). c Results for Intra-Panel Figure Evaluation using \(\text{F1-score}_{CTP}\) across all degrees of visual signs

Figure 16 presents radar-graph visualizations based on the evaluation tables, in which the forgery modalities are arranged along the radial axes. Each CMFD method's result is represented by a different color in the radar chart. In this visualization, we insert the score of each method along the modality axis (e.g., copy-move with flip), which starts at the radar center (score zero) and ends at its border (highest score); thus, the farther a method's point is from the center, the better the method performs for that copy-move modality. After inserting all of a method's points for each copy-move modality, we connect those points, which results in a polygon. The larger the polygon's area, the more robust the method is across different forgery modalities. By comparing each detector's robustness to the operations, this visualization also helps identify possible complementary behaviors among different methods. As an example, consider the left panel of Fig. 19. In this case, five modalities are compared (e.g., Copy-Move with Flip, Cleaning with Brute-Force, and Copy-Move with Translation). The chart shows the results of seven methods, each represented by a polygon color (see the legend on the right). The best method in this figure is OVERSEG (in red), while the two worst methods (HFPM and SIFT-NN) are superimposed at the center (smallest areas).
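As a concrete illustration of this construction, the sketch below draws such a radar chart with matplotlib's polar axes. The modalities, method names, and scores are invented for this example; the paper's actual plots are derived from its evaluation tables.

```python
# Sketch of the radar-chart visualization described above.
# Method names and scores are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt

modalities = ["Translation", "Flip", "Rotation", "Retouching", "Cleaning"]
scores = {  # hypothetical F1-scores (percentage) per modality
    "Method A": [16.8, 9.1, 12.4, 7.7, 5.0],
    "Method B": [11.2, 3.5, 14.9, 10.2, 6.3],
}

# One angle per modality; repeat the first angle to close each polygon.
angles = np.linspace(0, 2 * np.pi, len(modalities), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for name, vals in scores.items():
    vals = vals + vals[:1]           # close the polygon
    ax.plot(angles, vals, label=name)
    ax.fill(angles, vals, alpha=0.15)  # polygon area = robustness cue
ax.set_xticks(angles[:-1])
ax.set_xticklabels(modalities)
ax.legend(loc="upper right")
plt.show()
```

Closing the polygon by repeating the first point lets `fill` shade the enclosed area, whose size gives the at-a-glance robustness comparison described above.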

Evaluation I: Simple Forgery Figures

Figure 16a shows the evaluation of Simple Figure Forgery Detection with \(\text{F1-score}_{TP}\) (left plot) and \(\text{F1-score}_{CTP}\) (right plot); all scores are percentages (i.e., in a range from zero to one hundred).

Figure 17 presents output detection samples for all methods applied to the evaluated Simple forgery modalities. The \(\text{F1-score}_{CTP}\) of each detection map appears next to each method's name in the figure.

Evaluation II: Inter-Forgery Compound Figures

Figure 16b shows the evaluation of Inter-Forgery Compound Figure Detection using \(\text{F1-score}_{CTP}\). In this modality, the radar visualization helped us observe some complementary performance among the chosen detectors. For instance, SURF-NN and Zernike-PM show complementary behavior for copy-move with rotation and retouching. The flipped copy-move, Splicing, and Overlap forgeries proved to be the most challenging. In addition, the visual signs have a noticeable impact in this scenario, reducing scores by up to seven points from degree 1 to degree 3 for some detectors.

Figure 18 presents output detection samples for all methods applied to the evaluated Inter-Panel forgery modalities with a degree of visual signs equal to one.

Evaluation III: Intra-Forgery Compound Figures

Figure 16c shows the evaluation of Intra-Forgery Compound Figure Detection using \(\text{F1-score}_{CTP}\). In this modality, the detectors scored lower than three percent on \(\text{F1-score}_{CTP}\) for all evaluated operations. In this figure, the polygon of OVERSEG contains all the others, indicating that OVERSEG had the best performance in this modality.

Figure 19 presents output detection samples for all methods applied to the evaluated Intra-Panel forgery modalities with a degree of visual signs equal to one.

Fig. 17

Comparative Simple Forgery duplication detection output per modality. Purple represents a pristine/non-suspect region, and every other color in the ground-truth and detection maps represents a different ID assigned to each object and its copies. The \(\text{F1-score}_{CTP}\) metric, in percentage, appears in parentheses next to each method's name

Fig. 18

Comparative Compound Inter-Panel Forgery duplication detection output per modality. Purple represents a pristine/non-suspect region, and every other color in the ground-truth and detection maps represents a different ID assigned to each object and its copies. The \(\text{F1-score}_{CTP}\) metric, in percentage, appears in parentheses next to each method's name

Fig. 19

Comparative Compound Intra-Panel Forgery duplication detection output per modality. Purple represents a pristine/non-suspect region, and every other color in the ground-truth and detection maps represents a different ID assigned to each object and its copies. The \(\text{F1-score}_{CTP}\) metric, in percentage, appears in parentheses next to each method's name


About this article


Cite this article

Cardenuto, J.P., Rocha, A. Benchmarking Scientific Image Forgery Detectors. Sci Eng Ethics 28, 35 (2022). https://doi.org/10.1007/s11948-022-00391-4

