
Quote examiner: verifying quoted images using web-based text similarity

Multimedia Tools and Applications


Abstract

Digital data has grown rapidly over the last few years, and images containing quotes spread virally through online social media platforms. Misquotes spread just as quickly, which highlights how readily web users circulate poorly cited quotes. It is therefore important to authenticate the content of the images being circulated online: the text within such images must be retrieved and verified before a quote is used, so that a fake quote or misquote can be told apart from an authentic one. This paper uses Optical Character Recognition (OCR) to convert textual images into readable text, but no OCR tool extracts text from images with perfect accuracy. We therefore propose a post-processing method that is applied to the retrieved text to improve the accuracy of the text detected in images. Google Cloud Vision is used to recognize text in images, and post-processing the extracted text improves recognition accuracy by approximately 3.5%. A web-based text similarity approach (using URLs and domain names) is used to examine the authenticity of the content of the quoted images. Approximately 96.26% accuracy is achieved in classifying quoted images as verified or misquoted. A ground truth dataset of authentic site names has also been created. The images used in this research contain quotes by famous celebrities and global leaders. A comparative analysis shows the effectiveness of the proposed algorithm.
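The verification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes the OCR text has already been extracted and that a web search has returned (URL, snippet) pairs. The domain list, the function names, and the 0.8 similarity threshold are all hypothetical choices made for illustration.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical stand-in for a ground-truth list of authentic site names.
AUTHENTIC_DOMAINS = {"wikiquote.org", "britannica.com"}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Character-level similarity ratio in [0, 1] after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def domain_of(url):
    """Return the host part of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def is_authentic(domain):
    """True if the domain is a listed site or a subdomain of one."""
    return any(domain == d or domain.endswith("." + d) for d in AUTHENTIC_DOMAINS)


def classify(ocr_text, results, threshold=0.8):
    """Label a quote from (url, snippet) search results.

    Returns 'verified' if any result from a listed domain carries text
    that closely matches the OCR output, otherwise 'misquoted'.
    The threshold is an assumed value, not one taken from the paper.
    """
    for url, snippet in results:
        if is_authentic(domain_of(url)) and similarity(ocr_text, snippet) >= threshold:
            return "verified"
    return "misquoted"
```

For example, `classify(text, [("https://en.wikiquote.org/wiki/Some_Page", text)])` returns `"verified"`, while the same snippet hosted on an unlisted domain yields `"misquoted"`. Requiring both a trusted domain and a close textual match keeps a trusted site that merely mentions the speaker from confirming a differently worded quote.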



Acknowledgments

This publication is an outcome of the R&D work undertaken in the project under the Visvesvaraya PhD Scheme of the Ministry of Electronics & Information Technology, Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia).

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Author information


Corresponding author

Correspondence to Sneha Banerjee.



About this article


Cite this article

Banerjee, S., Kaur, S. & Kumar, P. Quote examiner: verifying quoted images using web-based text similarity. Multimed Tools Appl 80, 12135–12154 (2021). https://doi.org/10.1007/s11042-020-10270-4


  • DOI: https://doi.org/10.1007/s11042-020-10270-4
