Quote examiner: verifying quoted images using web-based text similarity

Banerjee, Sneha; Kaur, Sawinder; Kumar, Parteek

doi:10.1007/s11042-020-10270-4

Published: 09 January 2021

Volume 80, pages 12135–12154, (2021)

Multimedia Tools and Applications

Abstract

Over the last few years, digital data has grown rapidly, and images containing quotes now spread virally across online social media platforms. Misquotes spread just as quickly, which highlights how little care web users take when circulating poorly cited quotes. It is therefore important to authenticate the content of such images: the text within them must be retrieved and verified before the quote is used, so that a fake quote or misquote can be distinguished from an authentic one. This paper uses Optical Character Recognition (OCR) to convert textual images into readable text, but no OCR tool extracts information from images with perfect accuracy. A post-processing method is therefore proposed to improve the accuracy of the text detected in images. Google Cloud Vision is used for recognizing text from images, and post-processing the extracted text is observed to improve text recognition accuracy by approximately 3.5%. A web-based text similarity approach (using URLs and domain names) is used to examine the authenticity of the content of the quoted images. An accuracy of approximately 96.26% is achieved in classifying quoted images as verified or misquoted. A ground truth dataset of authentic site names has also been created. The images used in this research contain quotes by famous celebrities and global leaders. A comparative analysis is performed to show the effectiveness of the proposed algorithm.
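The web-based verification idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure, the decision threshold, or the contents of the ground truth dataset, so the `SequenceMatcher` ratio, the 0.8 threshold, the `TRUSTED_DOMAINS` set, and all function names below are assumptions made for illustration. The sketch takes text already extracted by OCR, together with (URL, snippet) pairs assumed to come from a web search, and marks the quote as verified only when a closely matching snippet is hosted on a trusted domain.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical allow-list standing in for the paper's ground truth dataset
# of authentic site names, which is not reproduced here.
TRUSTED_DOMAINS = {"example-quotes.org", "example-archive.edu"}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace.

    A simple stand-in for OCR post-processing; the paper's actual
    post-processing steps are not described in the abstract.
    """
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def domain_of(url: str) -> str:
    """Extract the host name from a URL, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def classify(ocr_text: str, results: list[tuple[str, str]],
             threshold: float = 0.8) -> str:
    """Label a quote 'verified' if a trusted site hosts a close match."""
    for url, snippet in results:
        trusted = domain_of(url) in TRUSTED_DOMAINS
        if trusted and similarity(ocr_text, snippet) >= threshold:
            return "verified"
    return "misquoted"


if __name__ == "__main__":
    hits = [
        ("https://www.example-quotes.org/q/1",
         "Be the change you wish to see in the world."),
        ("https://random-blog.example/p",
         "Be the change you wish to see in the world"),
    ]
    print(classify("Be the change, you wish to see in the WORLD", hits))
```

Requiring both a trusted domain and a high text similarity mirrors the abstract's pairing of "URLs and domain name" with text similarity; in practice the threshold would need tuning against labelled data.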

Acknowledgments

This publication is an outcome of the R&D work undertaken in the project under the Visvesvaraya PhD Scheme of the Ministry of Electronics & Information Technology, Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia).

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Author information

Authors and Affiliations

Department of Computer Science & Engineering, TIET, Patiala, Punjab, India
Sneha Banerjee, Sawinder Kaur & Parteek Kumar

Corresponding author

Correspondence to Sneha Banerjee.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Banerjee, S., Kaur, S. & Kumar, P. Quote examiner: verifying quoted images using web-based text similarity. Multimed Tools Appl 80, 12135–12154 (2021). https://doi.org/10.1007/s11042-020-10270-4

Received: 17 October 2019
Revised: 20 October 2020
Accepted: 09 December 2020
Published: 09 January 2021
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11042-020-10270-4
