Abstract
An Optical Character Recognition (OCR) system converts document images, whether printed or handwritten, into their electronic counterparts. Handwritten text, however, is far more challenging to process than printed text because of the erratic writing styles of individuals, and the problem becomes even more severe when the input image is a doctor’s prescription. Since a prescription contains both handwritten and printed text that must be processed separately, the two types have to be distinguished before the image is fed to an OCR engine. Although much work has been done on separating handwritten from printed text, little of it addresses doctors’ handwriting. In this paper, a method is proposed that first localizes the text regions in a doctor’s prescription and then separates the printed text from the handwritten text. Owing to the unavailability of a large database, we use standard data (image) augmentation techniques to evaluate the method and to demonstrate its robustness. We have also designed a Graphical User Interface (GUI) so that anybody can visualize the output by providing a prescription image as input.





















Annexure
A.1 Code


A.2 Classifier Parameters
The parameters used for the Random Forest classifier are as follows; a sketch of the equivalent Weka API configuration is given after the list.
- bagSizePercent = 100
- batchSize = 100
- breakTiesRandomly = False
- calcOutOfBag = False
- computeAttributeImportance = False
- debug = False
- doNotCheckCapabilities = False
- maxDepth = 0
- numDecimalPlaces = 2
- numExecutionSlots = 1
- numFeatures = 0
- numIterations = 100
- minVarianceProp = 0.001
- outputOutOfBagComplexityStatistics = False
- seed = 1
- storeOutOfBagPredictions = False
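These values match the defaults of the RandomForest implementation in Weka 3.8, which the parameter names above appear to follow. As a point of reference only, the sketch below shows how the same configuration could be reproduced through the Weka Java API; the ARFF file name, the class-attribute position, and the 10-fold cross-validation are illustrative assumptions rather than details taken from the paper.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RandomForestSetup {
    public static void main(String[] args) throws Exception {
        // "prescription_features.arff" is a placeholder for the extracted feature vectors.
        Instances data = DataSource.read("prescription_features.arff");
        data.setClassIndex(data.numAttributes() - 1); // assume the printed/handwritten label is last

        RandomForest rf = new RandomForest();
        rf.setNumIterations(100);    // numIterations = 100 (number of trees)
        rf.setBagSizePercent(100);   // bagSizePercent = 100
        rf.setMaxDepth(0);           // maxDepth = 0 (unlimited depth)
        rf.setNumFeatures(0);        // numFeatures = 0 (log2(#attributes) + 1)
        rf.setNumExecutionSlots(1);  // numExecutionSlots = 1
        rf.setSeed(1);               // seed = 1

        // Illustrative 10-fold cross-validation; the paper's exact evaluation protocol may differ.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(rf, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```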
The parameters used for the REPT classifier are as follows; a corresponding Weka API sketch is given after the list.
- batchSize = 100
- debug = False
- doNotCheckCapabilities = False
- initialCount = 0.0
- maxDepth = -1
- minNum = 2.0
- minVarianceProp = 0.001
- noPruning = False
- numDecimalPlaces = 2
- numFolds = 3
- seed = 1
- spreadInitialCount = False
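Assuming the "REPT" classifier refers to Weka's REPTree (reduced-error pruning tree), the values above can be set programmatically as sketched below. The wrapper class name is hypothetical, and the resulting classifier can be trained and evaluated with the same harness as the Random Forest example above.

```java
import weka.classifiers.Classifier;
import weka.classifiers.trees.REPTree;

public class ReptSetup {
    /** Builds a REPTree with the parameter values listed above. */
    public static Classifier build() {
        REPTree tree = new REPTree();
        tree.setMaxDepth(-1);              // maxDepth = -1 (no depth restriction)
        tree.setMinNum(2.0);               // minNum = 2.0
        tree.setMinVarianceProp(0.001);    // minVarianceProp = 0.001
        tree.setNoPruning(false);          // pruning enabled
        tree.setNumFolds(3);               // numFolds = 3 (data reserved for pruning)
        tree.setInitialCount(0.0);         // initialCount = 0.0
        tree.setSpreadInitialCount(false); // spreadInitialCount = False
        tree.setSeed(1);                   // seed = 1
        return tree;
    }
}
```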
The parameters used for the Random Tree classifier are as follows; a corresponding Weka API sketch is given after the list.
- kValue = 0
- allowUnclassifiedInstances = False
- batchSize = 100
- debug = False
- doNotCheckCapabilities = False
- maxDepth = 0
- minNum = 1.0
- minVarianceProp = 0.001
- numDecimalPlaces = 2
- numFolds = 0
- seed = 1
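A matching sketch for Weka's RandomTree is given below; the wrapper class name is again hypothetical, and the returned object can be plugged into the evaluation code shown for the Random Forest.

```java
import weka.classifiers.Classifier;
import weka.classifiers.trees.RandomTree;

public class RandomTreeSetup {
    /** Builds a RandomTree with the parameter values listed above. */
    public static Classifier build() {
        RandomTree tree = new RandomTree();
        tree.setKValue(0);                         // kValue = 0 (log2(#attributes) + 1)
        tree.setAllowUnclassifiedInstances(false); // allowUnclassifiedInstances = False
        tree.setMaxDepth(0);                       // maxDepth = 0 (unlimited depth)
        tree.setMinNum(1.0);                       // minNum = 1.0
        tree.setMinVarianceProp(0.001);            // minVarianceProp = 0.001
        tree.setNumFolds(0);                       // numFolds = 0 (no backfitting)
        tree.setSeed(1);                           // seed = 1
        return tree;
    }
}
```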
The parameters used for the Decision Table classifier are as follows; a corresponding Weka API sketch is given after the list.
- batchSize = 100
- debug = False
- doNotCheckCapabilities = False
- crossVal = 1
- displayRules = False
- numDecimalPlaces = 2
- search = BestFirst (lookupCacheSize = 1, depth = 5)
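The sketch below assumes Weka's DecisionTable with a BestFirst attribute search. Mapping the listed "depth = 5" onto BestFirst's searchTermination property is an assumption on our part, as is the wrapper class name.

```java
import weka.attributeSelection.BestFirst;
import weka.classifiers.Classifier;
import weka.classifiers.rules.DecisionTable;

public class DecisionTableSetup {
    /** Builds a DecisionTable with the parameter values listed above. */
    public static Classifier build() {
        BestFirst search = new BestFirst();
        search.setLookupCacheSize(1);   // lookupCacheSize = 1
        search.setSearchTermination(5); // assumed to correspond to "depth = 5"

        DecisionTable dt = new DecisionTable();
        dt.setCrossVal(1);              // crossVal = 1 (leave-one-out evaluation)
        dt.setDisplayRules(false);      // displayRules = False
        dt.setSearch(search);           // BestFirst search strategy
        return dt;
    }
}
```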
The parameters used for the J48 classifier are as follows; a corresponding Weka API sketch is given after the list.
- batchSize = 100
- binarySplits = False
- collapseTree = True
- confidenceFactor = 0.25
- debug = False
- doNotCheckCapabilities = False
- doNotMakeSplitPointActualValue = False
- minNumObj = 2
- numFolds = 3
- reducedErrorPruning = False
- saveInstanceData = False
- seed = 1
- subtreeRaising = True
- unpruned = False
- useLaplace = False
- useMDLcorrection = True
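These J48 (C4.5) settings correspond to Weka's defaults and can be reproduced as sketched below; the wrapper class name is hypothetical, and the classifier is used with the same evaluation harness as the Random Forest example.

```java
import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;

public class J48Setup {
    /** Builds a J48 decision tree with the parameter values listed above. */
    public static Classifier build() {
        J48 j48 = new J48();
        j48.setConfidenceFactor(0.25f);    // confidenceFactor = 0.25
        j48.setMinNumObj(2);               // minNumObj = 2
        j48.setNumFolds(3);                // numFolds = 3 (only used with reduced-error pruning)
        j48.setBinarySplits(false);        // binarySplits = False
        j48.setCollapseTree(true);         // collapseTree = True
        j48.setSubtreeRaising(true);       // subtreeRaising = True
        j48.setUnpruned(false);            // unpruned = False
        j48.setReducedErrorPruning(false); // reducedErrorPruning = False
        j48.setUseLaplace(false);          // useLaplace = False
        j48.setUseMDLcorrection(true);     // useMDLcorrection = True
        j48.setDoNotMakeSplitPointActualValue(false);
        j48.setSeed(1);                    // seed = 1 (relevant only for reduced-error pruning)
        return j48;
    }
}
```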