Common Sense Knowledge for Handwritten Chinese Text Recognition

Wang, Qiu-Feng; Cambria, Erik; Liu, Cheng-Lin; Hussain, Amir

doi:10.1007/s12559-012-9183-y

Published: 23 August 2012

Volume 5, pages 234–242, (2013)

Cognitive Computation

Qiu-Feng Wang¹, Erik Cambria², Cheng-Lin Liu¹ & Amir Hussain³

Abstract

Compared with humans, computers fall far short in common sense knowledge, which people normally acquire during the formative years of their lives. This paper investigates the effects of employing common sense knowledge as a new linguistic context in handwritten Chinese text recognition. Three methods are introduced to supplement the standard n-gram language model: an embedding model, a direct model, and an ensemble of the two. The embedding model uses semantic similarities from common sense knowledge to make the estimation of n-gram probabilities more reliable, especially for n-grams unseen in the training text corpus. The direct model, in turn, considers the linguistic context of the whole document to compensate for the short context window of the n-gram model. The three models are evaluated on a large unconstrained handwriting database, CASIA-HWDB, and the results show that adopting common sense knowledge improves recognition performance, despite the reduced concept list employed here.
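The idea of using semantic similarity to back up unseen n-grams can be illustrated with a small bigram sketch. This is not the paper's formulation, which this page does not reproduce; it is a generic similarity-weighted estimate interpolated with the maximum-likelihood bigram. The toy corpus, the `similar` table, and the interpolation weight `lam` are all hypothetical values chosen for illustration.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left-hand histories from tokenized sentences."""
    history, bigram = Counter(), Counter()
    for words in sentences:
        for prev, cur in zip(words, words[1:]):
            history[prev] += 1
            bigram[(prev, cur)] += 1
    return history, bigram


def ml_prob(prev, cur, history, bigram):
    """Maximum-likelihood P(cur | prev); 0.0 for an unseen pair or history."""
    if history[prev] == 0:
        return 0.0
    return bigram[(prev, cur)] / history[prev]


def similarity_prob(prev, cur, history, bigram, similar):
    """Similarity-weighted average of P(cur | w) over words w similar to prev."""
    neighbours = similar.get(prev, {})
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    weighted = sum(s * ml_prob(w, cur, history, bigram)
                   for w, s in neighbours.items())
    return weighted / total


def interpolated_prob(prev, cur, history, bigram, similar, lam=0.7):
    """Linear interpolation of the ML estimate and the similarity estimate."""
    return (lam * ml_prob(prev, cur, history, bigram)
            + (1.0 - lam) * similarity_prob(prev, cur, history, bigram, similar))


# Toy corpus (hypothetical): "sip tea" never occurs, but "drink tea" does.
corpus = [["I", "drink", "tea"], ["I", "drink", "water"], ["we", "sip", "coffee"]]
# Hypothetical similarity scores, standing in for a common sense knowledge base.
similar = {"sip": {"drink": 1.0}}

history, bigram = train_bigram(corpus)
p_ml = ml_prob("sip", "tea", history, bigram)
p_mix = interpolated_prob("sip", "tea", history, bigram, similar)
print(p_ml, p_mix)
```

In this toy setup the plain bigram assigns zero probability to "tea" after "sip", while the interpolated estimate borrows mass from the similar history "drink". In a real system the similarity scores would come from the knowledge base and the weight would be tuned on held-out data.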

Notes

In the case of unconstrained texts, no corpus is large enough to contain all possible n-grams.
In Chinese, a word can comprise one or multiple characters, and words capture both syntactic and semantic meaning better than single characters do.
High-order n-gram models need a much larger training corpus and incur higher computation and memory costs, so n usually does not exceed 5 in practice.
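A short sketch makes the sparsity argument in these notes concrete: the number of possible n-grams grows as the vocabulary size raised to the power n, so the share of test n-grams never seen in training rises with n. The vocabulary size of 5000 and the toy token sequences below are assumptions for illustration, not figures from the article.

```python
def possible_ngrams(vocab_size, n):
    """Number of distinct n-grams over a vocabulary of the given size."""
    return vocab_size ** n


def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unseen_rate(train_tokens, test_tokens, n):
    """Fraction of test n-grams that never occur in the training tokens."""
    seen = set(ngrams(train_tokens, n))
    test = ngrams(test_tokens, n)
    if not test:
        return 0.0
    return sum(g not in seen for g in test) / len(test)


# Hypothetical vocabulary size, used only to show the growth with n.
for n in range(1, 6):
    print(n, possible_ngrams(5000, n))

# Toy sequences (hypothetical) showing the unseen rate rising with n.
train = "a b c a b d".split()
test = "a b c d".split()
rates = [unseen_rate(train, test, n) for n in (1, 2, 3)]
print(rates)
```

Even on this tiny example every test unigram is covered by the training data, while a third of the bigrams and half of the trigrams are not.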

Acknowledgments

This work has been supported in part by the National Basic Research Program of China (973 Program) Grant 2012CB316302, the National Natural Science Foundation of China (NSFC) Grants 60825301 and 60933010, and the Royal Society of Edinburgh (UK) and the Chinese Academy of Sciences within the China-Scotland SIPRA (Signal Image Processing Research Academy) Programme. The authors would like to thank Jia-jun Zhang for his aid in the machine translation process.

Author information

Authors and Affiliations

National Laboratory of Pattern Recognition, Institute of Automation of Chinese Academy of Sciences, Beijing, 100190, People’s Republic of China
Qiu-Feng Wang & Cheng-Lin Liu
Temasek Laboratories, National University of Singapore, Singapore, 117411, Singapore
Erik Cambria
Department of Computing Science and Mathematics, University of Stirling, Stirling, FK9 4LA, UK
Amir Hussain

Corresponding author

Correspondence to Qiu-Feng Wang.

About this article

Cite this article

Wang, QF., Cambria, E., Liu, CL. et al. Common Sense Knowledge for Handwritten Chinese Text Recognition. Cogn Comput 5, 234–242 (2013). https://doi.org/10.1007/s12559-012-9183-y

Received: 30 April 2012
Accepted: 09 August 2012
Published: 23 August 2012
Issue Date: June 2013
DOI: https://doi.org/10.1007/s12559-012-9183-y
