Syntactic Pattern Recognition in Computer Vision: A Systematic Review

Authors:
Gilberto Astolfi

College of Computing, Federal University of Mato Grosso do Sul (UFMS), Brazil and Federal Institute of Education, Science and Technology of Mato Grosso do Sul (IFMS), Brazil

0000-0003-2565-1822

Fábio Prestes Cesar Rezende

Universidade Católica Dom Bosco (UCDB), Brazil

0000-0003-3208-757X

João Vitor De Andrade Porto

Universidade Católica Dom Bosco (UCDB), Brazil

0000-0002-4766-3675

Edson Takashi Matsubara

College of Computing, Federal University of Mato Grosso do Sul (UFMS), Brazil

0000-0002-4471-0886

Hemerson Pistori

Universidade Católica Dom Bosco (UCDB), Brazil and College of Computing, Federal University of Mato Grosso do Sul (UFMS), Brazil

0000-0001-8181-760X

ACM Computing Surveys, Volume 54, Issue 3, Article No. 65, pp. 1–35. https://doi.org/10.1145/3447241

Published: 17 April 2021

Abstract

The use of techniques derived from syntactic methods for visual pattern recognition is not new; it was explored extensively in the area known as syntactical or structural pattern recognition. Syntactic methods have been useful because they are intuitively simple to understand and have transparent, interpretable, and elegant representations. Their capacity to represent patterns in a semantic, hierarchical, compositional, spatial, and temporal way has made them very popular in the research community. In this article, we give an overview of how syntactic methods have been employed for computer vision tasks. We conduct a systematic literature review to survey the most relevant studies that use syntactic methods for pattern recognition tasks in images and videos. Our search returned 597 papers, of which 71 were selected for analysis. The results indicate that, in most of the studies surveyed, syntactic methods were used as a high-level structure that models the hierarchical or semantic relationships among objects or actions in order to perform a wide variety of tasks.

References

Nosheen Abid, Adnan ul Hasan, and Faisal Shafait. 2018. DeepParse: A trainable postal address parser. In Proceedings of the Conference on Digital Image Computing: Techniques and Applications (DICTA’18). IEEE, 1--8.Google ScholarCross Ref
Francisco Álvaro, Joan-Andreu Sánchez, and José-Miguel Benedí. 2014. Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models. Pattern Recog. Lett. 35 (2014), 58--67. DOI:https://doi.org/10.1016/j.patrec.2012.09.023Google ScholarDigital Library
Francisco Álvaro, Joan-Andreu Sánchez, and José-Miguel Benedí. 2016. An integrated grammar-based approach for mathematical expression recognition. Pattern Recog. 51 (2016), 135--147.Google ScholarDigital Library
Alexander Andreopoulos and John K. Tsotsos. 2013. 50 Years of object recognition: Directions forward. Comput. Vis. Image Underst. 117, 8 (2013), 827--891. DOI:https://doi.org/10.1016/j.cviu.2013.04.005Google ScholarCross Ref
Gilberto Astolfi, Marcio Carneiro Brito Pache, Geazy Vilharva Menezes, Adair da Silva Oliveira Junior, Gabriel Kirsten Menezes, Vanessa Aparecida Moares de Weber, Everton Castelão Tetila, Nícolas Alessandro de Souza Belete, Edson Takashi Matsubara, and Hemerson Pistori. 2020. Combining syntactic methods with LSTM to classify soybean aerial images. IEEE Geosci. Rem. Sens. Lett. 1, 1 (2020), 1--5. DOI:https://doi.org/10.1109/lgrs.2020.3014938Google Scholar
Kaouther Khazri Ayeb, Afef Kacem Echi, and Abdel Belaïd. 2015. A syntax directed system for the recognition of printed Arabic mathematical formulas. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR’15). IEEE, 186--190. DOI:https://doi.org/10.1109/ICDAR.2015.7333749Google Scholar
Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc J. Van Gool. 2008. Speeded-Up robust features (SURF). Comput. Vis. Image Underst. 110, 3 (June 2008), 346--359. DOI:https://doi.org/10.1016/j.cviu.2007.09.014Google ScholarDigital Library
Andrew Blake, Pushmeet Kohli, and Carsten Rother. 2011. Markov Random Fields for Vision and Image Processing. The MIT Press, Cambridge, MA.Google Scholar
Alexandre Boulch, Simon Houllier, Renaud Marlet, and Olivier Tournaire. 2013. Semantizing complex 3D scenes using constrained attribute grammars. In Proceedings of the 11th Eurographics/ACMSIGGRAPH Symposium on Geometry Processing (SGP’13). Eurographics Association, 33--42. DOI:https://doi.org/10.1111/cgf.12170Google ScholarDigital Library
Lubomir Bourdev, Subhransu Maji, Thomas Brox, and Jitendra Malik. 2010. Detecting people using mutually consistent poselet activations. In Proceedings of the 11th European Conference on Computer Vision (ECCV’10). Springer-Verlag, Berlin, 168--181. Retrieved from http://dl.acm.org/citation.cfm?idequals;1888212.1888227.Google ScholarCross Ref
Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng. 2011. Handbook of Markov Chain Monte Carlo. CRC Press, Boca Raton, FL. Retrieved from https://books.google.com.br/books?idequals;qfRsAIKZ4rIC.Google Scholar
Gaurav Chanda and Frank Dellaert. 2004. Grammatical Methods in Computer Vision: An Overview. Technical Report GIT-GVU-04-29. Georgia Institute of Technology. Retrieved from https://www.cc.gatech.edu/gvu/reports/2004/abstracts/04-29.html.Google Scholar
Tae Eun Choe, Hongli Deng, Feng Guo, Mun Wai Lee, and Niels Haering. 2013. Semantic video-to-video search using sub-graph grouping and matching. In Proceedings of the IEEE International Conference on Computer Vision Workshops. IEEE, 787--794. DOI:https://doi.org/10.1109/ICCVW.2013.108Google ScholarDigital Library
Jeroen Chua and Pedro F. Felzenszwalb. 2016. Scene grammars, factor graphs, and belief propagation. CoRR abs/1606.01307 (2016), 1--46.Google Scholar
Nicholas Dahm, Yongsheng Gao, Terry Caelli, and Horst Bunke. 2013. Matching non-aligned objects using a relational string-graph. In Proceedings of the IEEE International Conference on Image Processing. IEEE, 3394--3398. DOI:https://doi.org/10.1109/ICIP.2013.6738700Google ScholarCross Ref
Lluís-Pere de las Heras, Oriol Ramos Terrades, and Josep Lladós. 2015. Attributed graph grammar for floor plan analysis. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR’15). IEEE, 726--730. DOI:https://doi.org/10.1109/ICDAR.2015.7333857Google ScholarDigital Library
Ilke Demir, Daniel G. Aliaga, and Bedrich Benes. 2015. Procedural editing of 3D building point clouds. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’15). IEEE, 2147--2155. DOI:https://doi.org/10.1109/ICCV.2015.248Google ScholarDigital Library
Vincenzo Deufemia, Michele Risi, and Genoveffa Tortora. 2014. Sketched symbol recognition using latent-dynamic conditional random fields and distance-based clustering. Pattern Recog. 47, 3 (2014), 1159--1171. DOI:https://doi.org/10.1016/j.patcog.2013.09.016Google ScholarDigital Library
Murray Eden. 1961. On the formalization of handwriting. Amer. Math. Soc. Appl. Math Symp. 12 (1961), 83--88.Google ScholarCross Ref
Haoshu Fang, Yuanlu Xu, Wenguan Wang, Xiaobai Liu, and Song-Chun Zhu. 2018. Learning pose grammar to encode human body configuration for 3D pose estimation. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, (AAAI’18), the 30th innovative Applications of Artificial Intelligence (IAAI’18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI’18), Sheila A. McIlraith and Kilian Q. Weinberger (Eds.). AAAI Press, 6821--6828.Google Scholar
Weiguo Feng, Rui Liu, and Ming Zhu. 2014. Fall detection for elderly person care in a vision-based home surveillance environment using a monocular camera. Sig. Image Vid. Proc. 8, 6 (2014), 1129--1138. DOI:https://doi.org/10.1007/s11760-014-0645-4Google ScholarCross Ref
G. Ferber. 1986. Classifying and validating intermittent EEG patterns with syntactic methods. Pattern Recog. 19, 4 (1986), 289--295. DOI:https://doi.org/10.1016/0031-3203(86)90054-3Google ScholarDigital Library
Amy Fire and Song-Chun Zhu. 2017. Inferring hidden statuses and actions in video by causal reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW’17). IEEE, 48--56. DOI:https://doi.org/10.1109/CVPRW.2017.13Google ScholarCross Ref
Mariusz Flasiński and Janusz Jurek. 2014. Fundamental methodological issues of syntactic pattern recognition. Pattern Anal. Applic. 17, 3 (01 Aug. 2014), 465--480. DOI:https://doi.org/10.1007/s10044-013-0322-1Google Scholar
G. D. Forney. 2001. Codes on graphs: Normal realizations. IEEE Trans. Inf. Theor. 47, 2 (Feb. 2001), 520--548. DOI:https://doi.org/10.1109/18.910573Google ScholarDigital Library
David A. Forsyth and Jean Ponce. 2002. Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference, Upper Saddle River, NJ.Google ScholarDigital Library
King-Sun Fu and A. Rosenfeld. 1976. Pattern recognition and image processing. IEEE Trans. Comput. C-25, 12 (Dec. 1976), 1336--1346. DOI:https://doi.org/10.1109/TC.1976.1674602Google Scholar
Raghudeep Gadde, Renaud Marlet, and Nikos Paragios. 2016. Learning grammars for architecture-specific facade parsing. Int. J. Comput. Vis. 117, 3 (May 2016), 290--316. DOI:https://doi.org/10.1007/s11263-016-0887-4Google ScholarDigital Library
Zoubin Ghahramani. 2001. An introduction to hidden Markov models and Bayesian networks. Int. J. Pattern Recog. Artif. Intell. 15, 01 (2001), 9--42. DOI:https://doi.org/10.1142/S0218001401000836Google ScholarCross Ref
Josep M. Gonfaus, Marco Pedersoli, Jordi González, Andrea Vedaldi, and F. Xavier Roca. 2015. Factorized appearances for object detection. Comput. Vis. Image Underst. 138 (2015), 92--101. DOI:https://doi.org/10.1016/j.cviu.2015.04.008Google ScholarDigital Library
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the International Conference on Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2672--2680.Google Scholar
Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. 2017. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28, 10 (Oct. 2017), 2222--2232. DOI:https://doi.org/10.1109/TNNLS.2016.2582924Google ScholarCross Ref
Christian Hentschel and Harald Sack. 2014. Does one size really fit all?: Evaluating classifiers in bag-of-visual-words classification. In Proceedings of the 14th International Conference on Knowledge Technologies and Data-driven Business. ACM, New York, NY.Google ScholarDigital Library
Geoffrey Hinton, Sara Sabour, and Nicholas Frosst. 2018. Matrix capsules with EM routing. In Proceedings of the 6th International Conference on Learning Representations (ICLR’18). ICLR, 1--15.Google Scholar
Geoffrey E. Hinton, Alex Krizhevsky, and Sida D. Wang. 2011. Transforming auto-encoders. In Lecture Notes in Computer Science. Springer Berlin, 44--51. DOI:https://doi.org/10.1007/978-3-642-21735-7_6Google ScholarDigital Library
Satoshi Ikehata, Hang Yang, and Yasutaka Furukawa. 2015. Structured indoor modeling. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’15). IEEE, 1323--1331. DOI:https://doi.org/10.1109/ICCV.2015.156Google ScholarDigital Library
Phillip Isola and Ce Liu. 2013. Scene collaging: Analysis and synthesis of natural images with semantic layers. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’13). IEEE, Washington, DC, 3048--3055. DOI:https://doi.org/10.1109/ICCV.2013.457Google ScholarDigital Library
Tommi S. Jaakkola and David Haussler. 1999. Exploiting generative models in discriminative classifiers. In Proceedings of the Conference on Advances in Neural Information Processing Systems. The MIT Press, Cambridge, MA, 487--493. Retrieved from http://dl.acm.org/citation.cfm?idequals;340534.340715.Google Scholar
A. K. Jain, R. P. W. Duin, and Jianchang Mao. 2000. Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Mach. Intell. 22, 1 (Jan. 2000), 4--37. DOI:https://doi.org/10.1109/34.824819Google ScholarDigital Library
Ahsan Jalal, Ahmad Salman, Ajmal Mian, Mark Shortis, and Faisal Shafait. 2020. Fish detection and species classification in underwater environments using deep learning with temporal information. Ecol. Inform. 57 (May 2020), 101088. DOI:https://doi.org/10.1016/j.ecoinf.2020.101088Google Scholar
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia (MM’14). Association for Computing Machinery, New York, NY, 675--678. DOI:https://doi.org/10.1145/2647868.2654889Google ScholarDigital Library
Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lap-Fai Yu, Demetri Terzopoulos, and Song-Chun Zhu. 2018. Configurable 3D scene synthesis and 2D image rendering with per-pixel ground truth using stochastic grammars. Int. J. Comput. Vis. 126, 9 (June 2018), 920--941.Google ScholarDigital Library
Yunsheng Jiang and Jinwen Ma. 2015. Combination features and models for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). IEEE, Boston, MA, 240--248.Google Scholar
Frank D. Julca-Aguilar, Harold Mouchère, Christian Viard-Gaudin, and Nina S. T. Hirata. 2017. A general framework for the recognition of online handwritten graphics. CoRR abs/1709.06389 (2017), 1--14.Google Scholar
Aniruddha Kembhavi, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, and Ali Farhadi. 2016. A diagram is worth a dozen images. In Computer Vision -- ECCV 2016, Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer International Publishing, Cham, 235--251.Google ScholarCross Ref
Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, and Max Welling. 2014. Semi-supervised learning with deep generative models. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’14). The MIT Press, Cambridge, MA, 3581--3589.Google Scholar
Russell A. Kirsch. 1964. Computer interpretation of English text and picture patterns. IEEE Trans. Electron. Comput. EC-13, 4 (Aug. 1964), 363--376. DOI:https://doi.org/10.1109/PGEC.1964.263816Google ScholarCross Ref
Barbara Kitchenham and Stuart Charters. 2007. Guidelines for Performing Systematic Literature Reviews in Software Engineering. Technical Report EBSE 2007-001. Keele University and Durham University Joint Report. Retrieved from http://www.dur.ac.uk/ebse/resources/Systematic-reviews-5-8.pdf.Google Scholar
W. W. Kong and Surendra Ranganath. 2014. Towards subject independent continuous sign language recognition: A segment and merge approach. Pattern Recog. 47, 3 (2014), 1294--1308. DOI:https://doi.org/10.1016/j.patcog.2013.09.014Google ScholarDigital Library
Adam Kortylewski, Aleksander Wieczorek, Mario Wieser, Clemens Blumer, Sonali Parbhoo, Andreas Morel-Forster, Volker Roth, and Thomas Vetter. 2019. Greedy structure learning of hierarchical compositional models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’19). Computer Vision Foundation/IEEE, 11612--11621. DOI:https://doi.org/10.1109/CVPR.2019.01188Google ScholarCross Ref
Mateusz Koziński, Raghudeep Gadde, Sergey Zagoruyko, Guillaume Obozinski, and Renaud Marlet. 2015. A MRF shape prior for facade parsing with occlusions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). IEEE, Boston, MA, 2820--2828. DOI:https://doi.org/10.1109/CVPR.2015.7298899Google ScholarCross Ref
Mateusz Koziński and Renaud Marlet. 2014. Image parsing with graph grammars and Markov Random Fields applied to facade analysis. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision. IEEE, 729--736. DOI:https://doi.org/10.1109/WACV.2014.6836030Google ScholarCross Ref
Mateusz Koziński, Guillaume Obozinski, and Renaud Marlet. 2015. Beyond procedural facade parsing: Bidirectional alignment via linear programming. In Computer Vision -- ACCV 2014, Daniel Cremers, Ian Reid, Hideo Saito, and Ming-Hsuan Yang (Eds.). Springer International Publishing, Cham, 79--94.Google Scholar
Volker Krüger and Dennis Herzog. 2013. Tracking in object action space. Comput. Vis. Image Underst. 117, 7 (2013), 764--789. DOI:https://doi.org/10.1016/j.cviu.2013.02.002Google ScholarDigital Library
Hilde Kuehne, Juergen Gall, and Thomas Serre. 2016. An end-to-end generative framework for video segmentation and recognition. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV’16). IEEE, 1--8. DOI:https://doi.org/10.1109/WACV.2016.7477701Google ScholarCross Ref
Hilde Kuehne, Alexander Richard, and Juergen Gall. 2017. Weakly supervised learning of actions from transcripts. Comput. Vis. Image Underst. 163 (2017), 78--89. DOI:https://doi.org/10.1016/j.cviu.2017.06.004Google ScholarDigital Library
Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. 2006. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), Vol. 2. IEEE, New York, NY, 2169--2178. DOI:https://doi.org/10.1109/CVPR.2006.68Google ScholarDigital Library
T. Hoang Ngan Le, ChenChen Zhu, Yutong Zheng, Khoa Luu, and Marios Savvides. 2017. DeepSafeDrive: A grammar-aware driver parsing approach to Driver Behavioral Situational Awareness (DB-SAW). Pattern Recog. 66 (2017), 229--238. DOI:https://doi.org/10.1016/j.patcog.2016.11.028Google ScholarDigital Library
Kyuhwa Lee, Dimitri Ognibene, Hyung Jin Chang, Tae-Kyun Kim, and Yiannis Demiris. 2015. STARE: Spatio-temporal attention relocation for multiple structured activities detection. IEEE Trans. Image Proc. 24, 12 (Dec. 2015), 5916--5927. DOI:https://doi.org/10.1109/TIP.2015.2487837Google ScholarDigital Library
Eduardo Lemus, Ernesto Bribiesca, and Edgar Garduno. 2015. Surface trees Representation of boundary surfaces using a tree descriptor. J. Vis. Commun. Image Represent. 31 (2015), 101--111. DOI:https://doi.org/10.1016/j.jvcir.2015.06.004Google ScholarDigital Library
Bo Li, Yaobin Chen, and Fei-Yue Wang. 2015. Pedestrian detection based on clustered poselet models and hierarchical and-or grammar. IEEE Trans. Vehic. Technol. 64, 4 (Apr. 2015), 1435--1444. DOI:https://doi.org/10.1109/TVT.2014.2331314Google ScholarCross Ref
Bo Li, Xi Song, Tianfu Wu, Wenze Hu, and Mingtao Pei. 2014. Coupling-and-decoupling: A hierarchical model for occlusion-free object detection. Pattern Recog. 47, 10 (2014), 3254--3264. DOI:https://doi.org/10.1016/j.patcog.2014.04.016Google ScholarCross Ref
Xilai Li, Xi Song, and Tianfu Wu. 2019. AOGNets: Compositional grammatical architectures for deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’19). IEEE, 6220--6230.Google ScholarCross Ref
Xilai Li, Tianfu Wu, Xi Song, and Hamid Krim. 2017. AOGNets: Deep AND-OR grammar networks for visual recognition. CoRR abs/1711.05847 (2017), 1--12.Google Scholar
Li Liu, Shu Wang, Yuxin Peng, Zigang Huang, Ming Liu, and Bin Hu. 2016. Mining intricate temporal rules for recognizing complex activities of daily living under uncertainty. Pattern Recog. 60 (2016), 1015--1028. DOI:https://doi.org/10.1016/j.patcog.2016.07.024Google ScholarDigital Library
Xianming Liu, Rongrong Ji, Changhu Wang, Wei Liu, Bineng Zhong, and Thomas S. Huang. 2015. Understanding image structure via hierarchical shape parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). IEEE, Boston, MA, 5042--5050. DOI:https://doi.org/10.1109/CVPR.2015.7299139Google Scholar
Xiaobai Liu, Yuanlu Xu, Lei Zhu, and Yadong Mu. 2018. A stochastic attribute grammar for robust cross-view human tracking. IEEE Trans. Circ. Syst. Vid. Technol. 28, 10 (Oct. 2018), 2884--2895. DOI:https://doi.org/10.1109/TCSVT.2017.2781738Google Scholar
Xiaobai Liu, Yibiao Zhao, and Song-Chun Zhu. 2014. Single-view 3D scene parsing by attributed grammar. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 684--691. DOI:https://doi.org/10.1109/CVPR.2014.93Google ScholarDigital Library
Xiaobai Liu, Yibiao Zhao, and Song-Chun Zhu. 2018. Single-view 3D scene reconstruction and parsing by attribute grammar. IEEE Trans. Pattern Anal. Mach. Intell. 40, 3 (Mar. 2018), 710--725. DOI:https://doi.org/10.1109/TPAMI.2017.2689007Google ScholarCross Ref
Yang Lu, Tianfu Wu, and Song-Chun Zhu. 2014. Online object tracking, learning, and parsing with and-or graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3462--3469. DOI:https://doi.org/10.1109/CVPR.2014.443Google ScholarDigital Library
Andelo Martinovic and Luc Van Gool. 2013. Early Parsing for 2D Stochastic Context Free Grammars. Technical Report KUL/ESAT/PSI/1301. Department of Electrical Engineering (ESAT), University Hospital Gasthuisberg, Kasteelpark Arenberg, België.Google Scholar
Andelo Martinovic and Luc Van Gool. 2013. Bayesian grammar learning for inverse procedural modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’13). IEEE Computer Society, Washington, DC, 201--208. DOI:https://doi.org/10.1109/CVPR.2013.33Google ScholarDigital Library
Lilyana Mihalkova, Tuyen Huynh, and Raymond J. Mooney. 2007. Mapping and revising Markov logic networks for transfer learning. In Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI’07). AAAI Press, 608--614. Retrieved from http://dl.acm.org/citation.cfm?idequals;1619645.1619743.Google Scholar
Darnell Moore and Irfan Essa. 2002. Recognizing multitasked activities from video using stochastic context-free grammar. In Proceedings of the 18th National Conference on Artificial Intelligence. American Association for Artificial Intelligence, 770--776.Google Scholar
Louis-Philippe Morency, Ariadna Quattoni, and Trevor Darrell. 2007. Latent-dynamic discriminative models for continuous gesture recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1--8. DOI:https://doi.org/10.1109/CVPR.2007.383299Google ScholarCross Ref
R. Narasimhan. 1962. A Linguistic Approach to Pattern Recognition. Technical Report 121. Digital Computer Laboratory, University of Illinois, Urbana, IL.Google Scholar
Andrew Y. Ng and Michael I. Jordan. 2001. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS’01). The MIT Press, Cambridge, MA, 841--848.Google Scholar
Andrew Y. Ng and Michael I. Jordan. 2001. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS’01). The MIT Press, Cambridge, MA, 841--848.Google Scholar
T. Ojala, M. Pietikainen, and D. Harwood. 1994. Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In Proceedings of 12th International Conference on Pattern Recognition. IEEE, 582--585. DOI:https://doi.org/10.1109/ICPR.1994.576366Google ScholarCross Ref
Eray Özkural. 2014. An application of stochastic context sensitive grammar induction to transfer learning. In Artificial General Intelligence, Ben Goertzel, Laurent Orseau, and Javier Snaider (Eds.). Springer International Publishing, Cham, 121--132.Google Scholar
Seyoung Park, Bruce Xiaohan Nie, and Song-Chun Zhu. 2018. Attribute and-or grammar for joint parsing of human pose, parts and attributes. IEEE Trans. Pattern Anal. Mach. Intell. 40, 7 (July 2018), 1555--1569. DOI:https://doi.org/10.1109/TPAMI.2017.2731842Google ScholarCross Ref
Seyoung Park and Song-Chun Zhu. 2015. Attributed grammars for joint estimation of human attributes, part and pose. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’15). IEEE, 2372--2380. DOI:https://doi.org/10.1109/ICCV.2015.273Google ScholarDigital Library
Ricardo Wandré Dias Pedro, Fátima L. S. Nunes, and Ariane Machado-Lima. 2013. Using grammars for pattern recognition in images: A systematic review. ACM Comput. Surv. 46, 2 (Nov. 2013). DOI:https://doi.org/10.1145/2543581.2543593Google Scholar
Mingtao Pei, Zhangzhang Si, Benjamin Z. Yao, and Song-Chun Zhu. 2013. Learning and parsing video events with goal and intent prediction. Comput. Vis. Image Underst. 117, 10 (Oct. 2013), 1369--1383. DOI:https://doi.org/10.1016/j.cviu.2012.12.003Google ScholarDigital Library
John L. Pfaltz and Azriel Rosenfeld. 1969. Web grammars. In Proceedings of the 1st International Joint Conference on Artificial Intelligence (IJCAI’69). Morgan Kaufmann Publishers Inc., San Francisco, CA, 609--619. Retrieved from http://dl.acm.org/citation.cfm?idequals;1624562.1624616.Google Scholar
Hamed Pirsiavash and Deva Ramanan. 2014. Parsing videos of actions with segmental grammars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’14). IEEE Computer Society, Washington, DC, 612--619. DOI:https://doi.org/10.1109/CVPR.2014.85Google ScholarDigital Library
Hemerson Pistori, Andrew Calway, and Peter Flach. 2013. A new strategy for applying grammatical inference to image classification problems. In Proceedings of the IEEE International Conference on Industrial Technology (ICIT’13). IEEE, 1032--1037.Google ScholarCross Ref
Siyuan Qi, Siyuan Huang, Ping Wei, and Song-Chun Zhu. 2017. Predicting human activities using stochastic grammar. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17). IEEE, 1173--1181. DOI:https://doi.org/10.1109/iccv.2017.132Google ScholarCross Ref
Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, and Song-Chun Zhu. 2018. Human-centric indoor scene synthesis using stochastic grammar. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 5899--5908.Google ScholarCross Ref
Christian P. Robert and George Casella. 1999. The Metropolis—Hastings algorithm. In Springer Texts in Statistics. Springer New York, New York, NY, 231--283. DOI:https://doi.org/10.1007/978-1-4757-3071-5_6Google Scholar
Antonio Foncubierta Rodríguez, Henning Müller, and Adrien Depeursinge. 2017. From visual words to a visual grammar: Using language modelling for image classification. CoRR abs/1703.05571 (2017), 1--17.Google Scholar
Brandon Rothrock, Seyoung Park, and Song-Chun Zhu. 2013. Integrating grammar and segmentation for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3214--3221. DOI:https://doi.org/10.1109/CVPR.2013.413Google ScholarDigital Library
Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. 2017. Dynamic routing between capsules. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., Red Hook, NY, 3859--3869.Google Scholar
Anderson Santos, José Marcato Junior, Jonathan de Andrade Silva, Rodrigo Pereira, Daniel Matos, Geazy Menezes, Leandro Higa, Anette Eltner, Ana Paula Ramos, Lucas Osco, and Wesley Gonçalves. 2020. Storm-drain and manhole detection using the RetinaNet method. Sensors 20, 16 (Aug. 2020), 4450. DOI:https://doi.org/10.3390/s20164450Google ScholarCross Ref
Sunita Sarawagi and William W. Cohen. 2004. Semi-Markov conditional random fields for information extraction. In Proceedings of the 17th International Conference on Neural Information Processing Systems. The MIT Press, Cambridge, MA, 1185--1192. Retrieved from http://dl.acm.org/citation.cfm?idequals;2976040.2976189.Google Scholar
M. Schuster and K. K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Trans. Sig. Proc. 45, 11 (1997), 2673--2681. DOI:https://doi.org/10.1109/78.650093Google ScholarDigital Library
Ricky J. Sethi and Amit K. Roy-Chowdhury. 2010. Modeling and recognition of complex multi-person interactions in video. In Proceedings of the 1st ACM International Workshop on Multimodal Pervasive Video Analysis (MPVA’10). ACM, New York, NY, 43--46. DOI:https://doi.org/10.1145/1878039.1878049Google Scholar
Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR’15). ICLR, 1--14.Google Scholar
Kenneth Slonneger and Barry Kurtz. 1995. Formal Syntax and Semantics of Programming Languages: A Laboratory Based Approach (1st ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA.Google Scholar
Xi Song, Tianfu Wu, Yunde Jia, and Song-Chun Zhu. 2013. Discriminatively trained and-or tree models for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3278--3285. DOI:https://doi.org/10.1109/CVPR.2013.421Google ScholarDigital Library
George Stiny and James Gips. 1971. Shape grammars and the generative specification of painting and sculpture. In Information Processing, Proceedings of IFIP Congress, Vol. 2. Elsevier, North Holland Publishing Co., 1460--1465.Google Scholar
Domen Tabernik, Matej Kristan, Jeremy L. Wyatt, and Ales Leonardis. 2016. Towards deep compositional networks. In Proceedings of the 23rd International Conference on Pattern Recognition (ICPR’16). IEEE, 3470--3475. DOI:https://doi.org/10.1109/ICPR.2016.7900171Google ScholarCross Ref
Domen Tabernik, Aleš Leonardis, Marko Boben, Danijel Skočaj, and Matej Kristan. 2015. Adding discriminative power to a generative hierarchical compositional model using histograms of compositions. Comput. Vis. Image Underst. 138, C (Sept. 2015), 102--113. DOI:https://doi.org/10.1016/j.cviu.2015.04.006Google Scholar
Jawad Tayyub, Majd Hawasly, David C. Hogg, and Anthony G. Cohn. 2018. Learning hierarchical models of complex daily activities from annotated videos. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV’18). IEEE, 1633--1641. DOI:https://doi.org/10.1109/WACV.2018.00182Google Scholar
Olivier Teboul, Iasonas Kokkinos, Loic Simon, Panagiotis Koutsourakis, and Nikos Paragios. 2011. Shape grammar parsing via reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’11). IEEE Computer Society, Washington, DC, 2273--2280. DOI:https://doi.org/10.1109/CVPR.2011.5995319Google ScholarDigital Library
Olivier Teboul, Iasonas Kokkinos, Loic Simon, Panagiotis Koutsourakis, and Nikos Paragios. 2013. Parsing facades with shape grammars and reinforcement learning. IEEE Trans. Pattern Anal. Mach. Intell. 35, 7 (July 2013), 1744--1756. DOI:https://doi.org/10.1109/TPAMI.2012.252Google ScholarDigital Library
Everton Castelão Tetila, Bruno Brandoli Machado, Gilberto Astolfi, Nícolas Alessandro de Souza Belete, Willian Paraguassu Amorim, Antonia Railda Roel, and Hemerson Pistori. 2020. Detection and classification of soybean pests using deep learning with UAV images. Comput. Electron. Agric. 179 (2020), 105836. DOI:https://doi.org/10.1016/j.compag.2020.105836
Bin Tian, Ming Tang, and Fei-Yue Wang. 2015. Vehicle detection grammars with partial occlusion handling for traffic surveillance. Transport. Res. Part C: Emerg. Technol. 56 (2015), 80--93. DOI:https://doi.org/10.1016/j.trc.2015.02.020
Nam N. Vo and Aaron F. Bobick. 2014. From stochastic grammar to Bayes network: Probabilistic parsing of complex activity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2641--2648.
Nam N. Vo and Aaron F. Bobick. 2016. Sequential interval network for parsing complex structured activity. Comput. Vis. Image Underst. 143 (2016), 147--158. DOI:https://doi.org/10.1016/j.cviu.2015.07.006
Michael Walton, Doug Lange, and Song-Chun Zhu. 2017. Inferring context through scene understanding. In Proceedings of the AAAI Spring Symposium Series. AAAI Press, 356--360.
Heng Wang, Alexander Kläser, Cordelia Schmid, and Cheng-Lin Liu. 2013. Dense trajectories and motion boundary descriptors for action recognition. Int. J. Comput. Vis. 103, 1 (May 2013), 60--79. DOI:https://doi.org/10.1007/s11263-012-0594-8
Wenguan Wang, Yuanlu Xu, Jianbing Shen, and Song-Chun Zhu. 2018. Attentive fashion grammar network for fashion landmark detection and clothing category classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 4271--4280.
Julien Weissenberg, Hayko Riemenschneider, Mukta Prasad, and Luc Van Gool. 2013. Is there a procedural logic to architecture? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Washington, DC, 185--192. DOI:https://doi.org/10.1109/CVPR.2013.31
A. D. Wilson and A. F. Bobick. 1999. Parametric hidden Markov models for gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 21, 9 (Sep. 1999), 884--900. DOI:https://doi.org/10.1109/34.790429
David Windridge, Josef Kittler, Teofilo de Campos, Fei Yan, William Christmas, and Aftab Khan. 2015. A novel Markov logic rule induction strategy for characterizing sports video footage. IEEE MultiMedia 22, 2 (Apr. 2015), 24--35. DOI:https://doi.org/10.1109/MMUL.2014.36
Bingwei Wu. 2013. Two-dimensional (2D) Languages and Application to Handwritten Graphical Parsing. Technical Report. Ecole Polytechnique de l’université de Nantes. Retrieved from https://hal.archives-ouvertes.fr/hal-00861080.
Ying Nian Wu, Zhangzhang Si, Haifeng Gong, and Song-Chun Zhu. 2009. Learning active basis model for object detection and recognition. Int. J. Comput. Vis. 90, 2 (Aug. 2009), 198--235. DOI:https://doi.org/10.1007/s11263-009-0287-0
Xianglei Xing, Tianfu Wu, Song-Chun Zhu, and Ying Nian Wu. 2020. Inducing hierarchical compositional model by sparsifying generator network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’20). IEEE, 14284--14293. DOI:https://doi.org/10.1109/CVPR42600.2020.01430
Xianglei Xing, Song-Chun Zhu, and Ying Nian Wu. 2019. Inducing sparse coding and And-Or grammar from generator network. In Proceedings of the AAAI Conference on Artificial Intelligence, Workshop on Network Interpretability for Deep Learning. AAAI Press, 1--4.
Yuanlu Xu, Lei Qin, Xiaobai Liu, Jianwen Xie, and Song-Chun Zhu. 2018. A causal and-or graph model for visibility fluent reasoning in tracking interacting objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2178--2187. DOI:https://doi.org/10.1109/CVPR.2018.00232
M. S. Zarchi, R. T. Tan, C. van Gemeren, A. Monadjemi, and R. C. Veltkamp. 2016. Understanding image concepts using ISTOP model. Pattern Recog. 53, C (May 2016), 174--183. DOI:https://doi.org/10.1016/j.patcog.2015.11.010
Yibiao Zhao and Song-Chun Zhu. 2013. Scene parsing by integrating function, geometry and appearance models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3119--3126. DOI:https://doi.org/10.1109/CVPR.2013.401
Y. Zhu, N. Nayak, U. Gaur, B. Song, and A. Roy-Chowdhury. 2013. Modeling multi-object interactions using string of feature graphs. Comput. Vis. Image Underst. 117, 10 (2013), 1313--1328. DOI:https://doi.org/10.1016/j.cviu.2012.08.009
Bartosz Zieliński, Marek Skomorowski, Wadim Wojciechowski, Mariusz Korkosz, and Kamila Sprężak. 2015. Computer aided erosions and osteophytes detection based on hand radiographs. Pattern Recog. 48, 7 (2015), 2304--2317. DOI:https://doi.org/10.1016/j.patcog.2015.01.018

Index Terms

Syntactic Pattern Recognition in Computer Vision: A Systematic Review
1. Mathematics of computing
  1. Mathematical software
    1. Mathematical software performance

Published in
ACM Computing Surveys Volume 54, Issue 3
April 2022
836 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3461619
Editor:
Albert Zomaya
University of Sydney, Australia
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 April 2021
- Accepted: 1 January 2021
- Revised: 1 November 2020
- Received: 1 April 2020

Author Tags
Computer vision
formal languages
image representation
pattern recognition
syntactic methods
Qualifiers
- research-article
- Research
- Refereed