Syntactic Pattern Recognition in Computer Vision: A Systematic Review

Published: 17 April 2021

Abstract

The use of techniques derived from syntactic methods for visual pattern recognition is not new; it has been explored extensively in the area known as syntactic or structural pattern recognition. Syntactic methods have been useful because they are intuitively simple to understand and have transparent, interpretable, and elegant representations. Their capacity to represent patterns in a semantic, hierarchical, compositional, spatial, and temporal way has made them very popular in the research community. In this article, we give an overview of how syntactic methods have been employed for computer vision tasks. We conduct a systematic literature review to survey the most relevant studies that use syntactic methods for pattern recognition tasks in images and videos. Our search returned 597 papers, of which 71 were selected for analysis. The results indicate that, in most of the studies surveyed, syntactic methods were used as a high-level structure that captures the hierarchical or semantic relationships among objects or actions in order to perform a wide variety of tasks.

References

  1. Nosheen Abid, Adnan ul Hasan, and Faisal Shafait. 2018. DeepParse: A trainable postal address parser. In Proceedings of the Conference on Digital Image Computing: Techniques and Applications (DICTA’18). IEEE, 1--8.Google ScholarGoogle ScholarCross RefCross Ref
  2. Francisco Álvaro, Joan-Andreu Sánchez, and José-Miguel Benedí. 2014. Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models. Pattern Recog. Lett. 35 (2014), 58--67. DOI:https://doi.org/10.1016/j.patrec.2012.09.023Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Francisco Álvaro, Joan-Andreu Sánchez, and José-Miguel Benedí. 2016. An integrated grammar-based approach for mathematical expression recognition. Pattern Recog. 51 (2016), 135--147.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Alexander Andreopoulos and John K. Tsotsos. 2013. 50 Years of object recognition: Directions forward. Comput. Vis. Image Underst. 117, 8 (2013), 827--891. DOI:https://doi.org/10.1016/j.cviu.2013.04.005Google ScholarGoogle ScholarCross RefCross Ref
  5. Gilberto Astolfi, Marcio Carneiro Brito Pache, Geazy Vilharva Menezes, Adair da Silva Oliveira Junior, Gabriel Kirsten Menezes, Vanessa Aparecida Moares de Weber, Everton Castelão Tetila, Nícolas Alessandro de Souza Belete, Edson Takashi Matsubara, and Hemerson Pistori. 2020. Combining syntactic methods with LSTM to classify soybean aerial images. IEEE Geosci. Rem. Sens. Lett. 1, 1 (2020), 1--5. DOI:https://doi.org/10.1109/lgrs.2020.3014938Google ScholarGoogle Scholar
  6. Kaouther Khazri Ayeb, Afef Kacem Echi, and Abdel Belaïd. 2015. A syntax directed system for the recognition of printed Arabic mathematical formulas. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR’15). IEEE, 186--190. DOI:https://doi.org/10.1109/ICDAR.2015.7333749Google ScholarGoogle Scholar
  7. Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc J. Van Gool. 2008. Speeded-Up robust features (SURF). Comput. Vis. Image Underst. 110, 3 (June 2008), 346--359. DOI:https://doi.org/10.1016/j.cviu.2007.09.014Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Andrew Blake, Pushmeet Kohli, and Carsten Rother. 2011. Markov Random Fields for Vision and Image Processing. The MIT Press, Cambridge, MA.Google ScholarGoogle Scholar
  9. Alexandre Boulch, Simon Houllier, Renaud Marlet, and Olivier Tournaire. 2013. Semantizing complex 3D scenes using constrained attribute grammars. In Proceedings of the 11th Eurographics/ACMSIGGRAPH Symposium on Geometry Processing (SGP’13). Eurographics Association, 33--42. DOI:https://doi.org/10.1111/cgf.12170Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Lubomir Bourdev, Subhransu Maji, Thomas Brox, and Jitendra Malik. 2010. Detecting people using mutually consistent poselet activations. In Proceedings of the 11th European Conference on Computer Vision (ECCV’10). Springer-Verlag, Berlin, 168--181. Retrieved from http://dl.acm.org/citation.cfm?idequals;1888212.1888227.Google ScholarGoogle ScholarCross RefCross Ref
  11. Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng. 2011. Handbook of Markov Chain Monte Carlo. CRC Press, Boca Raton, FL. Retrieved from https://books.google.com.br/books?idequals;qfRsAIKZ4rIC.Google ScholarGoogle Scholar
  12. Gaurav Chanda and Frank Dellaert. 2004. Grammatical Methods in Computer Vision: An Overview. Technical Report GIT-GVU-04-29. Georgia Institute of Technology. Retrieved from https://www.cc.gatech.edu/gvu/reports/2004/abstracts/04-29.html.Google ScholarGoogle Scholar
  13. Tae Eun Choe, Hongli Deng, Feng Guo, Mun Wai Lee, and Niels Haering. 2013. Semantic video-to-video search using sub-graph grouping and matching. In Proceedings of the IEEE International Conference on Computer Vision Workshops. IEEE, 787--794. DOI:https://doi.org/10.1109/ICCVW.2013.108Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Jeroen Chua and Pedro F. Felzenszwalb. 2016. Scene grammars, factor graphs, and belief propagation. CoRR abs/1606.01307 (2016), 1--46.Google ScholarGoogle Scholar
  15. Nicholas Dahm, Yongsheng Gao, Terry Caelli, and Horst Bunke. 2013. Matching non-aligned objects using a relational string-graph. In Proceedings of the IEEE International Conference on Image Processing. IEEE, 3394--3398. DOI:https://doi.org/10.1109/ICIP.2013.6738700Google ScholarGoogle ScholarCross RefCross Ref
  16. Lluís-Pere de las Heras, Oriol Ramos Terrades, and Josep Lladós. 2015. Attributed graph grammar for floor plan analysis. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR’15). IEEE, 726--730. DOI:https://doi.org/10.1109/ICDAR.2015.7333857Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Ilke Demir, Daniel G. Aliaga, and Bedrich Benes. 2015. Procedural editing of 3D building point clouds. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’15). IEEE, 2147--2155. DOI:https://doi.org/10.1109/ICCV.2015.248Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Vincenzo Deufemia, Michele Risi, and Genoveffa Tortora. 2014. Sketched symbol recognition using latent-dynamic conditional random fields and distance-based clustering. Pattern Recog. 47, 3 (2014), 1159--1171. DOI:https://doi.org/10.1016/j.patcog.2013.09.016Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Murray Eden. 1961. On the formalization of handwriting. Amer. Math. Soc. Appl. Math Symp. 12 (1961), 83--88.Google ScholarGoogle ScholarCross RefCross Ref
  20. Haoshu Fang, Yuanlu Xu, Wenguan Wang, Xiaobai Liu, and Song-Chun Zhu. 2018. Learning pose grammar to encode human body configuration for 3D pose estimation. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, (AAAI’18), the 30th innovative Applications of Artificial Intelligence (IAAI’18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI’18), Sheila A. McIlraith and Kilian Q. Weinberger (Eds.). AAAI Press, 6821--6828.Google ScholarGoogle Scholar
  21. Weiguo Feng, Rui Liu, and Ming Zhu. 2014. Fall detection for elderly person care in a vision-based home surveillance environment using a monocular camera. Sig. Image Vid. Proc. 8, 6 (2014), 1129--1138. DOI:https://doi.org/10.1007/s11760-014-0645-4Google ScholarGoogle ScholarCross RefCross Ref
  22. G. Ferber. 1986. Classifying and validating intermittent EEG patterns with syntactic methods. Pattern Recog. 19, 4 (1986), 289--295. DOI:https://doi.org/10.1016/0031-3203(86)90054-3Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Amy Fire and Song-Chun Zhu. 2017. Inferring hidden statuses and actions in video by causal reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW’17). IEEE, 48--56. DOI:https://doi.org/10.1109/CVPRW.2017.13Google ScholarGoogle ScholarCross RefCross Ref
  24. Mariusz Flasiński and Janusz Jurek. 2014. Fundamental methodological issues of syntactic pattern recognition. Pattern Anal. Applic. 17, 3 (01 Aug. 2014), 465--480. DOI:https://doi.org/10.1007/s10044-013-0322-1Google ScholarGoogle Scholar
  25. G. D. Forney. 2001. Codes on graphs: Normal realizations. IEEE Trans. Inf. Theor. 47, 2 (Feb. 2001), 520--548. DOI:https://doi.org/10.1109/18.910573Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. David A. Forsyth and Jean Ponce. 2002. Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference, Upper Saddle River, NJ.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. King-Sun Fu and A. Rosenfeld. 1976. Pattern recognition and image processing. IEEE Trans. Comput. C-25, 12 (Dec. 1976), 1336--1346. DOI:https://doi.org/10.1109/TC.1976.1674602Google ScholarGoogle Scholar
  28. Raghudeep Gadde, Renaud Marlet, and Nikos Paragios. 2016. Learning grammars for architecture-specific facade parsing. Int. J. Comput. Vis. 117, 3 (May 2016), 290--316. DOI:https://doi.org/10.1007/s11263-016-0887-4Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Zoubin Ghahramani. 2001. An introduction to hidden Markov models and Bayesian networks. Int. J. Pattern Recog. Artif. Intell. 15, 01 (2001), 9--42. DOI:https://doi.org/10.1142/S0218001401000836Google ScholarGoogle ScholarCross RefCross Ref
  30. Josep M. Gonfaus, Marco Pedersoli, Jordi González, Andrea Vedaldi, and F. Xavier Roca. 2015. Factorized appearances for object detection. Comput. Vis. Image Underst. 138 (2015), 92--101. DOI:https://doi.org/10.1016/j.cviu.2015.04.008Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the International Conference on Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2672--2680.Google ScholarGoogle Scholar
  32. Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. 2017. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28, 10 (Oct. 2017), 2222--2232. DOI:https://doi.org/10.1109/TNNLS.2016.2582924Google ScholarGoogle ScholarCross RefCross Ref
  33. Christian Hentschel and Harald Sack. 2014. Does one size really fit all?: Evaluating classifiers in bag-of-visual-words classification. In Proceedings of the 14th International Conference on Knowledge Technologies and Data-driven Business. ACM, New York, NY.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Geoffrey Hinton, Sara Sabour, and Nicholas Frosst. 2018. Matrix capsules with EM routing. In Proceedings of the 6th International Conference on Learning Representations (ICLR’18). ICLR, 1--15.Google ScholarGoogle Scholar
  35. Geoffrey E. Hinton, Alex Krizhevsky, and Sida D. Wang. 2011. Transforming auto-encoders. In Lecture Notes in Computer Science. Springer Berlin, 44--51. DOI:https://doi.org/10.1007/978-3-642-21735-7_6Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Satoshi Ikehata, Hang Yang, and Yasutaka Furukawa. 2015. Structured indoor modeling. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’15). IEEE, 1323--1331. DOI:https://doi.org/10.1109/ICCV.2015.156Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Phillip Isola and Ce Liu. 2013. Scene collaging: Analysis and synthesis of natural images with semantic layers. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’13). IEEE, Washington, DC, 3048--3055. DOI:https://doi.org/10.1109/ICCV.2013.457Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Tommi S. Jaakkola and David Haussler. 1999. Exploiting generative models in discriminative classifiers. In Proceedings of the Conference on Advances in Neural Information Processing Systems. The MIT Press, Cambridge, MA, 487--493. Retrieved from http://dl.acm.org/citation.cfm?idequals;340534.340715.Google ScholarGoogle Scholar
  39. A. K. Jain, R. P. W. Duin, and Jianchang Mao. 2000. Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Mach. Intell. 22, 1 (Jan. 2000), 4--37. DOI:https://doi.org/10.1109/34.824819Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Ahsan Jalal, Ahmad Salman, Ajmal Mian, Mark Shortis, and Faisal Shafait. 2020. Fish detection and species classification in underwater environments using deep learning with temporal information. Ecol. Inform. 57 (May 2020), 101088. DOI:https://doi.org/10.1016/j.ecoinf.2020.101088Google ScholarGoogle Scholar
  41. Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia (MM’14). Association for Computing Machinery, New York, NY, 675--678. DOI:https://doi.org/10.1145/2647868.2654889Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lap-Fai Yu, Demetri Terzopoulos, and Song-Chun Zhu. 2018. Configurable 3D scene synthesis and 2D image rendering with per-pixel ground truth using stochastic grammars. Int. J. Comput. Vis. 126, 9 (June 2018), 920--941.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Yunsheng Jiang and Jinwen Ma. 2015. Combination features and models for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). IEEE, Boston, MA, 240--248.Google ScholarGoogle Scholar
  44. Frank D. Julca-Aguilar, Harold Mouchère, Christian Viard-Gaudin, and Nina S. T. Hirata. 2017. A general framework for the recognition of online handwritten graphics. CoRR abs/1709.06389 (2017), 1--14.Google ScholarGoogle Scholar
  45. Aniruddha Kembhavi, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, and Ali Farhadi. 2016. A diagram is worth a dozen images. In Computer Vision -- ECCV 2016, Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer International Publishing, Cham, 235--251.Google ScholarGoogle ScholarCross RefCross Ref
  46. Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, and Max Welling. 2014. Semi-supervised learning with deep generative models. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’14). The MIT Press, Cambridge, MA, 3581--3589.Google ScholarGoogle Scholar
  47. Russell A. Kirsch. 1964. Computer interpretation of English text and picture patterns. IEEE Trans. Electron. Comput. EC-13, 4 (Aug. 1964), 363--376. DOI:https://doi.org/10.1109/PGEC.1964.263816Google ScholarGoogle ScholarCross RefCross Ref
  48. Barbara Kitchenham and Stuart Charters. 2007. Guidelines for Performing Systematic Literature Reviews in Software Engineering. Technical Report EBSE 2007-001. Keele University and Durham University Joint Report. Retrieved from http://www.dur.ac.uk/ebse/resources/Systematic-reviews-5-8.pdf.Google ScholarGoogle Scholar
  49. W. W. Kong and Surendra Ranganath. 2014. Towards subject independent continuous sign language recognition: A segment and merge approach. Pattern Recog. 47, 3 (2014), 1294--1308. DOI:https://doi.org/10.1016/j.patcog.2013.09.014Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Adam Kortylewski, Aleksander Wieczorek, Mario Wieser, Clemens Blumer, Sonali Parbhoo, Andreas Morel-Forster, Volker Roth, and Thomas Vetter. 2019. Greedy structure learning of hierarchical compositional models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’19). Computer Vision Foundation/IEEE, 11612--11621. DOI:https://doi.org/10.1109/CVPR.2019.01188Google ScholarGoogle ScholarCross RefCross Ref
  51. Mateusz Koziński, Raghudeep Gadde, Sergey Zagoruyko, Guillaume Obozinski, and Renaud Marlet. 2015. A MRF shape prior for facade parsing with occlusions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). IEEE, Boston, MA, 2820--2828. DOI:https://doi.org/10.1109/CVPR.2015.7298899Google ScholarGoogle ScholarCross RefCross Ref
  52. Mateusz Koziński and Renaud Marlet. 2014. Image parsing with graph grammars and Markov Random Fields applied to facade analysis. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision. IEEE, 729--736. DOI:https://doi.org/10.1109/WACV.2014.6836030Google ScholarGoogle ScholarCross RefCross Ref
  53. Mateusz Koziński, Guillaume Obozinski, and Renaud Marlet. 2015. Beyond procedural facade parsing: Bidirectional alignment via linear programming. In Computer Vision -- ACCV 2014, Daniel Cremers, Ian Reid, Hideo Saito, and Ming-Hsuan Yang (Eds.). Springer International Publishing, Cham, 79--94.Google ScholarGoogle Scholar
  54. Volker Krüger and Dennis Herzog. 2013. Tracking in object action space. Comput. Vis. Image Underst. 117, 7 (2013), 764--789. DOI:https://doi.org/10.1016/j.cviu.2013.02.002Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Hilde Kuehne, Juergen Gall, and Thomas Serre. 2016. An end-to-end generative framework for video segmentation and recognition. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV’16). IEEE, 1--8. DOI:https://doi.org/10.1109/WACV.2016.7477701Google ScholarGoogle ScholarCross RefCross Ref
  56. Hilde Kuehne, Alexander Richard, and Juergen Gall. 2017. Weakly supervised learning of actions from transcripts. Comput. Vis. Image Underst. 163 (2017), 78--89. DOI:https://doi.org/10.1016/j.cviu.2017.06.004Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. 2006. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), Vol. 2. IEEE, New York, NY, 2169--2178. DOI:https://doi.org/10.1109/CVPR.2006.68Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. T. Hoang Ngan Le, ChenChen Zhu, Yutong Zheng, Khoa Luu, and Marios Savvides. 2017. DeepSafeDrive: A grammar-aware driver parsing approach to Driver Behavioral Situational Awareness (DB-SAW). Pattern Recog. 66 (2017), 229--238. DOI:https://doi.org/10.1016/j.patcog.2016.11.028Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Kyuhwa Lee, Dimitri Ognibene, Hyung Jin Chang, Tae-Kyun Kim, and Yiannis Demiris. 2015. STARE: Spatio-temporal attention relocation for multiple structured activities detection. IEEE Trans. Image Proc. 24, 12 (Dec. 2015), 5916--5927. DOI:https://doi.org/10.1109/TIP.2015.2487837Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Eduardo Lemus, Ernesto Bribiesca, and Edgar Garduno. 2015. Surface trees Representation of boundary surfaces using a tree descriptor. J. Vis. Commun. Image Represent. 31 (2015), 101--111. DOI:https://doi.org/10.1016/j.jvcir.2015.06.004Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Bo Li, Yaobin Chen, and Fei-Yue Wang. 2015. Pedestrian detection based on clustered poselet models and hierarchical and-or grammar. IEEE Trans. Vehic. Technol. 64, 4 (Apr. 2015), 1435--1444. DOI:https://doi.org/10.1109/TVT.2014.2331314Google ScholarGoogle ScholarCross RefCross Ref
  62. Bo Li, Xi Song, Tianfu Wu, Wenze Hu, and Mingtao Pei. 2014. Coupling-and-decoupling: A hierarchical model for occlusion-free object detection. Pattern Recog. 47, 10 (2014), 3254--3264. DOI:https://doi.org/10.1016/j.patcog.2014.04.016Google ScholarGoogle ScholarCross RefCross Ref
  63. Xilai Li, Xi Song, and Tianfu Wu. 2019. AOGNets: Compositional grammatical architectures for deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’19). IEEE, 6220--6230.Google ScholarGoogle ScholarCross RefCross Ref
  64. Xilai Li, Tianfu Wu, Xi Song, and Hamid Krim. 2017. AOGNets: Deep AND-OR grammar networks for visual recognition. CoRR abs/1711.05847 (2017), 1--12.Google ScholarGoogle Scholar
  65. Li Liu, Shu Wang, Yuxin Peng, Zigang Huang, Ming Liu, and Bin Hu. 2016. Mining intricate temporal rules for recognizing complex activities of daily living under uncertainty. Pattern Recog. 60 (2016), 1015--1028. DOI:https://doi.org/10.1016/j.patcog.2016.07.024Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Xianming Liu, Rongrong Ji, Changhu Wang, Wei Liu, Bineng Zhong, and Thomas S. Huang. 2015. Understanding image structure via hierarchical shape parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). IEEE, Boston, MA, 5042--5050. DOI:https://doi.org/10.1109/CVPR.2015.7299139Google ScholarGoogle Scholar
  67. Xiaobai Liu, Yuanlu Xu, Lei Zhu, and Yadong Mu. 2018. A stochastic attribute grammar for robust cross-view human tracking. IEEE Trans. Circ. Syst. Vid. Technol. 28, 10 (Oct. 2018), 2884--2895. DOI:https://doi.org/10.1109/TCSVT.2017.2781738Google ScholarGoogle Scholar
  68. Xiaobai Liu, Yibiao Zhao, and Song-Chun Zhu. 2014. Single-view 3D scene parsing by attributed grammar. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 684--691. DOI:https://doi.org/10.1109/CVPR.2014.93Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Xiaobai Liu, Yibiao Zhao, and Song-Chun Zhu. 2018. Single-view 3D scene reconstruction and parsing by attribute grammar. IEEE Trans. Pattern Anal. Mach. Intell. 40, 3 (Mar. 2018), 710--725. DOI:https://doi.org/10.1109/TPAMI.2017.2689007Google ScholarGoogle ScholarCross RefCross Ref
  70. Yang Lu, Tianfu Wu, and Song-Chun Zhu. 2014. Online object tracking, learning, and parsing with and-or graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3462--3469. DOI:https://doi.org/10.1109/CVPR.2014.443Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Andelo Martinovic and Luc Van Gool. 2013. Early Parsing for 2D Stochastic Context Free Grammars. Technical Report KUL/ESAT/PSI/1301. Department of Electrical Engineering (ESAT), University Hospital Gasthuisberg, Kasteelpark Arenberg, België.Google ScholarGoogle Scholar
  72. Andelo Martinovic and Luc Van Gool. 2013. Bayesian grammar learning for inverse procedural modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’13). IEEE Computer Society, Washington, DC, 201--208. DOI:https://doi.org/10.1109/CVPR.2013.33Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. Lilyana Mihalkova, Tuyen Huynh, and Raymond J. Mooney. 2007. Mapping and revising Markov logic networks for transfer learning. In Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI’07). AAAI Press, 608--614. Retrieved from http://dl.acm.org/citation.cfm?idequals;1619645.1619743.Google ScholarGoogle Scholar
  74. Darnell Moore and Irfan Essa. 2002. Recognizing multitasked activities from video using stochastic context-free grammar. In Proceedings of the 18th National Conference on Artificial Intelligence. American Association for Artificial Intelligence, 770--776.Google ScholarGoogle Scholar
  75. Louis-Philippe Morency, Ariadna Quattoni, and Trevor Darrell. 2007. Latent-dynamic discriminative models for continuous gesture recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1--8. DOI:https://doi.org/10.1109/CVPR.2007.383299Google ScholarGoogle ScholarCross RefCross Ref
  76. R. Narasimhan. 1962. A Linguistic Approach to Pattern Recognition. Technical Report 121. Digital Computer Laboratory, University of Illinois, Urbana, IL.Google ScholarGoogle Scholar
  77. Andrew Y. Ng and Michael I. Jordan. 2001. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS’01). The MIT Press, Cambridge, MA, 841--848.Google ScholarGoogle Scholar
  78. Andrew Y. Ng and Michael I. Jordan. 2001. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS’01). The MIT Press, Cambridge, MA, 841--848.Google ScholarGoogle Scholar
  79. T. Ojala, M. Pietikainen, and D. Harwood. 1994. Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In Proceedings of 12th International Conference on Pattern Recognition. IEEE, 582--585. DOI:https://doi.org/10.1109/ICPR.1994.576366Google ScholarGoogle ScholarCross RefCross Ref
  80. Eray Özkural. 2014. An application of stochastic context sensitive grammar induction to transfer learning. In Artificial General Intelligence, Ben Goertzel, Laurent Orseau, and Javier Snaider (Eds.). Springer International Publishing, Cham, 121--132.Google ScholarGoogle Scholar
  81. Seyoung Park, Bruce Xiaohan Nie, and Song-Chun Zhu. 2018. Attribute and-or grammar for joint parsing of human pose, parts and attributes. IEEE Trans. Pattern Anal. Mach. Intell. 40, 7 (July 2018), 1555--1569. DOI:https://doi.org/10.1109/TPAMI.2017.2731842Google ScholarGoogle ScholarCross RefCross Ref
  82. Seyoung Park and Song-Chun Zhu. 2015. Attributed grammars for joint estimation of human attributes, part and pose. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’15). IEEE, 2372--2380. DOI:https://doi.org/10.1109/ICCV.2015.273Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Ricardo Wandré Dias Pedro, Fátima L. S. Nunes, and Ariane Machado-Lima. 2013. Using grammars for pattern recognition in images: A systematic review. ACM Comput. Surv. 46, 2 (Nov. 2013). DOI:https://doi.org/10.1145/2543581.2543593Google ScholarGoogle Scholar
  84. Mingtao Pei, Zhangzhang Si, Benjamin Z. Yao, and Song-Chun Zhu. 2013. Learning and parsing video events with goal and intent prediction. Comput. Vis. Image Underst. 117, 10 (Oct. 2013), 1369--1383. DOI:https://doi.org/10.1016/j.cviu.2012.12.003Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. John L. Pfaltz and Azriel Rosenfeld. 1969. Web grammars. In Proceedings of the 1st International Joint Conference on Artificial Intelligence (IJCAI’69). Morgan Kaufmann Publishers Inc., San Francisco, CA, 609--619. Retrieved from http://dl.acm.org/citation.cfm?idequals;1624562.1624616.Google ScholarGoogle Scholar
  86. Hamed Pirsiavash and Deva Ramanan. 2014. Parsing videos of actions with segmental grammars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’14). IEEE Computer Society, Washington, DC, 612--619. DOI:https://doi.org/10.1109/CVPR.2014.85Google ScholarGoogle ScholarDigital LibraryDigital Library
  87. Hemerson Pistori, Andrew Calway, and Peter Flach. 2013. A new strategy for applying grammatical inference to image classification problems. In Proceedings of the IEEE International Conference on Industrial Technology (ICIT’13). IEEE, 1032--1037.Google ScholarGoogle ScholarCross RefCross Ref
  88. Siyuan Qi, Siyuan Huang, Ping Wei, and Song-Chun Zhu. 2017. Predicting human activities using stochastic grammar. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17). IEEE, 1173--1181. DOI:https://doi.org/10.1109/iccv.2017.132Google ScholarGoogle ScholarCross RefCross Ref
  89. Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, and Song-Chun Zhu. 2018. Human-centric indoor scene synthesis using stochastic grammar. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 5899--5908.Google ScholarGoogle ScholarCross RefCross Ref
  90. Christian P. Robert and George Casella. 1999. The Metropolis—Hastings algorithm. In Springer Texts in Statistics. Springer New York, New York, NY, 231--283. DOI:https://doi.org/10.1007/978-1-4757-3071-5_6Google ScholarGoogle Scholar
  91. Antonio Foncubierta Rodríguez, Henning Müller, and Adrien Depeursinge. 2017. From visual words to a visual grammar: Using language modelling for image classification. CoRR abs/1703.05571 (2017), 1--17.Google ScholarGoogle Scholar
  92. Brandon Rothrock, Seyoung Park, and Song-Chun Zhu. 2013. Integrating grammar and segmentation for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3214--3221. DOI:https://doi.org/10.1109/CVPR.2013.413Google ScholarGoogle ScholarDigital LibraryDigital Library
  93. Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. 2017. Dynamic routing between capsules. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., Red Hook, NY, 3859--3869.Google ScholarGoogle Scholar
  94. Anderson Santos, José Marcato Junior, Jonathan de Andrade Silva, Rodrigo Pereira, Daniel Matos, Geazy Menezes, Leandro Higa, Anette Eltner, Ana Paula Ramos, Lucas Osco, and Wesley Gonçalves. 2020. Storm-drain and manhole detection using the RetinaNet method. Sensors 20, 16 (Aug. 2020), 4450. DOI:https://doi.org/10.3390/s20164450Google ScholarGoogle ScholarCross RefCross Ref
  95. Sunita Sarawagi and William W. Cohen. 2004. Semi-Markov conditional random fields for information extraction. In Proceedings of the 17th International Conference on Neural Information Processing Systems. The MIT Press, Cambridge, MA, 1185--1192. Retrieved from http://dl.acm.org/citation.cfm?idequals;2976040.2976189.Google ScholarGoogle Scholar
  96. M. Schuster and K. K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Trans. Sig. Proc. 45, 11 (1997), 2673--2681. DOI:https://doi.org/10.1109/78.650093
  97. Ricky J. Sethi and Amit K. Roy-Chowdhury. 2010. Modeling and recognition of complex multi-person interactions in video. In Proceedings of the 1st ACM International Workshop on Multimodal Pervasive Video Analysis (MPVA’10). ACM, New York, NY, 43--46. DOI:https://doi.org/10.1145/1878039.1878049
  98. Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR’15). ICLR, 1--14.
  99. Kenneth Slonneger and Barry Kurtz. 1995. Formal Syntax and Semantics of Programming Languages: A Laboratory Based Approach (1st ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA.
  100. Xi Song, Tianfu Wu, Yunde Jia, and Song-Chun Zhu. 2013. Discriminatively trained and-or tree models for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3278--3285. DOI:https://doi.org/10.1109/CVPR.2013.421
  101. George Stiny and James Gips. 1971. Shape grammars and the generative specification of painting and sculpture. In Information Processing, Proceedings of IFIP Congress, Vol. 2. Elsevier, North Holland Publishing Co., 1460--1465.
  102. Domen Tabernik, Matej Kristan, Jeremy L. Wyatt, and Ales Leonardis. 2016. Towards deep compositional networks. In Proceedings of the 23rd International Conference on Pattern Recognition (ICPR’16). IEEE, 3470--3475. DOI:https://doi.org/10.1109/ICPR.2016.7900171
  103. Domen Tabernik, Aleš Leonardis, Marko Boben, Danijel Skočaj, and Matej Kristan. 2015. Adding discriminative power to a generative hierarchical compositional model using histograms of compositions. Comput. Vis. Image Underst. 138, C (Sept. 2015), 102--113. DOI:https://doi.org/10.1016/j.cviu.2015.04.006
  104. Jawad Tayyub, Majd Hawasly, David C. Hogg, and Anthony G. Cohn. 2018. Learning hierarchical models of complex daily activities from annotated videos. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV’18). IEEE, 1633--1641. DOI:https://doi.org/10.1109/WACV.2018.00182
  105. Olivier Teboul, Iasonas Kokkinos, Loic Simon, Panagiotis Koutsourakis, and Nikos Paragios. 2011. Shape grammar parsing via reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’11). IEEE Computer Society, Washington, DC, 2273--2280. DOI:https://doi.org/10.1109/CVPR.2011.5995319
  106. Olivier Teboul, Iasonas Kokkinos, Loic Simon, Panagiotis Koutsourakis, and Nikos Paragios. 2013. Parsing facades with shape grammars and reinforcement learning. IEEE Trans. Pattern Anal. Mach. Intell. 35, 7 (July 2013), 1744--1756. DOI:https://doi.org/10.1109/TPAMI.2012.252
  107. Everton Castelão Tetila, Bruno Brandoli Machado, Gilberto Astolfi, Nícolas Alessandro de Souza Belete, Willian Paraguassu Amorim, Antonia Railda Roel, and Hemerson Pistori. 2020. Detection and classification of soybean pests using deep learning with UAV images. Comput. Electron. Agric. 179 (2020), 105836. DOI:https://doi.org/10.1016/j.compag.2020.105836
  108. Bin Tian, Ming Tang, and Fei-Yue Wang. 2015. Vehicle detection grammars with partial occlusion handling for traffic surveillance. Transport. Res. Part C: Emerg. Technol. 56 (2015), 80--93. DOI:https://doi.org/10.1016/j.trc.2015.02.020
  109. Nam N. Vo and Aaron F. Bobick. 2014. From stochastic grammar to Bayes network: Probabilistic parsing of complex activity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2641--2648.
  110. Nam N. Vo and Aaron F. Bobick. 2016. Sequential interval network for parsing complex structured activity. Comput. Vis. Image Underst. 143 (2016), 147--158. DOI:https://doi.org/10.1016/j.cviu.2015.07.006
  111. Michael Walton, Doug Lange, and Song-Chun Zhu. 2017. Inferring context through scene understanding. In Proceedings of the AAAI Spring Symposium Series. AAAI Press, 356--360.
  112. Heng Wang, Alexander Kläser, Cordelia Schmid, and Cheng-Lin Liu. 2013. Dense trajectories and motion boundary descriptors for action recognition. Int. J. Comput. Vis. 103, 1 (May 2013), 60--79. DOI:https://doi.org/10.1007/s11263-012-0594-8
  113. Wenguan Wang, Yuanlu Xu, Jianbing Shen, and Song-Chun Zhu. 2018. Attentive fashion grammar network for fashion landmark detection and clothing category classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 4271--4280.
  114. Julien Weissenberg, Hayko Riemenschneider, Mukta Prasad, and Luc Van Gool. 2013. Is there a procedural logic to architecture? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Washington, DC, 185--192. DOI:https://doi.org/10.1109/CVPR.2013.31
  115. A. D. Wilson and A. F. Bobick. 1999. Parametric hidden Markov models for gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 21, 9 (Sep. 1999), 884--900. DOI:https://doi.org/10.1109/34.790429
  116. David Windridge, Josef Kittler, Teofilo de Campos, Fei Yan, William Christmas, and Aftab Khan. 2015. A novel Markov logic rule induction strategy for characterizing sports video footage. IEEE MultiMedia 22, 2 (Apr. 2015), 24--35. DOI:https://doi.org/10.1109/MMUL.2014.36
  117. Bingwei Wu. 2013. Two-dimensional (2D) Languages and Application to Handwritten Graphical Parsing. Technical Report. Ecole Polytechnique de l’université de Nantes. Retrieved from https://hal.archives-ouvertes.fr/hal-00861080.
  118. Ying Nian Wu, Zhangzhang Si, Haifeng Gong, and Song-Chun Zhu. 2009. Learning active basis model for object detection and recognition. Int. J. Comput. Vis. 90, 2 (Aug. 2009), 198--235. DOI:https://doi.org/10.1007/s11263-009-0287-0
  119. Xianglei Xing, Tianfu Wu, Song-Chun Zhu, and Ying Nian Wu. 2020. Inducing hierarchical compositional model by sparsifying generator network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’20). IEEE, 14284--14293. DOI:https://doi.org/10.1109/CVPR42600.2020.01430
  120. Xianglei Xing, Song-Chun Zhu, and Ying Nian Wu. 2019. Inducing sparse coding and And-Or grammar from generator network. In Proceedings of the AAAI Conference on Artificial Intelligence, Workshop on Network Interpretability for Deep Learning. AAAI Press, 1--4.
  121. Yuanlu Xu, Lei Qin, Xiaobai Liu, Jianwen Xie, and Song-Chun Zhu. 2018. A causal and-or graph model for visibility fluent reasoning in tracking interacting objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2178--2187. DOI:https://doi.org/10.1109/CVPR.2018.00232
  122. M. S. Zarchi, R. T. Tan, C. van Gemeren, A. Monadjemi, and R. C. Veltkamp. 2016. Understanding image concepts using ISTOP model. Pattern Recog. 53, C (May 2016), 174--183. DOI:https://doi.org/10.1016/j.patcog.2015.11.010
  123. Yibiao Zhao and Song-Chun Zhu. 2013. Scene parsing by integrating function, geometry and appearance models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3119--3126. DOI:https://doi.org/10.1109/CVPR.2013.401
  124. Y. Zhu, N. Nayak, U. Gaur, B. Song, and A. Roy-Chowdhury. 2013. Modeling multi-object interactions using string of feature graphs. Comput. Vis. Image Underst. 117, 10 (2013), 1313--1328. DOI:https://doi.org/10.1016/j.cviu.2012.08.009
  125. Bartosz Zieliński, Marek Skomorowski, Wadim Wojciechowski, Mariusz Korkosz, and Kamila Sprężak. 2015. Computer aided erosions and osteophytes detection based on hand radiographs. Pattern Recog. 48, 7 (2015), 2304--2317. DOI:https://doi.org/10.1016/j.patcog.2015.01.018

    • Published in
      ACM Computing Surveys, Volume 54, Issue 3
      April 2022
      836 pages
      ISSN:0360-0300
      EISSN:1557-7341
      DOI:10.1145/3461619

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 April 2021
      • Accepted: 1 January 2021
      • Revised: 1 November 2020
      • Received: 1 April 2020

      Qualifiers

      • research-article
      • Research
      • Refereed