Abstract
Using techniques derived from syntactic methods for visual pattern recognition is not new; it has been widely explored in the area known as syntactic or structural pattern recognition. Syntactic methods have been useful because they are intuitively simple to understand and have transparent, interpretable, and elegant representations. Their capacity to represent patterns in a semantic, hierarchical, compositional, spatial, and temporal way has made them very popular in the research community. In this article, we give an overview of how syntactic methods have been employed for computer vision tasks. We conduct a systematic literature review to survey the most relevant studies that use syntactic methods for pattern recognition tasks in images and videos. Our search returned 597 papers, of which 71 were selected for analysis. The results indicate that, in most of the studies surveyed, syntactic methods were used as a high-level structure that models the hierarchical or semantic relationships among objects or actions in order to perform a wide variety of tasks.
- Nosheen Abid, Adnan ul Hasan, and Faisal Shafait. 2018. DeepParse: A trainable postal address parser. In Proceedings of the Conference on Digital Image Computing: Techniques and Applications (DICTA’18). IEEE, 1--8.Google ScholarCross Ref
- Francisco Álvaro, Joan-Andreu Sánchez, and José-Miguel Benedí. 2014. Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models. Pattern Recog. Lett. 35 (2014), 58--67. DOI:https://doi.org/10.1016/j.patrec.2012.09.023Google ScholarDigital Library
- Francisco Álvaro, Joan-Andreu Sánchez, and José-Miguel Benedí. 2016. An integrated grammar-based approach for mathematical expression recognition. Pattern Recog. 51 (2016), 135--147.Google ScholarDigital Library
- Alexander Andreopoulos and John K. Tsotsos. 2013. 50 Years of object recognition: Directions forward. Comput. Vis. Image Underst. 117, 8 (2013), 827--891. DOI:https://doi.org/10.1016/j.cviu.2013.04.005Google ScholarCross Ref
- Gilberto Astolfi, Marcio Carneiro Brito Pache, Geazy Vilharva Menezes, Adair da Silva Oliveira Junior, Gabriel Kirsten Menezes, Vanessa Aparecida Moares de Weber, Everton Castelão Tetila, Nícolas Alessandro de Souza Belete, Edson Takashi Matsubara, and Hemerson Pistori. 2020. Combining syntactic methods with LSTM to classify soybean aerial images. IEEE Geosci. Rem. Sens. Lett. 1, 1 (2020), 1--5. DOI:https://doi.org/10.1109/lgrs.2020.3014938Google Scholar
- Kaouther Khazri Ayeb, Afef Kacem Echi, and Abdel Belaïd. 2015. A syntax directed system for the recognition of printed Arabic mathematical formulas. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR’15). IEEE, 186--190. DOI:https://doi.org/10.1109/ICDAR.2015.7333749Google Scholar
- Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc J. Van Gool. 2008. Speeded-Up robust features (SURF). Comput. Vis. Image Underst. 110, 3 (June 2008), 346--359. DOI:https://doi.org/10.1016/j.cviu.2007.09.014Google ScholarDigital Library
- Andrew Blake, Pushmeet Kohli, and Carsten Rother. 2011. Markov Random Fields for Vision and Image Processing. The MIT Press, Cambridge, MA.Google Scholar
- Alexandre Boulch, Simon Houllier, Renaud Marlet, and Olivier Tournaire. 2013. Semantizing complex 3D scenes using constrained attribute grammars. In Proceedings of the 11th Eurographics/ACMSIGGRAPH Symposium on Geometry Processing (SGP’13). Eurographics Association, 33--42. DOI:https://doi.org/10.1111/cgf.12170Google ScholarDigital Library
- Lubomir Bourdev, Subhransu Maji, Thomas Brox, and Jitendra Malik. 2010. Detecting people using mutually consistent poselet activations. In Proceedings of the 11th European Conference on Computer Vision (ECCV’10). Springer-Verlag, Berlin, 168--181. Retrieved from http://dl.acm.org/citation.cfm?idequals;1888212.1888227.Google ScholarCross Ref
- Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng. 2011. Handbook of Markov Chain Monte Carlo. CRC Press, Boca Raton, FL. Retrieved from https://books.google.com.br/books?idequals;qfRsAIKZ4rIC.Google Scholar
- Gaurav Chanda and Frank Dellaert. 2004. Grammatical Methods in Computer Vision: An Overview. Technical Report GIT-GVU-04-29. Georgia Institute of Technology. Retrieved from https://www.cc.gatech.edu/gvu/reports/2004/abstracts/04-29.html.Google Scholar
- Tae Eun Choe, Hongli Deng, Feng Guo, Mun Wai Lee, and Niels Haering. 2013. Semantic video-to-video search using sub-graph grouping and matching. In Proceedings of the IEEE International Conference on Computer Vision Workshops. IEEE, 787--794. DOI:https://doi.org/10.1109/ICCVW.2013.108Google ScholarDigital Library
- Jeroen Chua and Pedro F. Felzenszwalb. 2016. Scene grammars, factor graphs, and belief propagation. CoRR abs/1606.01307 (2016), 1--46.Google Scholar
- Nicholas Dahm, Yongsheng Gao, Terry Caelli, and Horst Bunke. 2013. Matching non-aligned objects using a relational string-graph. In Proceedings of the IEEE International Conference on Image Processing. IEEE, 3394--3398. DOI:https://doi.org/10.1109/ICIP.2013.6738700Google ScholarCross Ref
- Lluís-Pere de las Heras, Oriol Ramos Terrades, and Josep Lladós. 2015. Attributed graph grammar for floor plan analysis. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR’15). IEEE, 726--730. DOI:https://doi.org/10.1109/ICDAR.2015.7333857Google ScholarDigital Library
- Ilke Demir, Daniel G. Aliaga, and Bedrich Benes. 2015. Procedural editing of 3D building point clouds. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’15). IEEE, 2147--2155. DOI:https://doi.org/10.1109/ICCV.2015.248Google ScholarDigital Library
- Vincenzo Deufemia, Michele Risi, and Genoveffa Tortora. 2014. Sketched symbol recognition using latent-dynamic conditional random fields and distance-based clustering. Pattern Recog. 47, 3 (2014), 1159--1171. DOI:https://doi.org/10.1016/j.patcog.2013.09.016Google ScholarDigital Library
- Murray Eden. 1961. On the formalization of handwriting. Amer. Math. Soc. Appl. Math Symp. 12 (1961), 83--88.Google ScholarCross Ref
- Haoshu Fang, Yuanlu Xu, Wenguan Wang, Xiaobai Liu, and Song-Chun Zhu. 2018. Learning pose grammar to encode human body configuration for 3D pose estimation. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, (AAAI’18), the 30th innovative Applications of Artificial Intelligence (IAAI’18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI’18), Sheila A. McIlraith and Kilian Q. Weinberger (Eds.). AAAI Press, 6821--6828.Google Scholar
- Weiguo Feng, Rui Liu, and Ming Zhu. 2014. Fall detection for elderly person care in a vision-based home surveillance environment using a monocular camera. Sig. Image Vid. Proc. 8, 6 (2014), 1129--1138. DOI:https://doi.org/10.1007/s11760-014-0645-4Google ScholarCross Ref
- G. Ferber. 1986. Classifying and validating intermittent EEG patterns with syntactic methods. Pattern Recog. 19, 4 (1986), 289--295. DOI:https://doi.org/10.1016/0031-3203(86)90054-3Google ScholarDigital Library
- Amy Fire and Song-Chun Zhu. 2017. Inferring hidden statuses and actions in video by causal reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW’17). IEEE, 48--56. DOI:https://doi.org/10.1109/CVPRW.2017.13Google ScholarCross Ref
- Mariusz Flasiński and Janusz Jurek. 2014. Fundamental methodological issues of syntactic pattern recognition. Pattern Anal. Applic. 17, 3 (01 Aug. 2014), 465--480. DOI:https://doi.org/10.1007/s10044-013-0322-1Google Scholar
- G. D. Forney. 2001. Codes on graphs: Normal realizations. IEEE Trans. Inf. Theor. 47, 2 (Feb. 2001), 520--548. DOI:https://doi.org/10.1109/18.910573Google ScholarDigital Library
- David A. Forsyth and Jean Ponce. 2002. Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference, Upper Saddle River, NJ.Google ScholarDigital Library
- King-Sun Fu and A. Rosenfeld. 1976. Pattern recognition and image processing. IEEE Trans. Comput. C-25, 12 (Dec. 1976), 1336--1346. DOI:https://doi.org/10.1109/TC.1976.1674602Google Scholar
- Raghudeep Gadde, Renaud Marlet, and Nikos Paragios. 2016. Learning grammars for architecture-specific facade parsing. Int. J. Comput. Vis. 117, 3 (May 2016), 290--316. DOI:https://doi.org/10.1007/s11263-016-0887-4Google ScholarDigital Library
- Zoubin Ghahramani. 2001. An introduction to hidden Markov models and Bayesian networks. Int. J. Pattern Recog. Artif. Intell. 15, 01 (2001), 9--42. DOI:https://doi.org/10.1142/S0218001401000836Google ScholarCross Ref
- Josep M. Gonfaus, Marco Pedersoli, Jordi González, Andrea Vedaldi, and F. Xavier Roca. 2015. Factorized appearances for object detection. Comput. Vis. Image Underst. 138 (2015), 92--101. DOI:https://doi.org/10.1016/j.cviu.2015.04.008Google ScholarDigital Library
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the International Conference on Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2672--2680.Google Scholar
- Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. 2017. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28, 10 (Oct. 2017), 2222--2232. DOI:https://doi.org/10.1109/TNNLS.2016.2582924Google ScholarCross Ref
- Christian Hentschel and Harald Sack. 2014. Does one size really fit all?: Evaluating classifiers in bag-of-visual-words classification. In Proceedings of the 14th International Conference on Knowledge Technologies and Data-driven Business. ACM, New York, NY.Google ScholarDigital Library
- Geoffrey Hinton, Sara Sabour, and Nicholas Frosst. 2018. Matrix capsules with EM routing. In Proceedings of the 6th International Conference on Learning Representations (ICLR’18). ICLR, 1--15.Google Scholar
- Geoffrey E. Hinton, Alex Krizhevsky, and Sida D. Wang. 2011. Transforming auto-encoders. In Lecture Notes in Computer Science. Springer Berlin, 44--51. DOI:https://doi.org/10.1007/978-3-642-21735-7_6Google ScholarDigital Library
- Satoshi Ikehata, Hang Yang, and Yasutaka Furukawa. 2015. Structured indoor modeling. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’15). IEEE, 1323--1331. DOI:https://doi.org/10.1109/ICCV.2015.156Google ScholarDigital Library
- Phillip Isola and Ce Liu. 2013. Scene collaging: Analysis and synthesis of natural images with semantic layers. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’13). IEEE, Washington, DC, 3048--3055. DOI:https://doi.org/10.1109/ICCV.2013.457Google ScholarDigital Library
- Tommi S. Jaakkola and David Haussler. 1999. Exploiting generative models in discriminative classifiers. In Proceedings of the Conference on Advances in Neural Information Processing Systems. The MIT Press, Cambridge, MA, 487--493. Retrieved from http://dl.acm.org/citation.cfm?idequals;340534.340715.Google Scholar
- A. K. Jain, R. P. W. Duin, and Jianchang Mao. 2000. Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Mach. Intell. 22, 1 (Jan. 2000), 4--37. DOI:https://doi.org/10.1109/34.824819Google ScholarDigital Library
- Ahsan Jalal, Ahmad Salman, Ajmal Mian, Mark Shortis, and Faisal Shafait. 2020. Fish detection and species classification in underwater environments using deep learning with temporal information. Ecol. Inform. 57 (May 2020), 101088. DOI:https://doi.org/10.1016/j.ecoinf.2020.101088Google Scholar
- Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia (MM’14). Association for Computing Machinery, New York, NY, 675--678. DOI:https://doi.org/10.1145/2647868.2654889Google ScholarDigital Library
- Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lap-Fai Yu, Demetri Terzopoulos, and Song-Chun Zhu. 2018. Configurable 3D scene synthesis and 2D image rendering with per-pixel ground truth using stochastic grammars. Int. J. Comput. Vis. 126, 9 (June 2018), 920--941.Google ScholarDigital Library
- Yunsheng Jiang and Jinwen Ma. 2015. Combination features and models for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). IEEE, Boston, MA, 240--248.Google Scholar
- Frank D. Julca-Aguilar, Harold Mouchère, Christian Viard-Gaudin, and Nina S. T. Hirata. 2017. A general framework for the recognition of online handwritten graphics. CoRR abs/1709.06389 (2017), 1--14.Google Scholar
- Aniruddha Kembhavi, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, and Ali Farhadi. 2016. A diagram is worth a dozen images. In Computer Vision -- ECCV 2016, Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer International Publishing, Cham, 235--251.Google ScholarCross Ref
- Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, and Max Welling. 2014. Semi-supervised learning with deep generative models. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’14). The MIT Press, Cambridge, MA, 3581--3589.Google Scholar
- Russell A. Kirsch. 1964. Computer interpretation of English text and picture patterns. IEEE Trans. Electron. Comput. EC-13, 4 (Aug. 1964), 363--376. DOI:https://doi.org/10.1109/PGEC.1964.263816Google ScholarCross Ref
- Barbara Kitchenham and Stuart Charters. 2007. Guidelines for Performing Systematic Literature Reviews in Software Engineering. Technical Report EBSE 2007-001. Keele University and Durham University Joint Report. Retrieved from http://www.dur.ac.uk/ebse/resources/Systematic-reviews-5-8.pdf.Google Scholar
- W. W. Kong and Surendra Ranganath. 2014. Towards subject independent continuous sign language recognition: A segment and merge approach. Pattern Recog. 47, 3 (2014), 1294--1308. DOI:https://doi.org/10.1016/j.patcog.2013.09.014Google ScholarDigital Library
- Adam Kortylewski, Aleksander Wieczorek, Mario Wieser, Clemens Blumer, Sonali Parbhoo, Andreas Morel-Forster, Volker Roth, and Thomas Vetter. 2019. Greedy structure learning of hierarchical compositional models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’19). Computer Vision Foundation/IEEE, 11612--11621. DOI:https://doi.org/10.1109/CVPR.2019.01188Google ScholarCross Ref
- Mateusz Koziński, Raghudeep Gadde, Sergey Zagoruyko, Guillaume Obozinski, and Renaud Marlet. 2015. A MRF shape prior for facade parsing with occlusions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). IEEE, Boston, MA, 2820--2828. DOI:https://doi.org/10.1109/CVPR.2015.7298899Google ScholarCross Ref
- Mateusz Koziński and Renaud Marlet. 2014. Image parsing with graph grammars and Markov Random Fields applied to facade analysis. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision. IEEE, 729--736. DOI:https://doi.org/10.1109/WACV.2014.6836030Google ScholarCross Ref
- Mateusz Koziński, Guillaume Obozinski, and Renaud Marlet. 2015. Beyond procedural facade parsing: Bidirectional alignment via linear programming. In Computer Vision -- ACCV 2014, Daniel Cremers, Ian Reid, Hideo Saito, and Ming-Hsuan Yang (Eds.). Springer International Publishing, Cham, 79--94.Google Scholar
- Volker Krüger and Dennis Herzog. 2013. Tracking in object action space. Comput. Vis. Image Underst. 117, 7 (2013), 764--789. DOI:https://doi.org/10.1016/j.cviu.2013.02.002Google ScholarDigital Library
- Hilde Kuehne, Juergen Gall, and Thomas Serre. 2016. An end-to-end generative framework for video segmentation and recognition. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV’16). IEEE, 1--8. DOI:https://doi.org/10.1109/WACV.2016.7477701Google ScholarCross Ref
- Hilde Kuehne, Alexander Richard, and Juergen Gall. 2017. Weakly supervised learning of actions from transcripts. Comput. Vis. Image Underst. 163 (2017), 78--89. DOI:https://doi.org/10.1016/j.cviu.2017.06.004Google ScholarDigital Library
- Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. 2006. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), Vol. 2. IEEE, New York, NY, 2169--2178. DOI:https://doi.org/10.1109/CVPR.2006.68Google ScholarDigital Library
- T. Hoang Ngan Le, ChenChen Zhu, Yutong Zheng, Khoa Luu, and Marios Savvides. 2017. DeepSafeDrive: A grammar-aware driver parsing approach to Driver Behavioral Situational Awareness (DB-SAW). Pattern Recog. 66 (2017), 229--238. DOI:https://doi.org/10.1016/j.patcog.2016.11.028Google ScholarDigital Library
- Kyuhwa Lee, Dimitri Ognibene, Hyung Jin Chang, Tae-Kyun Kim, and Yiannis Demiris. 2015. STARE: Spatio-temporal attention relocation for multiple structured activities detection. IEEE Trans. Image Proc. 24, 12 (Dec. 2015), 5916--5927. DOI:https://doi.org/10.1109/TIP.2015.2487837Google ScholarDigital Library
- Eduardo Lemus, Ernesto Bribiesca, and Edgar Garduno. 2015. Surface trees Representation of boundary surfaces using a tree descriptor. J. Vis. Commun. Image Represent. 31 (2015), 101--111. DOI:https://doi.org/10.1016/j.jvcir.2015.06.004Google ScholarDigital Library
- Bo Li, Yaobin Chen, and Fei-Yue Wang. 2015. Pedestrian detection based on clustered poselet models and hierarchical and-or grammar. IEEE Trans. Vehic. Technol. 64, 4 (Apr. 2015), 1435--1444. DOI:https://doi.org/10.1109/TVT.2014.2331314Google ScholarCross Ref
- Bo Li, Xi Song, Tianfu Wu, Wenze Hu, and Mingtao Pei. 2014. Coupling-and-decoupling: A hierarchical model for occlusion-free object detection. Pattern Recog. 47, 10 (2014), 3254--3264. DOI:https://doi.org/10.1016/j.patcog.2014.04.016Google ScholarCross Ref
- Xilai Li, Xi Song, and Tianfu Wu. 2019. AOGNets: Compositional grammatical architectures for deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’19). IEEE, 6220--6230.Google ScholarCross Ref
- Xilai Li, Tianfu Wu, Xi Song, and Hamid Krim. 2017. AOGNets: Deep AND-OR grammar networks for visual recognition. CoRR abs/1711.05847 (2017), 1--12.Google Scholar
- Li Liu, Shu Wang, Yuxin Peng, Zigang Huang, Ming Liu, and Bin Hu. 2016. Mining intricate temporal rules for recognizing complex activities of daily living under uncertainty. Pattern Recog. 60 (2016), 1015--1028. DOI:https://doi.org/10.1016/j.patcog.2016.07.024Google ScholarDigital Library
- Xianming Liu, Rongrong Ji, Changhu Wang, Wei Liu, Bineng Zhong, and Thomas S. Huang. 2015. Understanding image structure via hierarchical shape parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). IEEE, Boston, MA, 5042--5050. DOI:https://doi.org/10.1109/CVPR.2015.7299139Google Scholar
- Xiaobai Liu, Yuanlu Xu, Lei Zhu, and Yadong Mu. 2018. A stochastic attribute grammar for robust cross-view human tracking. IEEE Trans. Circ. Syst. Vid. Technol. 28, 10 (Oct. 2018), 2884--2895. DOI:https://doi.org/10.1109/TCSVT.2017.2781738Google Scholar
- Xiaobai Liu, Yibiao Zhao, and Song-Chun Zhu. 2014. Single-view 3D scene parsing by attributed grammar. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 684--691. DOI:https://doi.org/10.1109/CVPR.2014.93Google ScholarDigital Library
- Xiaobai Liu, Yibiao Zhao, and Song-Chun Zhu. 2018. Single-view 3D scene reconstruction and parsing by attribute grammar. IEEE Trans. Pattern Anal. Mach. Intell. 40, 3 (Mar. 2018), 710--725. DOI:https://doi.org/10.1109/TPAMI.2017.2689007Google ScholarCross Ref
- Yang Lu, Tianfu Wu, and Song-Chun Zhu. 2014. Online object tracking, learning, and parsing with and-or graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3462--3469. DOI:https://doi.org/10.1109/CVPR.2014.443Google ScholarDigital Library
- Andelo Martinovic and Luc Van Gool. 2013. Early Parsing for 2D Stochastic Context Free Grammars. Technical Report KUL/ESAT/PSI/1301. Department of Electrical Engineering (ESAT), University Hospital Gasthuisberg, Kasteelpark Arenberg, België.Google Scholar
- Andelo Martinovic and Luc Van Gool. 2013. Bayesian grammar learning for inverse procedural modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’13). IEEE Computer Society, Washington, DC, 201--208. DOI:https://doi.org/10.1109/CVPR.2013.33Google ScholarDigital Library
- Lilyana Mihalkova, Tuyen Huynh, and Raymond J. Mooney. 2007. Mapping and revising Markov logic networks for transfer learning. In Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI’07). AAAI Press, 608--614. Retrieved from http://dl.acm.org/citation.cfm?idequals;1619645.1619743.Google Scholar
- Darnell Moore and Irfan Essa. 2002. Recognizing multitasked activities from video using stochastic context-free grammar. In Proceedings of the 18th National Conference on Artificial Intelligence. American Association for Artificial Intelligence, 770--776.Google Scholar
- Louis-Philippe Morency, Ariadna Quattoni, and Trevor Darrell. 2007. Latent-dynamic discriminative models for continuous gesture recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1--8. DOI:https://doi.org/10.1109/CVPR.2007.383299Google ScholarCross Ref
- R. Narasimhan. 1962. A Linguistic Approach to Pattern Recognition. Technical Report 121. Digital Computer Laboratory, University of Illinois, Urbana, IL.Google Scholar
- Andrew Y. Ng and Michael I. Jordan. 2001. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS’01). The MIT Press, Cambridge, MA, 841--848.Google Scholar
- Andrew Y. Ng and Michael I. Jordan. 2001. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS’01). The MIT Press, Cambridge, MA, 841--848.Google Scholar
- T. Ojala, M. Pietikainen, and D. Harwood. 1994. Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In Proceedings of 12th International Conference on Pattern Recognition. IEEE, 582--585. DOI:https://doi.org/10.1109/ICPR.1994.576366Google ScholarCross Ref
- Eray Özkural. 2014. An application of stochastic context sensitive grammar induction to transfer learning. In Artificial General Intelligence, Ben Goertzel, Laurent Orseau, and Javier Snaider (Eds.). Springer International Publishing, Cham, 121--132.Google Scholar
- Seyoung Park, Bruce Xiaohan Nie, and Song-Chun Zhu. 2018. Attribute and-or grammar for joint parsing of human pose, parts and attributes. IEEE Trans. Pattern Anal. Mach. Intell. 40, 7 (July 2018), 1555--1569. DOI:https://doi.org/10.1109/TPAMI.2017.2731842Google ScholarCross Ref
- Seyoung Park and Song-Chun Zhu. 2015. Attributed grammars for joint estimation of human attributes, part and pose. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’15). IEEE, 2372--2380. DOI:https://doi.org/10.1109/ICCV.2015.273Google ScholarDigital Library
- Ricardo Wandré Dias Pedro, Fátima L. S. Nunes, and Ariane Machado-Lima. 2013. Using grammars for pattern recognition in images: A systematic review. ACM Comput. Surv. 46, 2 (Nov. 2013). DOI:https://doi.org/10.1145/2543581.2543593Google Scholar
- Mingtao Pei, Zhangzhang Si, Benjamin Z. Yao, and Song-Chun Zhu. 2013. Learning and parsing video events with goal and intent prediction. Comput. Vis. Image Underst. 117, 10 (Oct. 2013), 1369--1383. DOI:https://doi.org/10.1016/j.cviu.2012.12.003Google ScholarDigital Library
- John L. Pfaltz and Azriel Rosenfeld. 1969. Web grammars. In Proceedings of the 1st International Joint Conference on Artificial Intelligence (IJCAI’69). Morgan Kaufmann Publishers Inc., San Francisco, CA, 609--619. Retrieved from http://dl.acm.org/citation.cfm?idequals;1624562.1624616.Google Scholar
- Hamed Pirsiavash and Deva Ramanan. 2014. Parsing videos of actions with segmental grammars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’14). IEEE Computer Society, Washington, DC, 612--619. DOI:https://doi.org/10.1109/CVPR.2014.85Google ScholarDigital Library
- Hemerson Pistori, Andrew Calway, and Peter Flach. 2013. A new strategy for applying grammatical inference to image classification problems. In Proceedings of the IEEE International Conference on Industrial Technology (ICIT’13). IEEE, 1032--1037.Google ScholarCross Ref
- Siyuan Qi, Siyuan Huang, Ping Wei, and Song-Chun Zhu. 2017. Predicting human activities using stochastic grammar. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17). IEEE, 1173--1181. DOI:https://doi.org/10.1109/iccv.2017.132Google ScholarCross Ref
- Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, and Song-Chun Zhu. 2018. Human-centric indoor scene synthesis using stochastic grammar. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 5899--5908.Google ScholarCross Ref
- Christian P. Robert and George Casella. 1999. The Metropolis—Hastings algorithm. In Springer Texts in Statistics. Springer New York, New York, NY, 231--283. DOI:https://doi.org/10.1007/978-1-4757-3071-5_6Google Scholar
- Antonio Foncubierta Rodríguez, Henning Müller, and Adrien Depeursinge. 2017. From visual words to a visual grammar: Using language modelling for image classification. CoRR abs/1703.05571 (2017), 1--17.Google Scholar
- Brandon Rothrock, Seyoung Park, and Song-Chun Zhu. 2013. Integrating grammar and segmentation for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3214--3221. DOI:https://doi.org/10.1109/CVPR.2013.413Google ScholarDigital Library
- Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. 2017. Dynamic routing between capsules. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., Red Hook, NY, 3859--3869.Google Scholar
- Anderson Santos, José Marcato Junior, Jonathan de Andrade Silva, Rodrigo Pereira, Daniel Matos, Geazy Menezes, Leandro Higa, Anette Eltner, Ana Paula Ramos, Lucas Osco, and Wesley Gonçalves. 2020. Storm-drain and manhole detection using the RetinaNet method. Sensors 20, 16 (Aug. 2020), 4450. DOI:https://doi.org/10.3390/s20164450Google ScholarCross Ref
- Sunita Sarawagi and William W. Cohen. 2004. Semi-Markov conditional random fields for information extraction. In Proceedings of the 17th International Conference on Neural Information Processing Systems. The MIT Press, Cambridge, MA, 1185--1192. Retrieved from http://dl.acm.org/citation.cfm?idequals;2976040.2976189.Google Scholar
- M. Schuster and K. K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Trans. Sig. Proc. 45, 11 (1997), 2673--2681. DOI:https://doi.org/10.1109/78.650093Google ScholarDigital Library
- Ricky J. Sethi and Amit K. Roy-Chowdhury. 2010. Modeling and recognition of complex multi-person interactions in video. In Proceedings of the 1st ACM International Workshop on Multimodal Pervasive Video Analysis (MPVA’10). ACM, New York, NY, 43--46. DOI:https://doi.org/10.1145/1878039.1878049Google Scholar
- Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR’15). ICLR, 1--14.Google Scholar
- Kenneth Slonneger and Barry Kurtz. 1995. Formal Syntax and Semantics of Programming Languages: A Laboratory Based Approach (1st ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA.Google Scholar
- Xi Song, Tianfu Wu, Yunde Jia, and Song-Chun Zhu. 2013. Discriminatively trained and-or tree models for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3278--3285. DOI:https://doi.org/10.1109/CVPR.2013.421Google ScholarDigital Library
- George Stiny and James Gips. 1971. Shape grammars and the generative specification of painting and sculpture. In Information Processing, Proceedings of IFIP Congress, Vol. 2. Elsevier, North Holland Publishing Co., 1460--1465.Google Scholar
- Domen Tabernik, Matej Kristan, Jeremy L. Wyatt, and Ales Leonardis. 2016. Towards deep compositional networks. In Proceedings of the 23rd International Conference on Pattern Recognition (ICPR’16). IEEE, 3470--3475. DOI:https://doi.org/10.1109/ICPR.2016.7900171Google ScholarCross Ref
- Domen Tabernik, Aleš Leonardis, Marko Boben, Danijel Skočaj, and Matej Kristan. 2015. Adding discriminative power to a generative hierarchical compositional model using histograms of compositions. Comput. Vis. Image Underst. 138, C (Sept. 2015), 102--113. DOI:https://doi.org/10.1016/j.cviu.2015.04.006Google Scholar
- Jawad Tayyub, Majd Hawasly, David C. Hogg, and Anthony G. Cohn. 2018. Learning hierarchical models of complex daily activities from annotated videos. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV’18). IEEE, 1633--1641. DOI:https://doi.org/10.1109/WACV.2018.00182Google Scholar
- Olivier Teboul, Iasonas Kokkinos, Loic Simon, Panagiotis Koutsourakis, and Nikos Paragios. 2011. Shape grammar parsing via reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’11). IEEE Computer Society, Washington, DC, 2273--2280. DOI:https://doi.org/10.1109/CVPR.2011.5995319Google ScholarDigital Library
- Olivier Teboul, Iasonas Kokkinos, Loic Simon, Panagiotis Koutsourakis, and Nikos Paragios. 2013. Parsing facades with shape grammars and reinforcement learning. IEEE Trans. Pattern Anal. Mach. Intell. 35, 7 (July 2013), 1744--1756. DOI:https://doi.org/10.1109/TPAMI.2012.252Google ScholarDigital Library
- Everton Castelão Tetila, Bruno Brandoli Machado, Gilberto Astolfi, Nícolas Alessandro de Souza Belete, Willian Paraguassu Amorim, Antonia Railda Roel, and Hemerson Pistori. 2020. Detection and classification of soybean pests using deep learning with UAV images. Comput. Electron. Agric. 179 (2020), 105836. DOI:https://doi.org/10.1016/j.compag.2020.105836Google ScholarCross Ref
- Bin Tian, Ming Tang, and Fei-Yue Wang. 2015. Vehicle detection grammars with partial occlusion handling for traffic surveillance. Transport. Res. Part C: Emerg. Technol. 56 (2015), 80--93. DOI:https://doi.org/10.1016/j.trc.2015.02.020Google ScholarCross Ref
- Nam N. Vo and Aaron F. Bobick. 2014. From stochastic grammar to Bayes network: Probabilistic parsing of complex activity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2641--2648.Google Scholar
- Nam N. Vo and Aaron F. Bobick. 2016. Sequential interval network for parsing complex structured activity. Comput. Vis. Image Underst. 143 (2016), 147--158. DOI:https://doi.org/10.1016/j.cviu.2015.07.006Google ScholarDigital Library
- Michael Walton, Doug Lange, and Song-Chun Zhu. 2017. Inferring context through scene understanding. In Proceedings of the AAAI Spring Symposium Series. AAAI Press, 356--360.Google Scholar
- Heng Wang, Alexander Kläser, Cordelia Schmid, and Cheng-Lin Liu. 2013. Dense trajectories and motion boundary descriptors for action recognition. Int. J. Comput. Vis. 103, 1 (May 2013), 60--79. DOI:https://doi.org/10.1007/s11263-012-0594-8Google ScholarCross Ref
- Wenguan Wang, Yuanlu Xu, Jianbing Shen, and Song-Chun Zhu. 2018. Attentive fashion grammar network for fashion landmark detection and clothing category classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 4271--4280.
- Julien Weissenberg, Hayko Riemenschneider, Mukta Prasad, and Luc Van Gool. 2013. Is there a procedural logic to architecture? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Washington, DC, 185--192. DOI:https://doi.org/10.1109/CVPR.2013.31
- A. D. Wilson and A. F. Bobick. 1999. Parametric hidden Markov models for gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 21, 9 (Sep. 1999), 884--900. DOI:https://doi.org/10.1109/34.790429
- David Windridge, Josef Kittler, Teofilo de Campos, Fei Yan, William Christmas, and Aftab Khan. 2015. A novel Markov logic rule induction strategy for characterizing sports video footage. IEEE MultiMedia 22, 2 (Apr. 2015), 24--35. DOI:https://doi.org/10.1109/MMUL.2014.36
- Bingwei Wu. 2013. Two-dimensional (2D) Languages and Application to Handwritten Graphical Parsing. Technical Report. Ecole Polytechnique de l’université de Nantes. Retrieved from https://hal.archives-ouvertes.fr/hal-00861080.
- Ying Nian Wu, Zhangzhang Si, Haifeng Gong, and Song-Chun Zhu. 2009. Learning active basis model for object detection and recognition. Int. J. Comput. Vis. 90, 2 (Aug. 2009), 198--235. DOI:https://doi.org/10.1007/s11263-009-0287-0
- Xianglei Xing, Tianfu Wu, Song-Chun Zhu, and Ying Nian Wu. 2020. Inducing hierarchical compositional model by sparsifying generator network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’20). IEEE, 14284--14293. DOI:https://doi.org/10.1109/CVPR42600.2020.01430
- Xianglei Xing, Song-Chun Zhu, and Ying Nian Wu. 2019. Inducing sparse coding and And-Or grammar from generator network. In Proceedings of the AAAI Conference on Artificial Intelligence, Workshop on Network Interpretability for Deep Learning. AAAI Press, 1--4.
- Yuanlu Xu, Lei Qin, Xiaobai Liu, Jianwen Xie, and Song-Chun Zhu. 2018. A causal and-or graph model for visibility fluent reasoning in tracking interacting objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2178--2187. DOI:https://doi.org/10.1109/CVPR.2018.00232
- M. S. Zarchi, R. T. Tan, C. van Gemeren, A. Monadjemi, and R. C. Veltkamp. 2016. Understanding image concepts using ISTOP model. Pattern Recog. 53, C (May 2016), 174--183. DOI:https://doi.org/10.1016/j.patcog.2015.11.010
- Yibiao Zhao and Song-Chun Zhu. 2013. Scene parsing by integrating function, geometry and appearance models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3119--3126. DOI:https://doi.org/10.1109/CVPR.2013.401
- Y. Zhu, N. Nayak, U. Gaur, B. Song, and A. Roy-Chowdhury. 2013. Modeling multi-object interactions using string of feature graphs. Comput. Vis. Image Underst. 117, 10 (2013), 1313--1328. DOI:https://doi.org/10.1016/j.cviu.2012.08.009
- Bartosz Zieliński, Marek Skomorowski, Wadim Wojciechowski, Mariusz Korkosz, and Kamila Sprężak. 2015. Computer aided erosions and osteophytes detection based on hand radiographs. Pattern Recog. 48, 7 (2015), 2304--2317. DOI:https://doi.org/10.1016/j.patcog.2015.01.018
Index Terms
- Syntactic Pattern Recognition in Computer Vision: A Systematic Review