Abstract
Scene understanding is an active research area. Within it, many approaches address the problem of naming objects in complex natural scenes, and the visual semantic integration model (VSIM) is a representative example. VSIM consists of two levels: a semantic level and a visual level. At the semantic level, it uses a four-level pachinko allocation model (PAM) to capture the semantics behind images. However, this four-level PAM is inflexible and does not account for common subtopics that represent background semantics. To address these problems, we replace PAM with hierarchical PAM (hPAM). Because hPAM is flexible, we investigate two variants of hPAM for boosting VSIM. We derive Gibbs samplers to learn the proposed models. Empirical results show that the proposed models outperform state-of-the-art algorithms.
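As a rough illustration of the Gibbs sampling the abstract mentions, the sketch below implements a collapsed Gibbs sampler for the standard four-level PAM of Li and McCallum (2006): a root, a layer of super-topics, a layer of sub-topics, and words. Each token's (super-topic k, sub-topic p) pair is resampled with probability proportional to (n_dk + α_r) · (n_dkp + α_s) / (n_dk + S·α_s) · (n_pw + β) / (n_p + V·β), where S is the number of sub-topics and V the vocabulary size. This is not the paper's method: the hPAM variants also let the root and super-topics emit words, and VSIM couples this semantic level with a visual level, neither of which is shown. The function name `gibbs_pam4`, its parameters, the symmetric fixed priors, and the toy corpus are all assumptions for illustration.

```python
import random


def gibbs_pam4(docs, vocab_size, n_super, n_sub,
               alpha_root=0.5, alpha_super=0.5, beta=0.01,
               n_iters=50, seed=0):
    """Hypothetical sketch: collapsed Gibbs sampler for a four-level PAM
    (root -> super-topics -> sub-topics -> words) with fixed symmetric priors.

    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns per-token (super, sub) assignments and the count tables.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_super for _ in docs]                           # super-topic counts per doc
    n_dkp = [[[0] * n_sub for _ in range(n_super)] for _ in docs]  # (super, sub) counts per doc
    n_pw = [[0] * vocab_size for _ in range(n_sub)]                # word counts per sub-topic
    n_p = [0] * n_sub                                               # total tokens per sub-topic

    def update(d, w, k, p, delta):
        # Add (delta=+1) or remove (delta=-1) one token from every count table.
        n_dk[d][k] += delta
        n_dkp[d][k][p] += delta
        n_pw[p][w] += delta
        n_p[p] += delta

    # Random initialisation of each token's (super-topic, sub-topic) path.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k, p = rng.randrange(n_super), rng.randrange(n_sub)
            update(d, w, k, p, +1)
            z[d].append((k, p))

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, p = z[d][i]
                update(d, w, k, p, -1)  # exclude the current token from the counts
                weights = []
                for kk in range(n_super):
                    # Root -> super factor (its denominator is constant, so it is dropped).
                    root_term = n_dk[d][kk] + alpha_root
                    super_denom = n_dk[d][kk] + n_sub * alpha_super
                    for pp in range(n_sub):
                        super_term = (n_dkp[d][kk][pp] + alpha_super) / super_denom
                        word_term = (n_pw[pp][w] + beta) / (n_p[pp] + vocab_size * beta)
                        weights.append(root_term * super_term * word_term)
                # Sample a joint (super, sub) index, then unflatten it.
                idx = rng.choices(range(n_super * n_sub), weights=weights)[0]
                k, p = divmod(idx, n_sub)
                update(d, w, k, p, +1)
                z[d][i] = (k, p)
    return z, n_dk, n_dkp, n_pw, n_p


# Toy usage on hypothetical data: word ids 0-2 and 3-5 tend to co-occur.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 1, 3, 4, 2]]
z, n_dk, n_dkp, n_pw, n_p = gibbs_pam4(docs, vocab_size=6, n_super=2, n_sub=3, n_iters=20)
print(n_pw)  # sub-topic x word counts after sampling
```

Sampling the super-topic and sub-topic jointly keeps each update exact under the collapsed conditional; in practice the Dirichlet parameters are often learned rather than fixed, which this sketch omits for brevity.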
Notes
Downloaded at http://cs.brown.edu/pff/
Downloaded at http://vision.stanford.edu/projects/totalscene/
Downloaded at http://people.csail.mit.edu/myungjin/HContext.html
Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61170092, 61133011, and 61103091.
Cite this article
Ouyang, J., Li, X. & Li, H. Boosting scene understanding by hierarchical pachinko allocation. Multimed Tools Appl 75, 12581–12595 (2016). https://doi.org/10.1007/s11042-014-2414-3