Abstract
Image composition is one of the primary considerations in professional photography, and good composition can greatly enhance how an image is perceived. However, most ordinary users find it difficult to compose images quickly. To address this challenge, we propose a new view adjustment method that improves image composition and helps users adjust views promptly. We establish a composition feature extraction module that extracts the composition information of images based on professional photographic composition rules. Based on this module, we construct an image aesthetic assessment network to score the adjustment results. We also adopt the hyper-network concept to build a parameter generation-prediction network, enabling the application of view adjustment rules. In addition, we create a view adjustment dataset from an existing image cropping dataset; it includes samples of displacement adjustment, scale adjustment, rotation adjustment, and their combinations to facilitate model training. In experiments, our model achieves an MAE of 0.0392, an MSE of 0.0037, and an IoU of 0.783. The experimental results indicate that the proposed view adjustment method effectively improves image composition, enabling real-time guidance for users to capture well-composed photographs.
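The abstract reports MAE, MSE, and IoU but does not define them in this section. As a rough illustration only, the sketch below computes the three metrics under the assumption that a view is an axis-aligned box `(x1, y1, x2, y2)` in normalized image coordinates. The box format and the function names are hypothetical and are not taken from the paper, whose evaluation protocol may differ (for example, by including rotation).

```python
from typing import Sequence

# Assumed view format: (x1, y1, x2, y2), normalized to the image size.
Box = Sequence[float]


def mae(pred: Box, target: Box) -> float:
    """Mean absolute error over the box coordinates."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred: Box, target: Box) -> float:
    """Mean squared error over the box coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (empty if width or height <= 0).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.10, 0.10, 0.90, 0.90)  # hypothetical predicted view
    reference = (0.15, 0.10, 0.95, 0.90)  # hypothetical ground-truth view
    print(mae(predicted, reference), mse(predicted, reference), iou(predicted, reference))
```

Lower MAE and MSE and higher IoU indicate that a predicted view lies closer to the reference view.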










Data availability
No datasets were generated or analysed during the current study.
Author information
Contributions
Nan Sheng: methodology, software, writing (original draft). Yongzhen Ke: conceptualization, methodology, supervision, project administration, writing (review and editing). Shuai Yang: software, writing (review and editing), formal analysis, visualization. Yong Yang: methodology, writing (review and editing). Liming Chen: resources, validation, visualization.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Communicated by Bing-kun Bao.
About this article
Cite this article
Sheng, N., Ke, Y., Yang, S. et al. View adjustment: helping users improve photographic composition. Multimedia Systems 30, 293 (2024). https://doi.org/10.1007/s00530-024-01490-x
DOI: https://doi.org/10.1007/s00530-024-01490-x