Abstract
The problem of generating textual descriptions for visual data has gained research attention in recent years. In contrast, generating visual data from textual descriptions remains very challenging, because it requires combining both Natural Language Processing (NLP) and Computer Vision techniques. Existing methods utilize Generative Adversarial Networks (GANs) and generate uncompressed images from textual descriptions. In practice, however, most visual data are processed and transmitted in compressed form. Hence, the proposed work attempts to generate visual data directly in the compressed representation using Deep Convolutional GANs (DCGANs), to achieve storage and computational efficiency. We propose two GAN models for compressed image generation from text. The first model is trained directly on JPEG compressed DCT images (compressed domain) to generate compressed images from text descriptions. The second model is trained on RGB images (pixel domain) to generate the JPEG compressed DCT representation from text descriptions. The proposed models are evaluated on the open-source benchmark Oxford-102 Flower dataset, using both its RGB and JPEG compressed versions, and achieve state-of-the-art performance in the JPEG compressed domain. The code will be publicly released on GitHub after acceptance of the paper.
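The target representation of both proposed models is the blockwise 8×8 DCT-coefficient volume that JPEG uses internally. As a minimal sketch (not the authors' implementation), the NumPy snippet below shows how a single image plane maps to that compressed-domain representation via an orthonormal DCT-II transform applied per 8×8 block; the function names are illustrative only.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, as used by JPEG's 8x8 block transform."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0] /= np.sqrt(2)          # DC row scaling for orthonormality
    return c * np.sqrt(2 / n)

def blockwise_dct(plane, n=8):
    """Split a (H, W) image plane into n x n tiles and DCT-transform each,
    producing the coefficient volume a compressed-domain model works with."""
    h, w = plane.shape
    assert h % n == 0 and w % n == 0, "plane must tile evenly into blocks"
    C = dct_matrix(n)
    blocks = plane.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    # Per-block transform: C @ block @ C.T for every (a, b) tile
    return np.einsum('ij,abjk,lk->abil', C, blocks, C)
```

Since the transform is orthonormal, the inverse is simply `C.T @ coeffs @ C` per block, so a generator emitting such coefficient maps can be decoded back to pixels without extra learned components.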
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Rajesh, B., Dusa, N., Javed, M., Dubey, S.R., Nagabhushan, P. (2023). T2CI-GAN: Text to Compressed Image Generation Using Generative Adversarial Network. In: Gupta, D., Bhurchandi, K., Murala, S., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2022. Communications in Computer and Information Science, vol 1777. Springer, Cham. https://doi.org/10.1007/978-3-031-31417-9_23
DOI: https://doi.org/10.1007/978-3-031-31417-9_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31416-2
Online ISBN: 978-3-031-31417-9