LSTMVAEF: Vivid Layout via LSTM-Based Variational Autoencoder Framework

He, Jie; Wu, Xingjiao; Hu, Wenxin; Yang, Jing

doi:10.1007/978-3-030-86331-9_12

Jie He (ORCID: orcid.org/0000-0002-9308-8886), Xingjiao Wu (ORCID: orcid.org/0000-0001-9146-051X), Wenxin Hu & Jing Yang

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12822)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

The lack of training data is still a challenge in the Document Layout Analysis (DLA) task. Synthetic data is an effective way to tackle this challenge. In this paper, we propose an LSTM-based Variational Autoencoder framework (LSTMVAEF) to synthesize layouts for DLA. Compared with previous methods, our method can generate more complicated layouts and needs only the training data of DLA, without extra annotation. We use LSTM models as the basic models to learn latent representations of the class and position information of the elements within a page. In addition, we design a weight adaptation strategy that helps the model train faster. Experiments show that our model can generate more vivid layouts while requiring only a few real document pages.
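The abstract describes the framework only at a high level. The following is a minimal sketch, not the authors' implementation, of two ingredients it names: turning a page into a sequence of (class, position) steps that an LSTM could consume, and the standard VAE objective (the reparameterization trick plus the closed-form Gaussian KL term of Kingma and Welling). Everything concrete here is an assumption: the label set `CLASSES`, the normalized (x, y, w, h) box encoding, the top-to-bottom reading order, and all helper names are hypothetical. The LSTM encoder and decoder are omitted, so `recon_error` is passed in rather than computed. The abstract does not say how the weight adaptation strategy works; `kl_weight` is a stand-in using plain linear KL annealing, a common VAE training trick, and should not be read as the paper's strategy.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical label set; the paper's actual element classes may differ.
CLASSES = ["text", "title", "figure", "table"]


@dataclass
class Element:
    """One layout element; coordinates are assumed normalized to [0, 1]."""
    label: str
    x: float  # left edge / page width
    y: float  # top edge / page height
    w: float  # box width / page width
    h: float  # box height / page height


def encode_element(e: Element) -> list[float]:
    """One time step: one-hot class followed by the four box coordinates."""
    one_hot = [1.0 if c == e.label else 0.0 for c in CLASSES]
    return one_hot + [e.x, e.y, e.w, e.h]


def encode_page(elements: list[Element]) -> list[list[float]]:
    """Assumed reading order: sort top-to-bottom, then left-to-right."""
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    return [encode_element(e) for e in ordered]


def reparameterize(mu: list[float], log_var: list[float], rng: random.Random) -> list[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def gaussian_kl(mu: list[float], log_var: list[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def kl_weight(step: int, warmup_steps: int = 1000) -> float:
    """Stand-in schedule (linear KL annealing), NOT the paper's weight adaptation strategy."""
    return min(1.0, step / warmup_steps)


def vae_loss(recon_error: float, mu: list[float], log_var: list[float], step: int) -> float:
    """Negative ELBO with a weighted KL term; recon_error would come from an LSTM decoder."""
    return recon_error + kl_weight(step) * gaussian_kl(mu, log_var)


if __name__ == "__main__":
    page = [Element("text", 0.1, 0.4, 0.8, 0.3), Element("title", 0.1, 0.05, 0.8, 0.05)]
    seq = encode_page(page)
    print(len(seq), len(seq[0]))  # 2 8
    z = reparameterize([0.0, 0.0], [0.0, 0.0], random.Random(0))
    print(len(z))  # 2
    print(vae_loss(1.0, [0.0, 0.0], [0.0, 0.0], step=500))  # 1.0
```

For a two-element page with a title above a body-text block, `encode_page` returns two 8-dimensional steps with the title first. A variable-length sequence encoding of this kind lets a recurrent model handle pages with different numbers of elements, which is one plausible reason for building the framework on LSTMs.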

J. He and X. Wu contributed equally to this work.

Notes

1. https://github.com/Pandooora/LSTMVAEFD.

Acknowledgement

This work was supported in part by the Fundamental Research Funds for the Central Universities and the 2020 East China Normal University Outstanding Doctoral Students Academic Innovation Ability Improvement Project (YBNLTS2020-042). The computation was performed on the ECNU Multifunctional Platform for Innovation (001).

Author information

Authors and Affiliations

East China Normal University, Shanghai, China
Jie He, Xingjiao Wu, Wenxin Hu & Jing Yang

Corresponding author

Correspondence to Jing Yang.

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida

About this paper

Cite this paper

He, J., Wu, X., Hu, W., Yang, J. (2021). LSTMVAEF: Vivid Layout via LSTM-Based Variational Autoencoder Framework. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12822. Springer, Cham. https://doi.org/10.1007/978-3-030-86331-9_12

DOI: https://doi.org/10.1007/978-3-030-86331-9_12
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86330-2
Online ISBN: 978-3-030-86331-9
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition