Intelligent design of multimedia content in Alibaba

Liu, Kui-long; Li, Wei; Yang, Chang-yuan; Yang, Guang

doi:10.1631/FITEE.1900580

Report
Published: 13 February 2020

Volume 20, pages 1657–1664 (2019)

Frontiers of Information Technology & Electronic Engineering

Abstract

Multimedia content is an integral part of Alibaba’s business ecosystem and is in great demand. Producing multimedia content usually requires advanced technology and substantial investment. With the rapid development of artificial intelligence (AI) technology in recent years, many AI auxiliary tools for producing multimedia content have emerged to meet these design requirements and are increasingly widely used across Alibaba’s business ecosystem. Related applications mainly include auxiliary design, graphic design, video generation, and page production. This report introduces a general pipeline of the AI auxiliary tools and presents four representative tools applied in the Alibaba Group, one for each of the applications mentioned above. Through these tools, the value of combining multimedia content design with AI technology has been well verified in business, which reflects the important role AI technology plays in promoting the production of multimedia content. The application prospects of combining multimedia content design and AI are also indicated.

Author information

Authors and Affiliations

Alibaba Group, Hangzhou, 311121, China
Kui-long Liu, Wei Li, Chang-yuan Yang & Guang Yang

Corresponding author

Correspondence to Chang-yuan Yang.

Additional information

Compliance with ethics guidelines

Kui-long LIU, Wei LI, Chang-yuan YANG, and Guang YANG declare that they have no conflict of interest.

About this article

Cite this article

Liu, Kl., Li, W., Yang, Cy. et al. Intelligent design of multimedia content in Alibaba. Front Inform Technol Electron Eng 20, 1657–1664 (2019). https://doi.org/10.1631/FITEE.1900580

Received: 23 October 2019
Accepted: 15 December 2019
Published: 13 February 2020
Issue Date: December 2019
DOI: https://doi.org/10.1631/FITEE.1900580

Key words

CLC number

TP391
