Abstract
Product titles play an important role in e-commerce sites, yet crafting them manually requires considerable time and human effort. Automatic title generation is therefore desirable, but existing generation methods usually require densely labeled data that are unavailable in practice. To address this gap, we formulate a novel product title generation task in which the title is generated from the product image and auxiliary information (e.g., the category). To generate titles that are consistent with search queries, we construct the first large-scale dataset for this task (AEPro) and propose a Discriminative Hierarchical Attention (DHA) model. The DHA model first identifies the image regions related to the product of interest (POI) with a POI attention module. These regions are then revised by a generation attention module conditioned on the title context. Finally, the title is generated by dynamically attending to the revised regions. Experiments on the AEPro dataset demonstrate the effectiveness of the DHA model. Moreover, online A/B testing shows that \(61.8\%\) of the titles generated by the DHA model are accepted directly or with minor modifications, and the exposure rate of products with machine-generated titles improves by \(40\%\).
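The abstract describes the DHA pipeline only at a high level: a POI attention stage filters image regions using auxiliary information, and a generation attention stage re-weights those regions at each decoding step from the title context. Below is a minimal PyTorch sketch of that two-stage flow, assuming 2048-dimensional region features and a GRU decoder; every module name, dimension, and wiring choice here is an illustrative assumption, not the authors' actual DHA implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalAttentionSketch(nn.Module):
    """Illustrative two-stage attention: a POI gate over image regions,
    then per-step generation attention during decoding. Assumption-based
    sketch, not the paper's exact architecture."""

    def __init__(self, region_dim=2048, hidden_dim=512, vocab_size=10000):
        super().__init__()
        # Stage 1: score each region against auxiliary info (e.g., category).
        self.poi_attn = nn.Linear(region_dim + hidden_dim, 1)
        # Stage 2: re-weight the POI-filtered regions given the decoder state.
        self.gen_attn = nn.Linear(region_dim + hidden_dim, 1)
        self.decoder = nn.GRUCell(region_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, regions, category_emb, max_len=12):
        # regions: (B, N, region_dim); category_emb: (B, hidden_dim)
        b, n, _ = regions.shape
        cat = category_emb.unsqueeze(1).expand(b, n, -1)
        # POI attention: a soft gate that suppresses regions unrelated
        # to the product of interest.
        poi_gate = torch.sigmoid(self.poi_attn(torch.cat([regions, cat], -1)))
        poi_regions = regions * poi_gate
        h = category_emb  # initialize the decoder with the auxiliary info
        logits = []
        for _ in range(max_len):
            ctx = h.unsqueeze(1).expand(b, n, -1)
            # Generation attention: revise region weights from title context.
            alpha = F.softmax(self.gen_attn(torch.cat([poi_regions, ctx], -1)), dim=1)
            attended = (alpha * poi_regions).sum(dim=1)  # (B, region_dim)
            h = self.decoder(attended, h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (B, max_len, vocab_size)

# Smoke test with random features: 4 products, 36 regions each.
model = HierarchicalAttentionSketch()
scores = model(torch.randn(4, 36, 2048), torch.randn(4, 512))
print(scores.shape)  # torch.Size([4, 12, 10000])

The point the sketch captures is the hierarchy: the POI gate is computed once per product from the auxiliary information, while the generation attention is recomputed at every decoding step as the title context evolves.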
Cite this paper
Zhu, W., et al.: DHA: Product Title Generation with Discriminative Hierarchical Attention for E-commerce. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. LNCS, vol. 13282. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-05981-0_22