DHA: Product Title Generation with Discriminative Hierarchical Attention for E-commerce

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13282)

Abstract

Product titles play an important role on e-commerce sites, but crafting them manually requires tremendous time and human effort. It would be desirable to generate product titles automatically, yet existing generation methods usually require densely labeled data that are unavailable in the real world. To address this gap, we formulate a novel product title generation task: generating a title from the product image and auxiliary information (e.g., category). To generate titles that are consistent with search queries, we construct the first large-scale dataset for this task (AEPro) and propose a Discriminative Hierarchical Attention (DHA) model. The DHA model first identifies the image regions related to the product of interest (POI) with a POI attention module. Then, based on the title context, the identified regions are further refined by a generation attention module. Finally, titles are generated by dynamically attending to these regions. Experiments on the AEPro dataset demonstrate the effectiveness of the DHA model. Moreover, online A/B testing shows that 61.8% of the titles generated by the DHA model were accepted directly or with minor modifications, and the exposure rate of products with machine-generated titles improved by 40%.
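To make the hierarchical attention concrete, the sketch below shows one plausible way to wire a POI attention module (gating image regions with auxiliary category information) and a per-step generation attention (re-weighting the gated regions with the title context) into a title decoder. This is a minimal illustration only: the class name DHASketch, all dimensions, the sigmoid gate, the GRU decoder, and the use of pre-extracted region features are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DHASketch(nn.Module):
    """Two-level attention title decoder, loosely following the abstract:
    (1) POI attention gates image regions toward the product of interest
        using auxiliary information (here, a category embedding);
    (2) generation attention re-weights the gated regions at every decoding
        step based on the title context (the decoder hidden state).
    Dimensions, the sigmoid gate, and the GRU decoder are assumptions."""

    def __init__(self, n_cat=1000, vocab=30000, d_img=2048, d_aux=128, d_hid=512):
        super().__init__()
        self.cat_emb = nn.Embedding(n_cat, d_aux)      # auxiliary info (category)
        self.tok_emb = nn.Embedding(vocab, d_hid)      # title tokens
        self.poi_score = nn.Linear(d_img + d_aux, 1)   # POI attention scorer
        self.gen_score = nn.Linear(d_img + d_hid, 1)   # generation attention scorer
        self.init_h = nn.Linear(d_img, d_hid)          # decoder state init
        self.rnn = nn.GRUCell(d_hid + d_img, d_hid)    # title decoder
        self.out = nn.Linear(d_hid, vocab)

    def forward(self, regions, category, tokens):
        # regions: (B, R, d_img) pre-extracted image region features
        # category: (B,) auxiliary category ids; tokens: (B, T) teacher-forced title
        B, R, _ = regions.shape
        cat = self.cat_emb(category).unsqueeze(1).expand(B, R, -1)
        # Level 1: POI attention -- a per-region gate that suppresses
        # background regions unrelated to the product of interest.
        poi_gate = torch.sigmoid(self.poi_score(torch.cat([regions, cat], -1)))
        poi_regions = regions * poi_gate                       # (B, R, d_img)
        h = torch.tanh(self.init_h(poi_regions.mean(1)))       # (B, d_hid)
        emb = self.tok_emb(tokens)
        logits = []
        for t in range(tokens.size(1)):
            # Level 2: generation attention -- re-weight the gated regions
            # with the current title context before emitting the next word.
            ctx_in = torch.cat([poi_regions, h.unsqueeze(1).expand(B, R, -1)], -1)
            alpha = F.softmax(self.gen_score(ctx_in), dim=1)   # (B, R, 1)
            ctx = (alpha * poi_regions).sum(1)                 # (B, d_img)
            h = self.rnn(torch.cat([emb[:, t], ctx], -1), h)
            logits.append(self.out(h))
        return torch.stack(logits, 1)                          # (B, T, vocab)

model = DHASketch()
regions = torch.randn(2, 36, 2048)             # e.g. 36 region features per image
category = torch.randint(0, 1000, (2,))
tokens = torch.randint(0, 30000, (2, 12))
print(model(regions, category, tokens).shape)  # torch.Size([2, 12, 30000])
```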

Notes

  1. http://dbpedia.org.

  2. https://github.com/eBay/sigir-2019-ecom-challenge.

Author information

Corresponding author

Correspondence to Wenya Zhu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhu, W. et al. (2022). DHA: Product Title Generation with Discriminative Hierarchical Attention for E-commerce. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science (LNAI), vol 13282. Springer, Cham. https://doi.org/10.1007/978-3-031-05981-0_22

  • DOI: https://doi.org/10.1007/978-3-031-05981-0_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05980-3

  • Online ISBN: 978-3-031-05981-0

  • eBook Packages: Computer Science, Computer Science (R0)
