
Three-stage Transferable and Generative Crowdsourced Comment Integration Framework Based on Zero- and Few-shot Learning with Domain Distribution Alignment

Published: 12 January 2024

Abstract

Online shopping has become a major channel of daily consumption, and user-generated, or crowdsourced, product comments offer a broad range of feedback on e-commerce products. Consequently, integrating the critical opinions or major attitudes expressed in crowdsourced comments can provide valuable feedback for marketing strategy adjustment and product-quality monitoring. Unfortunately, the scarcity of annotated ground truth for integrated comments, that is, the limited gold integration references, makes regular supervised-learning-based comment integration infeasible. To resolve this problem, inspired by the principle of transfer learning, this article proposes a three-stage transferable and generative crowdsourced comment integration framework (TTGCIF) based on zero- and few-shot learning with the support of domain distribution alignment. The proposed framework aims to generate abstractive integrated comments in the target domain via an enhanced neural text generation model by referring to the integration resources available in related source domains, thereby avoiding exhaustive annotation effort in the target domain. Specifically, at the first stage, to enhance domain transferability, representations of the crowdsourced comments are aligned between the source and target domains by minimizing the domain distribution discrepancy in a kernel space. At the second stage, a zero-shot comment integration mechanism is adopted to handle the case in which no gold integration reference is available in the target domain: taking sample-level semantic prototypes as input, the enhanced neural text generation model in TTGCIF is trained to learn semantic associations among different domains via semantic prototype transduction, so that the “unlabeled” crowdsourced comments in the target domain can be associated with existing integration references in related source domains. At the third stage, starting from the parameters trained at the second stage, a few-shot fast domain adaptation mechanism is adopted that seeks the most promising parameters along gradient directions constrained by instances across multiple source domains. In this way, the parameters of TTGCIF remain sensitive to alterations in the training data, ensuring that even if only a few annotated resources in the target domain are available for fine-tuning, TTGCIF can still react promptly and achieve effective target-domain adaptation. According to the experimental results, TTGCIF achieves the best transferable product comment integration performance in the target domain, with fast and stable domain adaptation relying on no more than 10% of the annotated resources in the target domain. More importantly, even when TTGCIF has not been fine-tuned on the target domain, by referring to the integration resources available in related source domains, the integrated comments it generates on the target domain are still superior to those generated by models already fine-tuned on the target domain.
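
The paper's implementation is not shown on this page. As a rough, minimal sketch of what the first stage's kernel-space alignment could look like, the snippet below computes a multi-kernel maximum mean discrepancy (MMD) between batches of source- and target-domain comment representations, which is one standard way to measure the domain distribution discrepancy described above; the function name, feature shapes, and kernel bandwidths are illustrative assumptions rather than the authors' code.

# Illustrative sketch only: kernel-space distribution discrepancy (MMD)
# between source- and target-domain comment representations. Names, shapes,
# and bandwidths are assumptions, not the authors' implementation.
import torch

def gaussian_mmd(source_feats, target_feats, bandwidths=(1.0, 2.0, 4.0)):
    # Sum-of-RBF-kernels estimate of squared MMD between two feature batches.
    def rbf(x, y):
        sq_dist = torch.cdist(x, y) ** 2          # pairwise squared distances
        return sum(torch.exp(-sq_dist / (2.0 * bw ** 2)) for bw in bandwidths)
    k_ss = rbf(source_feats, source_feats).mean()  # source-source similarity
    k_tt = rbf(target_feats, target_feats).mean()  # target-target similarity
    k_st = rbf(source_feats, target_feats).mean()  # cross-domain similarity
    return k_ss + k_tt - 2.0 * k_st

# Toy usage: random vectors stand in for encoder outputs on comments.
src = torch.randn(32, 768)   # source-domain comment representations
tgt = torch.randn(32, 768)   # target-domain comment representations
alignment_loss = gaussian_mmd(src, tgt)  # could be added to a training objective

Minimizing such a term during training pushes the two domains' representation distributions together in the kernel space, which is the intuition behind the alignment stage.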
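Likewise, the third stage's few-shot fast adaptation is described in terms of gradient directions constrained by instances from multiple source domains, which resembles first-order meta-learning in the spirit of MAML/Reptile. The sketch below is a hypothetical illustration of that idea under placeholder names (model, loss_fn, domain_batches); it is not the authors' method, only one plausible way to keep shared parameters easy to fine-tune from a handful of target-domain examples.

# Illustrative sketch only: Reptile-style first-order meta-update across
# several source domains, keeping parameters easy to fine-tune with only a
# few target-domain examples. `model`, `loss_fn`, and `domain_batches` are
# hypothetical placeholders, not the authors' code.
import copy
import torch

def meta_update(model, loss_fn, domain_batches,
                inner_lr=1e-3, meta_lr=1e-2, inner_steps=3):
    meta_params = [p.detach().clone() for p in model.parameters()]
    for batch in domain_batches:                  # one batch per source domain
        # Inner loop: adapt a throwaway copy of the model to this source domain.
        fast_model = copy.deepcopy(model)
        opt = torch.optim.SGD(fast_model.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            opt.zero_grad()
            loss_fn(fast_model, batch).backward()
            opt.step()
        # Outer step: move the shared parameters toward the adapted ones,
        # i.e., along a gradient direction induced by this source domain.
        with torch.no_grad():
            for p, q in zip(meta_params, fast_model.parameters()):
                p += meta_lr * (q - p) / len(domain_batches)
    # Write the meta-updated parameters back into the shared model.
    with torch.no_grad():
        for p, new_p in zip(model.parameters(), meta_params):
            p.copy_(new_p)

After meta-training of this kind, fine-tuning on a small fraction of annotated target-domain comments (the paper reports no more than 10%) is intended to adapt the model quickly, matching the behavior described in the abstract.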



    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 3
    April 2024
    663 pages
    EISSN: 1556-472X
    DOI: 10.1145/3613567

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 January 2024
    Online AM: 11 December 2023
    Accepted: 29 November 2023
    Revised: 09 April 2023
    Received: 14 May 2022
    Published in TKDD Volume 18, Issue 3


    Author Tags

    1. crowdsourcing
    2. comment integration
    3. natural language generation
    4. transfer learning

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation of China
    • Jiangsu Natural Science Foundation (Basic Research Program)
    • National Key Research and Development Program
    • Research Center of the Female Scientific and Medical Colleges, Deanship of Scientific Research, King Saud University, Saudi Arabia
