DOI: 10.1145/3638884.3638961

research-article

New Energy Power Domain Question-Method Extraction And Soft Clustering

Published: 23 April 2024

ABSTRACT

In recent years, new energy power has become a research hotspot, and the volume of related research results keeps growing. This paper first proposes fine-tuning the Chinese LLaMA large language model to extract the research questions and research methods reported in new energy power publications. The fine-tuning dataset is constructed by combining rule templates with GPT-3.5-based augmentation, which avoids the cost and time of fully manual annotation, and the model is fine-tuned with the parameter-efficient LoRA method to save computing resources. The F1 score is then used as the evaluation index to compare the model's extraction performance under different fine-tuning datasets; the results show that the model extracts research-question and method terms well when trained on the dataset built from the rule-template and GPT-3.5 combination. Finally, the BTM (Biterm Topic Model) is applied to the extracted research-question phrases to learn their topic-word distribution, and the phrases are soft-clustered according to the resulting topic words. This links research results to professional terminology and lays the foundation for a future knowledge graph and knowledge base of new energy power.
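To make the fine-tuning step concrete, here is a minimal sketch of parameter-efficient LoRA fine-tuning applied to a Chinese LLaMA checkpoint with the Hugging Face peft library. The checkpoint path, adapter rank, and target modules are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal LoRA fine-tuning setup sketch (illustrative, not the paper's
# exact configuration). Assumes a local Chinese LLaMA checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

BASE = "path/to/chinese-llama-7b"  # hypothetical checkpoint path

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Inject low-rank adapters into the attention projections; only the
# small adapter matrices are trained, the base weights stay frozen.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                  # adapter rank (illustrative)
    lora_alpha=16,        # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```

Freezing the base model and training only the adapters is what the abstract refers to as saving computing resources: the trainable parameter count, optimizer state, and gradient memory all shrink by roughly two orders of magnitude.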

CCS CONCEPTS: • Theory of computation → Theory and algorithms for application domains → Unsupervised learning and clustering
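As a companion illustration of the clustering step described in the abstract, below is a minimal sketch of BTM-style soft assignment of an extracted research-question phrase to topics. It assumes the global topic proportions theta and topic-word distributions phi have already been estimated by a trained Biterm Topic Model (Yan et al., reference 21); the function name, vocabulary, and toy numbers are all illustrative.

```python
# BTM-style soft clustering sketch: infer P(topic | phrase) from the
# phrase's unordered word pairs (biterms), given pretrained theta/phi.
from itertools import combinations

import numpy as np

def soft_assign(phrase_tokens, theta, phi, vocab):
    """P(z | b) is proportional to theta[z] * phi[z, w_i] * phi[z, w_j]
    for each biterm b = (w_i, w_j); the phrase-level distribution
    averages the per-biterm posteriors (the BTM inference rule)."""
    ids = [vocab[w] for w in phrase_tokens if w in vocab]
    if not ids:  # no in-vocabulary evidence: return a uniform posterior
        return np.full(len(theta), 1.0 / len(theta))
    biterms = list(combinations(ids, 2))
    if not biterms:  # single-word phrase: fall back to unigram evidence
        biterms = [(i, i) for i in ids]
    p = np.zeros(len(theta))
    for wi, wj in biterms:
        pz = theta * phi[:, wi] * phi[:, wj]  # unnormalized P(z | b)
        p += pz / pz.sum()
    return p / len(biterms)  # soft topic membership of the phrase

# Toy usage: two topics over a four-word vocabulary.
vocab = {"voltage": 0, "control": 1, "wind": 2, "forecast": 3}
theta = np.array([0.5, 0.5])
phi = np.array([[0.4, 0.4, 0.1, 0.1],   # topic 0: grid control
                [0.1, 0.1, 0.4, 0.4]])  # topic 1: wind forecasting
print(soft_assign(["wind", "forecast"], theta, phi, vocab))  # ~[0.06, 0.94]
```

Because the output is a distribution rather than a single label, each phrase can belong to several topics at once, which is exactly what makes the clustering "soft".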

References

1. YANG Wei, SUN Deyan, ZHANG Xiaohui, et al. Named entity recognition for intelligent answer system in power service[J]. Computer Engineering and Design, 2019, 40(12): 3625-3630 (in Chinese).
2. YANG Q Y, JIANG J, FENG X Y, et al. Named entity recognition of power substation knowledge based on transformer-BiLSTM-CRF network[C]//2020 International Conference on Smart Grids and Energy Systems (SGES). Perth: IEEE, 2020: 952-956.
3. XU Huifang, ZHANG Zhonghao, TAN Yuanpeng, et al. Research on entity recognition technology in power grid dispatching field[J]. Electric Power Construction, 2021, 42(10): 71-77 (in Chinese).
4. CHEN Peng, TAI Bin, SHI Ying, et al. Text entity extraction of power equipment defects based on BERT-BiLSTM-CRF algorithm[J]. Power System Technology, 2023, 47(10): 4367-4376.
5. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C/OL]//Guyon I, Luxburg U V, Bengio S, et al. Advances in Neural Information Processing Systems: volume 30. Curran Associates, Inc., 2017. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
6. Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019: 4171-4186.
7. Radford A, Wu J, Child R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8): 9.
8. Touvron H, Lavril T, Izacard G, et al. LLaMA: Open and efficient foundation language models[J]. arXiv preprint arXiv:2302.13971, 2023.
9. Taori R, Gulrajani I, Zhang T, et al. Stanford Alpaca: An instruction-following LLaMA model[J/OL]. GitHub repository, 2023. https://github.com/tatsu-lab/stanford_alpaca.
10. Hu E J, Shen Y, Wallis P, et al. LoRA: Low-rank adaptation of large language models[C/OL]//International Conference on Learning Representations. 2022. https://openreview.net/forum?id=nZeVKeeFYf9.
11. Aghajanyan A, Zettlemoyer L, Gupta S. Intrinsic dimensionality explains the effectiveness of language model fine-tuning[J]. arXiv preprint arXiv:2012.13255, 2020.
12. Brown T, Mann B, Ryder N, et al. Language models are few-shot learners[J]. Advances in Neural Information Processing Systems, 2020, 33: 1877-1901.
13. Touvron H, Martin L, Stone K, et al. Llama 2: Open foundation and fine-tuned chat models[J]. arXiv preprint arXiv:2307.09288, 2023.
14. Rasley J, Rajbhandari S, Ruwase O, et al. DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020: 3505-3506.
15. Bommasani R, Liang P, Lee T. Holistic evaluation of language models[J]. Annals of the New York Academy of Sciences, 2023.
16. Chinese-LLaMA-Alpaca. https://github.com/ymcui/Chinese-LLaMA-Alpaca.
17. FENG Bin, ZHANG Wenwen, TANG Xin, et al. Power equipment defect record text mining based on BiLSTM-Attention neural network[J]. Proceedings of the CSEE, 2020, 40(S1): 1-10 (in Chinese).
18. LI Yanxuan. Research and implementation of abstract-based paper classification and recommendation model[D]. Beijing: Beijing University of Posts and Telecommunications, 2019.
19. ZHAI Yujia, TIAN Jingwen, ZHAO Ban. Algorithm term extraction and innovation evolution path construction based on BERT-BiLSTM-CRF model[J]. Information Science, 2022, 40(4): 71-78.
20. HAN Hongqi, XU Shuo, GUI Jie, et al. Term hierarchical relation extraction method based on morphology rule template[J]. Information Science, 2013, 32(7): 708-715.
21. Yan X, Guo J, Lan Y, et al. A biterm topic model for short texts[C]//Proceedings of the 22nd International Conference on World Wide Web. 2013: 1445-1456. https://doi.org/10.1145/2488388.2488514.
22. GAO Huiying, GONG Mengqiu, YU Sijia. Identification of medical service quality factors based on COA-BTM model[J]. Journal of Beijing Institute of Technology, 2022, (11): 1167-1174.
23. XU Feifei, CHEN Saihong, TIAN Yu. Hot topic detection based on BTM model and improved clustering algorithm[J]. Computer Applications and Software, 2022, (5): 283-290.

• Published in

  ICCIP '23: Proceedings of the 2023 9th International Conference on Communication and Information Processing
  December 2023, 648 pages
  ISBN: 9798400708909
  DOI: 10.1145/3638884

    Copyright © 2023 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

Acceptance Rates

Overall acceptance rate: 61 of 301 submissions, 20%
