ABSTRACT
In recent years, new energy power has become a research hotspot, and the volume of related research results continues to grow. This paper proposes fine-tuning the Chinese LLaMA large language model to extract the research questions and research methods from new energy power research results. The fine-tuning dataset is constructed by combining rule templates with GPT-3.5-based augmentation, which avoids the high cost and long time of fully manual construction, and LoRA parameter-efficient fine-tuning is adopted to save computing resources. The F1 score is then used as the evaluation metric to compare the model's extraction performance under different fine-tuning datasets; the results show that the model extracts research-question and method terms well when trained on the dataset built from rule templates combined with GPT-3.5 augmentation. Finally, the Biterm Topic Model (BTM) is applied to the extracted research-question phrases to learn the distribution of topic words, and the phrases are softly clustered according to the obtained topics. This links research results to professional terms and lays a foundation for the future construction of a knowledge graph and knowledge base for new energy power.
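As a minimal sketch of how the fine-tuning data described above might be built, the snippet below combines a rule template (which turns an annotated abstract into an instruction-tuning record) with GPT-3.5 paraphrase augmentation. The template wording, field names, and helper functions are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch: rule-template construction plus GPT-3.5 augmentation of the
# instruction-tuning dataset. Prompt text and record schema are assumed.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TEMPLATE = ("Extract the research question and research method "
            "from the following abstract:\n{abstract}")

def make_example(abstract: str, question: str, method: str) -> dict:
    """Rule-template construction of one instruction-tuning record."""
    return {
        "instruction": TEMPLATE.format(abstract=abstract),
        "output": json.dumps({"question": question, "method": method},
                             ensure_ascii=False),
    }

def augment(abstract: str) -> str:
    """GPT-3.5 augmentation: paraphrase an abstract to add variety."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Paraphrase this abstract:\n{abstract}"}],
    )
    return resp.choices[0].message.content
```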
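For the LoRA fine-tuning step, a minimal sketch using the Hugging Face `peft` library is shown below. The base checkpoint name, rank, scaling factor, and target modules are assumptions for illustration; the paper does not report its exact hyperparameters here.

```python
# Sketch: LoRA parameter-efficient fine-tuning of a Chinese LLaMA model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "hfl/chinese-llama-2-7b"  # assumed Chinese LLaMA checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA freezes the original weights and injects small trainable
# low-rank matrices into the attention projections.
config = LoraConfig(
    r=8,                                  # low-rank dimension (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # assumed injection points
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a small fraction of weights train
```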
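The F1 evaluation over extracted phrases can be sketched as below, assuming exact set overlap between predicted and gold phrases; the paper does not spell out its matching criterion, so this is one plausible reading.

```python
# Sketch: micro F1 over extracted research-question/method phrases,
# assuming exact-match set overlap.
def phrase_f1(predicted: set[str], gold: set[str]) -> float:
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)           # true positives
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# One phrase matches out of two predicted and two gold -> F1 = 0.5
print(phrase_f1({"wind power forecasting", "LSTM"},
                {"wind power forecasting", "load dispatch"}))
```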
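Finally, the BTM-based soft clustering can be illustrated with the inference step from the original BTM paper: a short phrase's topic distribution is P(z|d) = Σ_b P(z|b)P(b|d), with P(z|b) ∝ θ_z·φ(z,w1)·φ(z,w2). The sketch assumes the topic prior `theta` and topic-word matrix `phi` were already learned by Gibbs sampling, which is not shown.

```python
# Sketch: BTM soft assignment of an extracted phrase to topics, given
# pre-learned theta (shape T) and phi (shape T x V).
import itertools
import numpy as np

def phrase_topic_distribution(tokens, theta, phi, vocab):
    """Soft topic assignment for one short phrase (a tiny document)."""
    ids = [vocab[w] for w in tokens if w in vocab]
    biterms = list(itertools.combinations(ids, 2))
    if not biterms:                      # too few known words: fall back
        return theta / theta.sum()
    p_zd = np.zeros_like(theta)
    for w1, w2 in biterms:
        p_zb = theta * phi[:, w1] * phi[:, w2]  # unnormalized P(z|b)
        p_zd += p_zb / p_zb.sum()               # P(b|d) uniform over biterms
    return p_zd / len(biterms)                  # normalized P(z|d)
```

Soft clustering then simply keeps the full distribution per phrase (or thresholds it), rather than forcing each phrase into a single topic.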
CCS CONCEPTS
• Theory of computation → Theory and algorithms for application domains → Unsupervised learning and clustering
REFERENCES
- YANG Wei, SUN Deyan, ZHANG Xiaohui, et al. Named entity recognition for intelligent answer system in power service[J]. Computer Engineering and Design, 2019, 40(12): 3625-3630 (in Chinese).
- YANG Q Y, JIANG J, FENG X Y, et al. Named entity recognition of power substation knowledge based on Transformer-BiLSTM-CRF network[C]//2020 International Conference on Smart Grids and Energy Systems (SGES). Perth: IEEE, 2020: 952-956.
- XU Huifang, ZHANG Zhonghao, TAN Yuanpeng, et al. Research on entity recognition technology in power grid dispatching field[J]. Electric Power Construction, 2021, 42(10): 71-77 (in Chinese).
- CHEN Peng, TAI Bin, SHI Ying, et al. Text entity extraction of power equipment defects based on BERT-BiLSTM-CRF algorithm[J]. Power System Technology, 2023, 47(10): 4367-4376.
- Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C/OL]//Guyon I, Luxburg U V, Bengio S, et al. Advances in Neural Information Processing Systems: volume 30. Curran Associates, Inc., 2017. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
- Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019: 4171-4186.
- Radford A, Wu J, Child R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8): 9.
- Touvron H, Lavril T, Izacard G, et al. LLaMA: Open and efficient foundation language models[J]. arXiv preprint arXiv:2302.13971, 2023.
- Taori R, Gulrajani I, Zhang T, et al. Stanford Alpaca: An instruction-following LLaMA model[J/OL]. GitHub repository, 2023. https://github.com/tatsu-lab/stanford_alpaca.
- Hu E J, Shen Y, Wallis P, et al. LoRA: Low-rank adaptation of large language models[C/OL]//International Conference on Learning Representations. 2022. https://openreview.net/forum?id=nZeVKeeFYf9.
- Aghajanyan A, Zettlemoyer L, Gupta S. Intrinsic dimensionality explains the effectiveness of language model fine-tuning[J]. arXiv preprint arXiv:2012.13255, 2020.
- Brown T, Mann B, Ryder N, et al. Language models are few-shot learners[J]. Advances in Neural Information Processing Systems, 2020, 33: 1877-1901.
- Touvron H, Martin L, Stone K, et al. Llama 2: Open foundation and fine-tuned chat models[J]. arXiv preprint arXiv:2307.09288, 2023.
- Rasley J, Rajbhandari S, Ruwase O, et al. DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020: 3505-3506.
- Bommasani R, Liang P, Lee T. Holistic evaluation of language models[J]. Annals of the New York Academy of Sciences, 2023.
- Chinese-LLaMA-Alpaca. https://github.com/ymcui/Chinese-LLaMA-Alpaca.
- FENG Bin, ZHANG Wenwen, TANG Xin, et al. Power equipment defect record text mining based on BiLSTM-Attention neural network[J]. Proceedings of the CSEE, 2020, 40(S1): 1-10.
- LI Yanxuan. Research and implementation of abstract-based paper classification and recommendation model[D]. Beijing: Beijing University of Posts and Telecommunications, 2019.
- ZHAI Yujia, TIAN Jingwen, ZHAO Ban. Algorithm term extraction and innovation evolution path construction based on BERT-BiLSTM-CRF model[J]. Information Science, 2022, 40(4): 71-78.
- HAN Hongqi, XU Shuo, GUI Jie, et al. Term hierarchical relation extraction method based on morphology rule template[J]. Information Science, 2013, 32(7): 708-715.
- Yan X, Guo J, Lan Y, et al. A biterm topic model for short texts[C]//Proceedings of the 22nd International Conference on World Wide Web. 2013: 1445-1456. https://doi.org/10.1145/2488388.2488514.
- GAO Huiying, GONG Mengqiu, YU Sijia. Identification of medical service quality factors based on COA-BTM model[J]. Journal of Beijing Institute of Technology, 2022(11): 1167-1174.
- XU Feifei, CHEN Saihong, TIAN Yu. Hot topic detection based on BTM model and improved clustering algorithm[J]. Computer Applications and Software, 2022(5): 283-290.