research-article

A Chinese Information Operation and Maintenance Knowledge Retrieval Corpus

Authors:
Kai Sun (ORCID: 0000-0002-1920-7197)
Tiancheng Zhao (ORCID: 0009-0001-8748-0953)
Yangling Chen (ORCID: 0009-0003-3955-5329)
Shuo Han (ORCID: 0009-0007-2320-6334)
Yue Wu (ORCID: 0009-0001-7461-334X)
Nan Xiang (ORCID: 0009-0000-7368-953X)

Affiliation (all authors): Information and Communication Branch, State Grid Nanjing Power Supply Company, China

IoTAAI '23: Proceedings of the 2023 5th International Conference on Internet of Things, Automation and Artificial Intelligence
November 2023, Pages 214–218
https://doi.org/10.1145/3653081.3653118

Published: 03 May 2024

ABSTRACT

The lack of a large-scale information operation and maintenance corpus greatly limits the development of information technology operation and maintenance management, especially in languages other than English. To address this gap, we introduce a large-scale Chinese information operation and maintenance knowledge retrieval corpus and release it publicly. A key challenge in building such a corpus is collecting a large amount of retrieval data in different languages. We first use search engines to collect a large amount of Chinese information operation and maintenance knowledge text related to high-frequency words in various fields, and then use ChatGPT (https://chat.openai.com/) to generate relevant questions for the collected text. Finally, we recruit three annotators to manually check the quality of the retrieval corpus. The resulting Chinese information operation and maintenance knowledge corpus contains 2000 retrieval questions. To verify its quality, we divide the corpus into a training set of 1500 retrieval questions and a test set of 500 retrieval questions, and test several well-known retrieval methods on them (https://pan.baidu.com/s/1rLWqHZJhE9nEOYg3OTC1Ag). The experimental results not only demonstrate the high quality of the corpus but also provide a solid baseline for further research on it.
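
As a rough illustration of the evaluation setup described in the abstract, the sketch below cuts a 1500/500 train/test split from 2000 question-passage pairs and scores a toy retriever with recall@k. The abstract does not specify the data format, the split procedure, the metric, or which retrieval methods were tested, so everything here is an assumption: the (question, passage) tuple layout, the seeded shuffle, the function names, and the character-bigram overlap scorer, which is only a stand-in baseline and not one of the methods evaluated in the paper.

```python
import random
from collections import Counter


def split_corpus(pairs, n_train=1500, n_test=500, seed=0):
    """Shuffle (question, passage) pairs and cut a fixed-size train/test split.

    The sizes mirror the abstract; the seeded shuffle is an assumption.
    """
    if len(pairs) < n_train + n_test:
        raise ValueError("not enough pairs for the requested split")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]


def char_bigrams(text):
    """Character bigrams, a segmentation-free unit often used for Chinese text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def overlap_score(question, passage):
    """Number of bigrams shared by question and passage (multiset intersection)."""
    return sum((char_bigrams(question) & char_bigrams(passage)).values())


def recall_at_k(test_pairs, passages, k=1):
    """Fraction of questions whose gold passage appears in the top-k ranking."""
    hits = 0
    for question, gold in test_pairs:
        ranked = sorted(passages,
                        key=lambda p: overlap_score(question, p),
                        reverse=True)
        hits += gold in ranked[:k]
    return hits / len(test_pairs)
```

A caller would load the released question-passage pairs, pass them to `split_corpus`, and then call `recall_at_k(test, passages)` with the full passage pool. A learned retriever can be swapped in by replacing `overlap_score` while keeping the same split and metric.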

Index Terms

A Chinese Information Operation and Maintenance Knowledge Retrieval Corpus
1. Information systems
  1. Information retrieval

Published in

IoTAAI '23: Proceedings of the 2023 5th International Conference on Internet of Things, Automation and Artificial Intelligence
November 2023
902 pages
ISBN: 9798400716485
DOI: 10.1145/3653081

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 May 2024
Qualifiers
- research-article
- Research
- Refereed limited