DOI: 10.1145/3653081.3653118
research-article

A Chinese Information Operation and Maintenance Knowledge Retrieval Corpus

Published: 3 May 2024

ABSTRACT

The lack of large-scale corpora for information operation and maintenance (O&M) severely limits the development of information technology O&M management, especially in languages other than English. To address this, we introduce a large-scale Chinese information O&M knowledge retrieval corpus and release it publicly. A key challenge in building such a corpus is collecting large amounts of retrieval data in different languages. We first use search engines to collect Chinese information O&M knowledge texts related to high-frequency terms in various fields, and then use ChatGPT (https://chat.openai.com/) to generate relevant questions for the collected texts. Finally, three recruited annotators manually check the quality of the retrieval corpus. The resulting Chinese information O&M knowledge corpus contains 2000 retrieval questions. To verify its quality, we divide it into a training set of 1500 retrieval questions and a test set of 500 retrieval questions, and evaluate several well-known retrieval methods on them (https://pan.baidu.com/s/1rLWqHZJhE9nEOYg3OTC1Ag). The experimental results not only demonstrate the high quality of the corpus but also provide solid baseline performance for further research on it.
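The abstract does not state how the 1500/500 split was drawn or which metric the baselines report. The following is a minimal sketch under stated assumptions: each record is assumed to be a (question, gold passage id) pair, the split is assumed to be a seeded random shuffle, and retrieval is scored with Hit@k. The function names (`split_corpus`, `rank`, `hit_at_k`) and the character-bigram overlap scorer are hypothetical illustrations. They are not the authors' method or any of the retrieval models evaluated in the paper.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical record format: (question text, id of the passage that answers it).
Example = Tuple[str, int]


def split_corpus(examples: Sequence[Example], n_train: int = 1500, seed: int = 0):
    """Assumed procedure: shuffle with a fixed seed, then cut into train/test.

    With 2000 questions and n_train=1500 this yields the 1500/500 sizes
    reported in the abstract; the actual split procedure is not specified.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded, so the split is reproducible
    return items[:n_train], items[n_train:]


def char_bigrams(text: str) -> Counter:
    """Character bigrams: a segmentation-free representation for Chinese text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def score(query: str, passage: str) -> int:
    """Size of the multiset intersection of query and passage bigrams."""
    return sum((char_bigrams(query) & char_bigrams(passage)).values())


def rank(query: str, passages: Dict[int, str]) -> List[int]:
    """Passage ids sorted by descending overlap score; ties broken by id."""
    return sorted(passages, key=lambda pid: (-score(query, passages[pid]), pid))


def hit_at_k(test: Sequence[Example], passages: Dict[int, str], k: int = 1) -> float:
    """Fraction of test questions whose gold passage appears in the top k."""
    if not test:
        return 0.0
    hits = sum(gold in rank(question, passages)[:k] for question, gold in test)
    return hits / len(test)


if __name__ == "__main__":
    # Toy O&M passages and questions, invented purely for illustration.
    passages = {
        0: "服务器磁盘空间不足时应清理日志文件",
        1: "数据库连接超时需要检查网络配置",
        2: "交换机端口故障可通过重启恢复",
    }
    toy_test = [("磁盘空间不足怎么办", 0), ("数据库连接超时的原因", 1)]
    print(hit_at_k(toy_test, passages, k=1))
```

Character bigrams are used here only because they need no Chinese word segmenter. A realistic baseline on this corpus would more likely use a term-weighting model such as BM25 or a pre-trained Chinese encoder in place of `score`, while keeping the same split and metric scaffolding.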


      Published in

      ACM Other conferences
      IoTAAI '23: Proceedings of the 2023 5th International Conference on Internet of Things, Automation and Artificial Intelligence
      November 2023
      902 pages
      ISBN: 9798400716485
      DOI: 10.1145/3653081

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • research-article
      • Research
      • Refereed limited