DOI: 10.1145/2396761.2398645
poster

A co-training based method for Chinese patent semantic annotation

Published: 29 October 2012

Abstract

Patents are public scientific documents protected by law, and their abstracts contain highly valuable information. Semantic annotation of patents can help protect intellectual property rights and promote corporate research and innovation. Current automatic patent annotation relies mainly on supervised machine learning algorithms, which require large amounts of expensive labeled patent data. Because sufficient labeled Chinese patent data is lacking, this paper adopts co-training, a semi-supervised machine learning method that starts from a small amount of labeled data. The method combines keyword extraction with list extraction and incrementally annotates functional clauses in patent abstracts. Experimental results indicate that the method gradually improves recall without sacrificing precision.
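The incremental loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Clause` type, the `cue` and `list_id` fields, the `co_train` function, and the English sample clauses are all hypothetical, and each "view" is reduced to a trivial set of known cues rather than a trained classifier. It only shows how a keyword view and a list view might take turns labeling clauses for each other, starting from one seed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Clause:
    text: str
    cue: Optional[str]      # keyword view: hypothetical cue word in the clause
    list_id: Optional[str]  # list view: hypothetical id of the enumeration holding the clause


def co_train(seeds: List[Clause], unlabeled: List[Clause], max_rounds: int = 10) -> List[Clause]:
    """Grow the set of clauses annotated as functional, one round at a time."""
    labeled = list(seeds)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # Each view "learns" from everything labeled so far.
        cues = {c.cue for c in labeled if c.cue}
        lists = {c.list_id for c in labeled if c.list_id}
        # A clause is accepted if either view recognizes it, so a clause
        # found by one view supplies new evidence to the other view.
        newly = [c for c in pool if c.cue in cues or c.list_id in lists]
        if not newly:
            break  # nothing new was labeled; stop early
        labeled.extend(newly)
        pool = [c for c in pool if c not in newly]
    return labeled


seeds = [Clause("improves heat dissipation", "improves", None)]
pool = [
    Clause("improves sealing", "improves", "L1"),
    Clause("lower cost", None, "L1"),
    Clause("reduces noise", "reduces", "L1"),
    Clause("reduces weight", "reduces", None),
    Clause("comprises a housing", "comprises", "L2"),
]
annotated = co_train(seeds, pool)
```

In this toy run the single seed cue reaches "improves sealing", whose list membership then pulls in its sibling clauses, one of which contributes a new cue for the next round. The structural clause "comprises a housing" is never reached, which mirrors the abstract's goal of raising recall without admitting unrelated clauses.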




    Published In

    CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
    October 2012
    2840 pages
    ISBN:9781450311564
    DOI:10.1145/2396761


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. co-training
    2. information extraction
    3. patent mining
    4. semantic annotation

    Qualifiers

    • Poster

    Conference

    CIKM'12

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


    Cited By

    • (2020) A clustering-based approach for the evaluation of candidate emerging technologies. Scientometrics. DOI: 10.1007/s11192-020-03535-0. Online publication date: 3-Jun-2020
    • (2019) A Method of Annotating Disease Names in TCM Patents Based on Co-training. Advances on P2P, Parallel, Grid, Cloud and Internet Computing, 389-398. DOI: 10.1007/978-3-030-33509-0_35. Online publication date: 20-Oct-2019
    • (2018) A Semi-Automatic Annotation Method of Effect Clue Words for Chinese Patents Based on Co-Training. International Journal of Data Warehousing and Mining 14:4, 1-19. DOI: 10.4018/IJDWM.2018100101. Online publication date: 1-Oct-2018
    • (2017) PaEffExtr: A Method to Extract Effect Statements Automatically from Patents. Complex, Intelligent, and Software Intensive Systems, 667-676. DOI: 10.1007/978-3-319-61566-0_62. Online publication date: 5-Jul-2017
    • (2017) The Construction Method of Clue Words Thesaurus in Chinese Patents Based on Iteration and Self-filtering. Advances in Internetworking, Data & Web Technologies, 119-125. DOI: 10.1007/978-3-319-59463-7_12. Online publication date: 28-May-2017
    • (2013) Technology and Effect Matrix for Patent Clustering. Proceedings of the 2013 10th Web Information System and Application Conference, 128-132. DOI: 10.1109/WISA.2013.33. Online publication date: 10-Nov-2013
