DOI: 10.1145/2910896.2925465
poster

Inventor Name Disambiguation for a Patent Database Using a Random Forest and DBSCAN

Published: 19 June 2016

ABSTRACT

Inventor name disambiguation is the task of distinguishing each unique inventor from all other inventor records in a patent database. This task is essential for processing person name queries that retrieve information related to a specific inventor, e.g. a list of all of that inventor's patents. We adapt earlier work on author name disambiguation to inventor name disambiguation. A random forest classifier is trained to classify whether each pair of inventor records refers to the same person. The DBSCAN algorithm is used for inventor record clustering, and its distance function is derived from the random forest classifier. For scalability, blocking functions are used to reduce the complexity of record matching and to enable parallelization, since each block can be processed independently. Tested on the USPTO patent database, the method disambiguated 12 million inventor records in 6.5 hours. Evaluation on the labeled datasets from the USPTO PatentsView competition shows that our algorithm outperforms all algorithms submitted to the competition.
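
The pipeline described in the abstract (blocking, a pairwise same-person score, and DBSCAN over a distance derived from that score) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy records, the `block_key` rule (last name plus first initial), the `eps` and `min_pts` values, and above all `same_person_probability`, which is a hand-written stand-in for the trained random forest rather than an actual classifier.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical inventor records: (record_id, first_name, last_name, city).
RECORDS = [
    (0, "John", "Smith", "Austin"),
    (1, "John", "Smith", "Austin"),
    (2, "Jon", "Smith", "Austin"),
    (3, "Jane", "Smith", "Boston"),
    (4, "Wei", "Zhang", "Seattle"),
    (5, "Wei", "Zhang", "Seattle"),
]


def block_key(rec):
    """Assumed blocking function: lowercased last name plus first initial."""
    _, first, last, _ = rec
    return (last.lower(), first[:1].lower())


def same_person_probability(a, b):
    """Stand-in for the pairwise classifier (NOT a random forest).

    In the described method this score would come from a trained model;
    here it is a fixed mix of first-name similarity and city equality.
    """
    name_sim = SequenceMatcher(None, a[1].lower(), b[1].lower()).ratio()
    city_sim = 1.0 if a[3] == b[3] else 0.0
    return 0.5 * name_sim + 0.5 * city_sim


def distance(a, b):
    """Distance derived from the pairwise score: 1 - P(same person)."""
    return 1.0 - same_person_probability(a, b)


def dbscan(items, dist, eps, min_pts):
    """Textbook DBSCAN over an arbitrary distance; label -1 marks noise."""
    labels = [None] * len(items)

    def neighbors(i):
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # core point: expand the cluster
                queue.extend(nb)
    return labels


def disambiguate(records, eps=0.3, min_pts=2):
    """Map record_id -> inventor_id by clustering each block separately."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    inventor_of = {}
    next_id = 0
    # Blocks do not interact, so this loop could be distributed across workers.
    for key in sorted(blocks):
        block = blocks[key]
        labels = dbscan(block, distance, eps, min_pts)
        local = {}  # DBSCAN label within this block -> global inventor id
        for rec, lab in zip(block, labels):
            if lab == -1 or lab not in local:
                assigned = next_id  # noise records become singleton inventors
                next_id += 1
                if lab != -1:
                    local[lab] = assigned
            else:
                assigned = local[lab]
            inventor_of[rec[0]] = assigned
    return inventor_of


if __name__ == "__main__":
    print(disambiguate(RECORDS))
```

Because comparisons happen only within a block, the quadratic pairwise cost applies per block rather than to the whole database, and blocks can be processed in any order or in parallel. With a real classifier, the only piece that would change in this sketch is `same_person_probability`.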


    • Published in

      JCDL '16: Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries
      June 2016
      316 pages
      ISBN:9781450342292
      DOI:10.1145/2910896

      Copyright © 2016 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 19 June 2016

      Qualifiers

      • poster

      Acceptance Rates

      JCDL '16 Paper Acceptance Rate: 15 of 52 submissions, 29%
      Overall Acceptance Rate: 415 of 1,482 submissions, 28%
