DOI: 10.1145/2339530.2339741
research-article

PatentMiner: topic-driven patent analysis and mining

Published: 12 August 2012

ABSTRACT

Patenting is one of the most important ways to protect a company's core business concepts and proprietary technologies. Analyzing large volumes of patent data can uncover potential competitive or collaborative relations among companies in certain areas, which provides valuable information for developing intellectual property (IP), R&D, and marketing strategies. In this paper, we present a novel topic-driven patent analysis and mining system. Instead of merely searching over patent content, we study the heterogeneous patent network derived from the patent database, which consists of several types of objects (companies, inventors, and technical content) that jointly evolve over time. We design and implement a general topic-driven framework for analyzing and mining this heterogeneous patent network. Specifically, we propose a dynamic probabilistic model to characterize the topical evolution of these objects within the patent network. Based on this modeling framework, we derive several patent analytics tools that can be used directly for IP and R&D strategy planning, including a heterogeneous network co-ranking method, a topic-level competitor evolution analysis algorithm, and a method for summarizing search results. We evaluate the proposed methods on a real-world patent database. The experimental results show that the proposed techniques clearly outperform the corresponding baseline methods.
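The abstract does not spell out how the co-ranking method works, so the following is only a minimal illustrative sketch of the general idea of ranking two object types jointly on a heterogeneous network. It assumes a simple mutual-reinforcement scheme (companies score highly when linked to high-scoring inventors on a topic, and vice versa), which is a generic stand-in and not the paper's algorithm. The function name `co_rank`, the record layout, and the toy data are all hypothetical.

```python
from collections import defaultdict

def co_rank(patents, topic, iterations=50):
    """Jointly rank companies and inventors for one topic.

    Hypothetical record layout: each patent is a dict with keys
    'company', 'inventors', and 'topics' (topic name -> weight).
    This is a generic mutual-reinforcement sketch, not the paper's method.
    """
    # Edge weight between a company and an inventor is the summed topic
    # weight of the patents they share.
    edges = defaultdict(float)
    for p in patents:
        w = p["topics"].get(topic, 0.0)
        if w <= 0:
            continue
        for inv in p["inventors"]:
            edges[(p["company"], inv)] += w
    if not edges:
        return {}, {}

    companies = {c for c, _ in edges}
    inventors = {i for _, i in edges}
    c_score = {c: 1.0 / len(companies) for c in companies}
    i_score = {i: 1.0 / len(inventors) for i in inventors}

    for _ in range(iterations):
        # Inventors inherit score from the companies they patent with.
        new_i = defaultdict(float)
        for (c, i), w in edges.items():
            new_i[i] += w * c_score[c]
        total = sum(new_i.values())
        i_score = {i: s / total for i, s in new_i.items()}

        # Companies inherit score from their inventors.
        new_c = defaultdict(float)
        for (c, i), w in edges.items():
            new_c[c] += w * i_score[i]
        total = sum(new_c.values())
        c_score = {c: s / total for c, s in new_c.items()}

    return c_score, i_score

# Toy data, invented purely for illustration.
patents = [
    {"company": "Acme", "inventors": ["ann", "bob"], "topics": {"battery": 0.9}},
    {"company": "Acme", "inventors": ["ann"], "topics": {"battery": 0.8}},
    {"company": "Bolt", "inventors": ["cy"], "topics": {"battery": 0.3}},
    {"company": "Bolt", "inventors": ["cy"], "topics": {"display": 0.9}},
]
company_scores, inventor_scores = co_rank(patents, "battery")
```

On this toy input, "Acme" ranks above "Bolt" for the "battery" topic, and the inventor ordering follows the strength of each inventor's topic-weighted links. A real system would derive the topic weights from a fitted topic model rather than hand-assigned numbers.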



Published in

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012, 1616 pages
ISBN: 9781450314626
DOI: 10.1145/2339530

Copyright © 2012 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 1,133 of 8,635 submissions (13%)
