ABSTRACT
Classifying the content of a patent document by the subject areas it belongs to is a challenging task in today's research and development landscape. This is because the novelty of current R&D rarely lies in a single technical area; it usually arises from a unique combination of several. For example, a typical ICT patent may advance knowledge in some combination of control engineering, electronic components, database technology, information retrieval methodology, Internet and wireless technology, and speech, signal, and image processing. This paper reports work on content classification of a newly drafted patent document using probabilistic latent semantic analysis (PLSA). PLSA is used for automated indexing of the documents: an indexer tokenizes them and builds a generative model. A singular value decomposition model is used to compact the term-document matrix and the co-occurrences it records. The objective is to take a large corpus of past patent documents and categorize documents according to the concept model generated from it. The approach is illustrated and tested on an example classification of content for two typical US Patent Classes, and was found to work well for them.
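The pipeline named in the abstract (tokenize, build a term-document matrix, compact it with a singular value decomposition, fit a PLSA model) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`tokenize`, `term_document_matrix`, `truncated_svd`, `plsa`), the whitespace tokenizer, the EM iteration count, and the four toy strings are all hypothetical and chosen only for the example.

```python
import numpy as np

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real indexer would also
    # remove stop words and apply stemming.
    return [w for w in text.lower().split() if w.isalpha()]

def term_document_matrix(docs):
    # Rows are terms, columns are documents, entries are raw counts.
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in tokenize(d):
            counts[index[w], j] += 1
    return counts, vocab

def truncated_svd(counts, rank):
    # Keep only the `rank` largest singular triplets to compact the matrix.
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]

def plsa(counts, n_topics, n_iter=100, seed=0):
    # Fit P(w|z) and P(z|d) with expectation-maximization.
    rng = np.random.default_rng(seed)
    n_words, n_docs = counts.shape
    p_w_z = rng.random((n_words, n_topics))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((n_topics, n_docs))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z|w,d) is proportional to P(w|z) * P(z|d).
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]   # shape (W, Z, D)
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + eps
        p_z_d = weighted.sum(axis=0)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + eps
    return p_w_z, p_z_d

# Toy strings invented for illustration; they are not real patent text.
docs = [
    "wireless signal antenna transmission signal",
    "antenna wireless network transmission",
    "database query index retrieval query",
    "index retrieval database storage",
]
counts, vocab = term_document_matrix(docs)
u, s, vt = truncated_svd(counts, rank=2)
p_w_z, p_z_d = plsa(counts, n_topics=2)
print(np.round(p_z_d, 2))  # one column of topic weights per document
```

Each column of `p_z_d` is a topic mixture for one document, so a simple way to categorize a document in this sketch is to take the topic with the largest weight in its column. How the paper itself maps latent concepts to US Patent Classes is not described in the abstract.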
Index Terms
- Patent classification of the new invention using PLSA