Development of a patent document classification and search platform using a back-propagation network
Introduction
With the rapid development of information technology, the number of electronic documents and the volume of their digital content exceed the capacity of manual control and management. People are increasingly required to handle wide ranges of information from multiple sources. As a result, enterprises and organizations implement knowledge management systems to manage their information and knowledge more effectively. Knowledge management includes sorting useful knowledge from information, storing knowledge in good order, and finding knowledge in an existing knowledge base (Turban & Aronson, 2001). In this research, we focus on explicit knowledge management, i.e., management of well-structured documents such as patent documents (called “patents” in brief). Patents provide exclusive rights and legal protection for patent inventors. In addition, patents play an important role in the advancement and diffusion of technology. The objective of this research is to develop an effective methodology to automatically classify and identify patent documents. Furthermore, a prototype system is implemented and tested using hand-tool patents sourced from the World Intellectual Property Organization (WIPO) database for scenario demonstration.
There have been many research efforts devoted to automatic document classification. Some of the classification methodologies are difficult to implement, and others are neither efficient nor effective, requiring developers of knowledge management systems to expend considerable resources testing and evaluating algorithms. The purpose of this research is to develop a document classification method based on neural networks and benchmark the performance against published standards. Through the implementation of a document classification module and a document search module, a prototype patent document management system is created. The patent document management system automates the classification of patent documents and improves the search for documents.
The automatic document classification methodology consists of the following steps. First, significant terms are extracted from patent documents and used to build a key phrase database. Second, the similarities between phrases are computed and recorded in a correlation matrix in order to synthesize the phrases into a smaller set representing key concepts within the patent domain. After key phrase extraction and synthesis, the consolidated set of key phrases is treated as the input to the back-propagation network model. The neural network model is trained using the key phrases and their frequencies in the sample documents. Training and assessment continue until the model reaches a satisfactory level of accuracy. The final step is to use the trained model for automated patent document classification and search.
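The first two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the snippet above does not specify the similarity measure or the grouping rule, so cosine similarity between phrase-frequency columns, the greedy grouping, and the `threshold` value are all assumptions, and the function names are hypothetical.

```python
import math

def phrase_frequencies(document, phrases):
    """Count occurrences of each key phrase in one document (case-insensitive)."""
    text = document.lower()
    return [text.count(phrase.lower()) for phrase in phrases]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def correlation_matrix(documents, phrases):
    """Phrase-by-phrase similarity, computed over per-document frequency columns."""
    rows = [phrase_frequencies(d, phrases) for d in documents]
    columns = list(zip(*rows))  # one column of counts per phrase
    return [[cosine(ci, cj) for cj in columns] for ci in columns]

def synthesize(phrases, matrix, threshold=0.9):
    """Greedy grouping (assumed rule): a phrase joins the first group whose
    representative phrase it correlates with at or above the threshold."""
    groups = []
    for i in range(len(phrases)):
        for group in groups:
            if matrix[group[0]][i] >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return [[phrases[i] for i in group] for group in groups]
```

Each resulting group would stand for one key concept, and the summed frequency of its member phrases could serve as one input value for the network.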
Literature survey
This section reviews the relevant topics including knowledge and e-document management, document categorization, clustering methodologies and related patent analysis research.
System architecture and methodology
This section describes the detailed methodologies for document classification and document search. First, a document content extraction model is built to represent the document content as a vector of key phrase frequencies. Second, a document classification model based on the back-propagation network (BPN) approach is developed. Finally, a document search model is implemented using the trained back-propagation network.
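As a rough illustration of the classification model, the following sketch trains a single-hidden-layer back-propagation network on key phrase frequency vectors. The layer sizes, sigmoid activation, squared-error loss, learning rate, and the names `BPN` and `train` are assumptions made for this example; the network configuration actually used is not given in this snippet.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPN:
    """One-hidden-layer network trained by back-propagation (illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One weight row per unit; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
             for row in self.w_hidden]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0])))
             for row in self.w_out]
        return h, y

    def train_step(self, x, target, lr=0.5):
        h, y = self.forward(x)
        # Error terms for squared error with sigmoid units.
        d_out = [(t - o) * o * (1 - o) for t, o in zip(target, y)]
        d_hid = [hj * (1 - hj) * sum(d_out[k] * self.w_out[k][j]
                                     for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        # Gradient-descent weight updates, output layer then hidden layer.
        for k, row in enumerate(self.w_out):
            for j, v in enumerate(h + [1.0]):
                row[j] += lr * d_out[k] * v
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] += lr * d_hid[j] * v
        return sum((t - o) ** 2 for t, o in zip(target, y))

    def classify(self, x):
        """Return the index of the output unit with the highest activation."""
        _, y = self.forward(x)
        return max(range(len(y)), key=y.__getitem__)

def train(net, samples, epochs=300, lr=0.5):
    """Train on (frequency_vector, one_hot_target) pairs; return per-epoch loss."""
    return [sum(net.train_step(x, t, lr) for x, t in samples)
            for _ in range(epochs)]
```

Here each sample pairs a list of key phrase frequencies with a one-hot category target, and the returned per-epoch loss can be monitored to decide when accuracy is satisfactory.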
Function modules of the system
The prototype system includes the System Parameters Management Module, the Automatic Categorization Module, and the Document Search Module. The system parameters management module provides the interface to adjust keyword correlation values and neural network weights. The automatic categorization module contains the functions of document upload, content extraction and document categorization. Finally, the document search module provides users an interface to search and download documents.
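One way the document search module could use a trained classifier is sketched below. This is an assumed design, not the one documented in the paper: the query is classified into a category, and documents in that category are ranked by cosine similarity to the query vector. The `search` function, the document record fields (`id`, `category`, `vector`), and the `classify` callable are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search(query_vector, documents, classify, top_k=3):
    """Return ids of the top_k documents in the query's predicted category.

    documents: list of dicts with hypothetical keys "id", "category", "vector".
    classify:  callable mapping a frequency vector to a category label,
               e.g. the classify method of a trained network.
    """
    category = classify(query_vector)
    candidates = [d for d in documents if d["category"] == category]
    ranked = sorted(candidates,
                    key=lambda d: cosine(query_vector, d["vector"]),
                    reverse=True)
    return [d["id"] for d in ranked[:top_k]]
```

Restricting the ranking to one predicted category keeps the comparison set small, at the cost of missing relevant documents if the query is misclassified.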
Conclusion
The back-propagation network (BPN) algorithm offers non-linear problem-solving ability and learning by example. Its application has two limitations: inadequate training data may yield an unreliable model, and the training procedure may require significant computing resources. The first limitation can be addressed by compiling a wide range of examples. Since a well-trained model can help companies better manage documents, the cost of computing resources can be
Acknowledgement
This research is partially funded by research grants from the Taiwan Ministry of Economic Affairs and the National Science Council.