A Workflow-Based Large-Scale Patent Mining and Analytics Framework

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 920)

Abstract

The analysis of large volumes of complex scientific information such as patents requires new methods and a flexible, highly interactive, and easy-to-use platform. Such a platform should enable a variety of applications for information professionals in industry and research, ranging from information search and semantic analysis to specific text- and data-mining tasks. In this paper, we present a scalable patent analytics framework built on top of a big-data architecture and a scientific workflow system. The framework seamlessly integrates essential services for patent analysis. These services employ natural language processing and machine learning algorithms to deeply structure and semantically annotate patent texts, so that complex scientific workflows can be realized. In two case studies, we show how the framework can be used for querying, annotating, and analyzing large amounts of patent data.


Notes

  1. http://www.stn-international.de
  2. https://phoenix.apache.org
  3. https://spark.apache.org/
  4. https://www.knime.com/
  5. https://github.com/ooyala/spark-jobserver
  6. https://www.knime.com/nodeguide/big-data/spark-executor/modularized-spark-scripting
  7. https://uima.apache.org/
  8. https://bitbucket.org/wwmm/oscar4
  9. https://www.ebi.ac.uk/unichem/
  10. http://www.wipo.int/classifications/ipc/en/

Author information

Corresponding authors

Correspondence to Mustafa Sofean, Hidir Aras or Ahmad Alrifai.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Sofean, M., Aras, H., Alrifai, A. (2018). A Workflow-Based Large-Scale Patent Mining and Analytics Framework. In: Damaševičius, R., Vasiljevienė, G. (eds) Information and Software Technologies. ICIST 2018. Communications in Computer and Information Science, vol 920. Springer, Cham. https://doi.org/10.1007/978-3-319-99972-2_17

  • DOI: https://doi.org/10.1007/978-3-319-99972-2_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99971-5

  • Online ISBN: 978-3-319-99972-2

  • eBook Packages: Computer Science, Computer Science (R0)
