Abstract
The biomedical literature comprises an ever-increasing number of publications written in natural language. Patents make up a relevant fraction of it and are important sources of information because of the curated data produced during the granting process. However, their unstructured data makes searching for information a challenging task. To overcome this, Biomedical text mining (BioTM) creates methodologies to search and structure that data. Several BioTM techniques can be applied to patents. Among them, Information Retrieval is the process by which relevant data is obtained from collections of documents. In this work, a patent pipeline was developed and integrated into @Note2, an open-source computational framework for BioTM. This integration makes it possible to run further BioTM tools over the patent documents, including Information Extraction processes such as Named Entity Recognition or Relation Extraction.
Acknowledgments
This work is co-funded by the North Portugal Regional Operational Programme, under “Portugal 2020”, through the European Regional Development Fund (ERDF), within project SISBI (Ref. NORTE-01-0247-FEDER-003381). This study was also supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684), and by BioTecNorte (NORTE-01-0145-FEDER-000004), funded by the European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Alves, T., Rodrigues, R., Costa, H., Rocha, M. (2017). Development of Text Mining Tools for Information Retrieval from Patents. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., Pinto, T. (eds) 11th International Conference on Practical Applications of Computational Biology & Bioinformatics. PACBB 2017. Advances in Intelligent Systems and Computing, vol 616. Springer, Cham. https://doi.org/10.1007/978-3-319-60816-7_9
DOI: https://doi.org/10.1007/978-3-319-60816-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60815-0
Online ISBN: 978-3-319-60816-7
eBook Packages: Engineering, Engineering (R0)