Extracting Information from Molecular Pathway Diagrams

  • Conference paper
Graphics Recognition. Current Trends and Evolutions (GREC 2017)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11009)

Abstract

Research fields in the health and life sciences, such as personalized medicine, drug discovery, pharmacovigilance and systems biology, make intensive use of graphical information to represent knowledge in the form of domain-specific diagrams, such as molecular pathways. These diagrams are meant to add value to the written text in scientific literature and related documents. Making all the existing literature accessible for further research therefore requires access to the information contained in these diagrams. Molecular pathways are very different from more conventional diagrams (e.g. flowcharts), so interpreting molecular pathway diagrams requires domain-specific knowledge to remove ambiguity. In this paper, we propose a method that automatically extracts information from molecular pathways using computer vision techniques. To the best of our knowledge, this is the first attempt to retrieve information from images depicting molecular pathway diagrams. The lack of a significant, publicly available dataset with annotated ground truth has led to experimental evaluation on synthetic data. Results show high precision and recall values for the detection of entities and relations. We compare and describe the substantial differences between the proposed method and prior art on the closest diagram type, using the CLEF-IP flowchart summarization task.
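As a rough illustration of the precision and recall measures mentioned above, the following minimal sketch scores detected entities and relations against a ground truth using exact set matching. The function name, the toy entity names and the exact-match criterion are all assumptions made for illustration; the abstract does not specify the evaluation protocol used in the paper.

```python
def precision_recall(detected, ground_truth):
    """Return (precision, recall) for two collections of hashable items.

    Assumption: a detection counts as correct only if it matches a
    ground-truth item exactly. The paper may use a different criterion.
    """
    detected, ground_truth = set(detected), set(ground_truth)
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the paper.
truth_entities = {"Wee1", "Cdk1", "Cdc25"}
found_entities = {"Wee1", "Cdk1", "CycB"}

# Relations are modelled here as (source, type, target) triples.
truth_relations = {("Wee1", "inhibits", "Cdk1"), ("Cdc25", "activates", "Cdk1")}
found_relations = {("Wee1", "inhibits", "Cdk1")}

entity_p, entity_r = precision_recall(found_entities, truth_entities)
relation_p, relation_r = precision_recall(found_relations, truth_relations)
print(f"entities:  precision={entity_p:.2f} recall={entity_r:.2f}")
print(f"relations: precision={relation_p:.2f} recall={relation_r:.2f}")
```

Scoring entities and relations separately, as in this sketch, keeps a missed relation from being hidden by a high entity score.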

Corresponding author

Correspondence to Antonio Foncubierta-Rodríguez.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Foncubierta-Rodríguez, A., Ciubotaru, AN., Bekas, C., Gabrani, M. (2018). Extracting Information from Molecular Pathway Diagrams. In: Fornés, A., Lamiroy, B. (eds) Graphics Recognition. Current Trends and Evolutions. GREC 2017. Lecture Notes in Computer Science, vol 11009. Springer, Cham. https://doi.org/10.1007/978-3-030-02284-6_8

  • DOI: https://doi.org/10.1007/978-3-030-02284-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02283-9

  • Online ISBN: 978-3-030-02284-6

  • eBook Packages: Computer Science, Computer Science (R0)