DOI: 10.1145/2623330.2623667
research-article

Automated hypothesis generation based on mining scientific literature

Published: 24 August 2014

Abstract

Keeping up with the ever-expanding flow of data and publications is untenable and poses a fundamental bottleneck to scientific progress. Current search technologies typically find many relevant documents, but they do not extract and organize the information content of these documents or suggest new scientific hypotheses based on this organized content. We present an initial case study on KnIT, a prototype system that mines the information contained in the scientific literature, represents it explicitly in a queryable network, and then further reasons upon these data to generate novel and experimentally testable hypotheses. KnIT combines entity detection with neighbor-text feature analysis and with graph-based diffusion of information to identify potential new properties of entities that are strongly implied by existing relationships. We discuss a successful application of our approach that mines the published literature to identify new protein kinases that phosphorylate the protein tumor suppressor p53. Retrospective analysis demonstrates the accuracy of this approach, and ongoing laboratory experiments suggest that kinases identified by our system may indeed phosphorylate p53. These results establish proof of principle for automated hypothesis generation and discovery based on text mining of the scientific literature.
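
The abstract names the ingredients of the pipeline (neighbor-text features and graph-based diffusion) but gives no formulas. As a rough illustration only, the following Python sketch builds a similarity graph from the words found near each entity and then spreads a "known p53 kinase" label across it. The entity names, neighbor words, the `alpha` value, and all function names are hypothetical. The normalized diffusion iteration is a common semi-supervised graph technique used here as an assumption; it is not confirmed to be KnIT's actual method.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data (not from the paper): words found near each entity mention.
NEIGHBOR_TEXT = {
    "kinase_A": "phosphorylates p53 serine dna damage checkpoint",
    "kinase_B": "phosphorylates p53 serine dna damage apoptosis",
    "kinase_C": "dna damage checkpoint serine cell cycle",
    "kinase_D": "glucose metabolism insulin signaling membrane",
}


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def similarity_graph(texts):
    """Return sorted entity names and a dict-of-dicts of pairwise edge weights."""
    names = sorted(texts)
    vectors = {name: Counter(texts[name].split()) for name in names}
    weights = {
        u: {v: cosine(vectors[u], vectors[v]) for v in names if v != u}
        for u in names
    }
    return names, weights


def diffuse(names, weights, seeds, alpha=0.8, iterations=50):
    """Iterate f <- alpha * S f + (1 - alpha) * y with S = D^-1/2 W D^-1/2.

    Assumed technique: y marks the seed entities, and alpha controls how much
    score flows from neighbors versus staying anchored to the seeds.
    """
    degree = {u: sum(weights[u].values()) for u in names}
    initial = {u: 1.0 if u in seeds else 0.0 for u in names}
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for u in names:
            # Only positive edges contribute, so both degrees are nonzero here.
            incoming = sum(
                w / sqrt(degree[u] * degree[v]) * scores[v]
                for v, w in weights[u].items()
                if w > 0
            )
            updated[u] = alpha * incoming + (1 - alpha) * initial[u]
        scores = updated
    return scores


def rank_candidates(scores, seeds):
    """Order the non-seed entities by diffused score, highest first."""
    candidates = [name for name in scores if name not in seeds]
    return sorted(candidates, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    names, weights = similarity_graph(NEIGHBOR_TEXT)
    seeds = {"kinase_A", "kinase_B"}  # pretend these are the known positives
    scores = diffuse(names, weights, seeds)
    for name in rank_candidates(scores, seeds):
        print(name, round(scores[name], 3))
```

In this toy setup, `kinase_C` shares context words with both seeds and therefore ranks above `kinase_D`, which shares none and stays at zero. A real system would need far richer features and a validation step such as the retrospective analysis the abstract mentions.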

Supplementary Material

MP4 File (p1877-sidebyside.mp4)


Published In

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2014
2028 pages
ISBN:9781450329569
DOI:10.1145/2623330

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hypothesis generation
  2. scientific discovery
  3. text mining

Qualifiers

  • Research-article

Conference

KDD '14

Acceptance Rates

KDD '14 Paper Acceptance Rate 151 of 1,036 submissions, 15%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 104
  • Downloads (Last 6 weeks): 10
Reflects downloads up to 20 Feb 2025

Cited By

  • (2024) pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy. The Astrophysical Journal Supplement Series 275:2 (38). DOI: 10.3847/1538-4365/ad7c43. Online publication date: 29-Nov-2024
  • (2024) Artificial Intelligence in Clinical Trials: The Present Scenario and Future Prospects. AI Innovations in Drug Delivery and Pharmaceutical Sciences; Advancing Therapy through Technology (229-257). DOI: 10.2174/9789815305753124010013. Online publication date: 14-Nov-2024
  • (2024) How deep is your art: An experimental study on the limits of artistic understanding in a single-task, single-modality neural network. PLOS ONE 19:11 (e0305943). DOI: 10.1371/journal.pone.0305943. Online publication date: 6-Nov-2024
  • (2024) SCIHYPO - A Deep Learning Framework for Data-Driven Scientific Hypothesis Generation from Extensive Literature Analysis. 2024 International Conference on Expert Clouds and Applications (ICOECA) (1037-1042). DOI: 10.1109/ICOECA62351.2024.00180. Online publication date: 18-Apr-2024
  • (2024) Learning to Rank Complex Biomedical Hypotheses for Accelerating Scientific Discovery. 2024 IEEE 12th International Conference on Healthcare Informatics (ICHI) (285-293). DOI: 10.1109/ICHI61247.2024.00044. Online publication date: 3-Jun-2024
  • (2024) Data-driven hypothesis generation among inexperienced clinical researchers: A comparison of secondary data analyses with visualization (VIADS) and other tools. Journal of Clinical and Translational Science 8:1. DOI: 10.1017/cts.2023.708. Online publication date: 4-Jan-2024
  • (2024) Analyzing research diversity of scholars based on multi-dimensional calculation of knowledge entities. Scientometrics 129:11 (7329-7358). DOI: 10.1007/s11192-023-04821-3. Online publication date: 1-Nov-2024
  • (2024) Link prediction for hypothesis generation: an active curriculum learning infused temporal graph-based approach. Artificial Intelligence Review 57:9. DOI: 10.1007/s10462-024-10885-1. Online publication date: 12-Aug-2024
  • (2024) The Effect of Knowledge Graph Schema on Classifying Future Research Suggestions. Natural Scientific Language Processing and Research Knowledge Graphs (149-170). DOI: 10.1007/978-3-031-65794-8_10. Online publication date: 26-May-2024
  • (2023) Artificial Intelligence Method for the Analysis of Marketing Scientific Literature. Philosophy of Artificial Intelligence and Its Place in Society (142-159). DOI: 10.4018/978-1-6684-9591-9.ch008. Online publication date: 30-Jun-2023
