Mining Hidden Knowledge from the Counterterrorism Dataset Using Graph-Based Approach

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9612)

Abstract

Information overload is now a fact of life. This enormous stack of information holds great potential for discovering previously uncharted knowledge. In this paper, we propose a graph-based approach, integrated with a statistical correlation measure, to discover latent but valuable information buried in large corpora. Given two concepts, \(C_i\) and \(C_j\) (e.g. bush and bin ladin), we find the best set of intermediate concepts linking them by gleaning evidence across multiple documents. We perform query enrichment on the input concepts using the Longest Common Substring (LCSubstr) algorithm to enhance the level of granularity. Moreover, we use the Kulczynski correlation measure to determine the strength of interdependence between concepts and to demote associations with relatively meager statistical significance. Finally, we present users with ranked paths, along with sentence-level evidence, to facilitate better interpretation of the underlying context. A counterterrorism dataset is used to demonstrate the effectiveness and applicability of our technique.
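The two named building blocks can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their exact formulation, so the sketch uses the textbook definitions, namely dynamic programming for the longest common substring and \(\mathrm{Kulc}(C_i, C_j) = \tfrac{1}{2}\bigl(P(C_j \mid C_i) + P(C_i \mid C_j)\bigr)\). The function names, and the choice of estimating the probabilities from document-level co-occurrence counts, are assumptions made for illustration.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b.

    Standard O(len(a) * len(b)) dynamic programme, keeping only one
    previous row of the match-length table.
    """
    best_len, best_end = 0, 0          # length and end index (in a) of best match
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def kulczynski(n_i: int, n_j: int, n_ij: int) -> float:
    """Kulczynski measure from counts (assumed: document-level counts).

    n_i  : documents mentioning concept C_i
    n_j  : documents mentioning concept C_j
    n_ij : documents mentioning both
    Returns the mean of P(C_j | C_i) and P(C_i | C_j), a value in [0, 1].
    """
    if n_i == 0 or n_j == 0:
        return 0.0                     # no evidence for one of the concepts
    return 0.5 * (n_ij / n_i + n_ij / n_j)
```

For example, `longest_common_substring("bin ladin", "usama bin ladin")` returns `"bin ladin"`, and `kulczynski(10, 40, 8)` averages 8/10 and 8/40 to give 0.5. A low score of this kind is the sort of signal that could be used to demote a weakly supported association.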

Notes

  1. https://www.nlm.nih.gov/pubs/factsheets/medline.html
  2. https://www.nlm.nih.gov/mesh
  3. http://www.opencalais.com/opencalais-demo/
  4. http://neo4j.com
  5. http://en.wikipedia.org/wiki/Longest_common_substring_problem
  6. http://govinfo.library.unt.edu/911/report/911Report.pdf

Acknowledgments

This work was supported by National Science Foundation grant IIS-1452898.

Author information

Corresponding author

Correspondence to Kishlay Jha.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Jha, K., Jin, W. (2016). Mining Hidden Knowledge from the Counterterrorism Dataset Using Graph-Based Approach. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2016. Lecture Notes in Computer Science, vol 9612. Springer, Cham. https://doi.org/10.1007/978-3-319-41754-7_29

  • DOI: https://doi.org/10.1007/978-3-319-41754-7_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41753-0

  • Online ISBN: 978-3-319-41754-7

  • eBook Packages: Computer Science, Computer Science (R0)
