Abstract
Information overload is now a fact of life. This enormous stock of information holds great potential for discovering previously uncharted knowledge. In this paper, we propose a graph-based approach, integrated with a statistical correlation measure, to discover latent but valuable information buried in large corpora. Given two concepts, \(C_i\) and \(C_j\) (e.g. bush and bin ladin), we find the best set of intermediate concepts linking them by gleaning evidence across multiple documents. We perform query enrichment on the input concepts using the Longest Common Substring (LCSubstr) algorithm to enhance the level of granularity. Moreover, we use the Kulczynski correlation measure to determine the strength of interdependence between concepts and to demote associations with relatively weak statistical significance. Finally, we present users with ranked paths, along with sentence-level evidence, to facilitate better interpretation of the underlying context. A counterterrorism dataset is used to demonstrate the effectiveness and applicability of our technique.
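The two named building blocks can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the document-count inputs, and the idea of matching a query string against a candidate concept name are assumptions made for illustration, since the abstract does not say how LCSubstr is applied during query enrichment. The Kulczynski measure is used here in its standard form, the average of the two conditional probabilities \(P(C_i \mid C_j)\) and \(P(C_j \mid C_i)\), estimated from co-occurrence counts.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b.

    Standard dynamic programming over prefix pairs, keeping only the
    previous row; on ties the earliest match in `a` is returned.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-1] and b[j-1].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def kulczynski(n_a: int, n_b: int, n_ab: int) -> float:
    """Kulczynski measure from counts (hypothetical inputs).

    n_a, n_b: number of units (e.g. documents) mentioning each concept.
    n_ab: number of units mentioning both.
    Returns 0.5 * (P(B|A) + P(A|B)), or 0.0 if either concept never occurs.
    """
    if n_a == 0 or n_b == 0:
        return 0.0
    return 0.5 * (n_ab / n_a + n_ab / n_b)


if __name__ == "__main__":
    # Assumed usage: align a short query with a longer concept name.
    print(longest_common_substring("bin ladin", "osama bin ladin"))
    # Assumed counts: 10 and 40 mentions, 8 co-occurrences.
    print(kulczynski(10, 40, 8))
```

A value near 1 indicates that the two concepts almost always appear together, while a value near 0 indicates a weak association that a ranking step could demote. The measure ignores units containing neither concept, so it is unaffected by corpus size.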
Acknowledgments
This work was supported by National Science Foundation grant IIS-1452898.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Jha, K., Jin, W. (2016). Mining Hidden Knowledge from the Counterterrorism Dataset Using Graph-Based Approach. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2016. Lecture Notes in Computer Science, vol 9612. Springer, Cham. https://doi.org/10.1007/978-3-319-41754-7_29
DOI: https://doi.org/10.1007/978-3-319-41754-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41753-0
Online ISBN: 978-3-319-41754-7
eBook Packages: Computer Science, Computer Science (R0)