An Overview of Information Discovery Using Latent Semantic Indexing

  • Conference paper
Advanced Computational Methods for Knowledge Engineering (ICCSAMA 2017)

Abstract

In recent years, the size of important information collections has increased dramatically. At the same time, there has been growing interest in extracting as much useful information as possible from such collections. These trends place significant demands on modern information retrieval systems. In particular, there is a great need for tools that can support the discovery of new and useful information. The technique of latent semantic indexing (LSI) has a number of attributes that make it particularly well suited to information discovery applications. This paper provides an overview of LSI-based techniques that have been successfully employed to facilitate discovery in practical applications. The techniques range from user aids to state-of-the-art discovery methods.

Notes

  1. A review of major online sources (Science Direct, Springer Link, Google Scholar, and the digital libraries of the IEEE and ACM) indicates that publications regarding biomedical applications of LSI were negligible prior to 2000, grew linearly between then and 2007, surged in 2008, and have been growing at a rate of 10–15% per year since then.

  2. These figures are from commercial and government applications worked on by the author between 2005 and 2013. These applications primarily involved conceptual retrieval, text clustering, and/or document categorization tasks. Most had at least some focus on new information discovery.

  3. Subsequent to the timeframe covered by the news articles used in the experiment, the Salafist Group for Preaching and Combat (GSPC) changed its name to Al Qaida in the Islamic Maghreb (AQIM).

  4. For example, Person A – Person B – Organization C – Telephone Number D – Person E – fraud.

Author information

Corresponding author

Correspondence to Roger Bradford.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Bradford, R. (2018). An Overview of Information Discovery Using Latent Semantic Indexing. In: Le, NT., van Do, T., Nguyen, N., Thi, H. (eds) Advanced Computational Methods for Knowledge Engineering. ICCSAMA 2017. Advances in Intelligent Systems and Computing, vol 629. Springer, Cham. https://doi.org/10.1007/978-3-319-61911-8_14

  • DOI: https://doi.org/10.1007/978-3-319-61911-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-61910-1

  • Online ISBN: 978-3-319-61911-8

  • eBook Packages: Engineering, Engineering (R0)
