Abstract
Although web search has seen significant success, enterprise search has not yet been widely investigated, and as a result its potential benefits to the enterprise are not fully realized. In this paper, we present an integrated framework for enhancing enterprise search. The framework is based on open source technologies, including Apache Hadoop, Tika, Solr and Lucene. Importantly, the framework also uses a Latent Semantic Indexing (LSI) algorithm to improve the quality of search results. LSI is a mathematical model used to discover semantic relationship patterns in a document collection. We envisage that the proposed framework will benefit various enterprises, improving their productivity by meeting information needs effectively.
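To make the LSI idea concrete, the following is a minimal sketch of the general technique, not the framework described in this paper. It assumes NumPy is available, uses a tiny made-up corpus, and relies on raw term counts with a plain truncated singular value decomposition (SVD). The function names (`build_term_document_matrix`, `lsi_fit`, `fold_in_query`, `rank_documents`) and the choice of `k=2` are hypothetical and chosen only for illustration. It does not involve Hadoop, Tika, Solr or Lucene.

```python
import numpy as np

# Toy corpus, invented purely for illustration.
docs = [
    "car engine repair",
    "automobile engine service",
    "car automobile dealer",
    "fruit salad recipe",
    "banana fruit smoothie",
]


def build_term_document_matrix(docs):
    """Return a (terms x documents) raw-count matrix and a term -> row index map."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            A[index[word], j] += 1.0
    return A, index


def lsi_fit(A, k):
    """Truncated SVD: keep the k largest singular triplets of A = U S V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in_query(query, index, U_k, s_k):
    """Project a query into the latent space: q_hat = q^T U_k S_k^{-1}."""
    q = np.zeros(U_k.shape[0])
    for word in query.split():
        if word in index:  # words outside the vocabulary are ignored
            q[index[word]] += 1.0
    return (q @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_documents(query, docs, k=2):
    """Return (document, score) pairs sorted by latent-space cosine similarity."""
    A, index = build_term_document_matrix(docs)
    U_k, s_k, Vt_k = lsi_fit(A, k)
    q_hat = fold_in_query(query, index, U_k, s_k)
    doc_vectors = Vt_k.T  # row j holds document j in the latent space
    scored = [(doc, cosine(q_hat, doc_vectors[j])) for j, doc in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for doc, score in rank_documents("automobile", docs):
        print(f"{score:+.3f}  {doc}")
```

In this toy setting, the query "automobile" scores "car engine repair" highly even though that document does not contain the query word, because the two terms co-occur with the same neighbours in the corpus. This is the kind of semantic relationship pattern that keyword matching alone would miss. A realistic deployment would typically apply term weighting (for example TF-IDF) and a sparse or incremental SVD, which this sketch omits.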
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alhabashneh, O., Iqbal, R., Shah, N., Amin, S., James, A. (2011). Towards the Development of an Integrated Framework for Enhancing Enterprise Search Using Latent Semantic Indexing. In: Andrews, S., Polovina, S., Hill, R., Akhgar, B. (eds) Conceptual Structures for Discovering Knowledge. ICCS 2011. Lecture Notes in Computer Science, vol 6828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22688-5_29
DOI: https://doi.org/10.1007/978-3-642-22688-5_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22687-8
Online ISBN: 978-3-642-22688-5
eBook Packages: Computer Science (R0)