Abstract
Information retrieval from text, or ‘text mining’, is the process of extracting interesting and non-trivial knowledge from unstructured text. With the ever-increasing amounts of information stored on the web or archived within computing systems, high-performance data processing architectures are required to process this data in real time. The aim of the work presented in this paper is the development of a hardware text mining IP-Core for use in FPGA-based systems. We describe the pre-processing engine we have developed for the PRESENCE II PCI card, which accelerates the identification of significant words within a document and logs their frequency and position. The performance of this system is then compared with an equivalent software implementation using the Lucene software package.
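As a rough software illustration of the pre-processing task described in the abstract (finding significant words and logging their frequency and position), a minimal Python sketch might look like the following. It is not the hardware design from the paper, nor the Lucene pipeline. The stop-word list, the tokenisation rule, and the function names `index_significant_words` and `word_frequencies` are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the list used in the paper is not given in the abstract.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "or", "the", "to"})


def index_significant_words(text):
    """Map each non-stop word to the list of its word positions in the text.

    Positions count every alphabetic token, including stop words, so they
    reflect where the word sits in the original document.
    """
    positions = defaultdict(list)
    for pos, match in enumerate(re.finditer(r"[A-Za-z]+", text)):
        word = match.group().lower()
        if word not in STOP_WORDS:
            positions[word].append(pos)
    return dict(positions)


def word_frequencies(index):
    """Derive the frequency of each word from its logged positions."""
    return {word: len(pos_list) for word, pos_list in index.items()}


if __name__ == "__main__":
    index = index_significant_words("The cat and the hat. The cat sat.")
    print(index)
    print(word_frequencies(index))
```

Storing positions and deriving frequencies from them keeps the two logs consistent. A hardware engine would more likely update both in a single streaming pass over the document.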
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Freeman, M., Jayasooriya, T. (2006). Hardware Support for Language Aware Information Mining. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893011_53
DOI: https://doi.org/10.1007/11893011_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46542-3
Online ISBN: 978-3-540-46544-7
eBook Packages: Computer Science (R0)