Hardware Support for Language Aware Information Mining

Freeman, Michael; Jayasooriya, Thimal

doi:10.1007/11893011_53

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4253)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

Information retrieval from text, or ‘text mining’, is the process of extracting interesting and non-trivial knowledge from unstructured text. With ever-increasing amounts of information stored on the web or archived within computing systems, high-performance data processing architectures are required to process this data in real time. The aim of the work presented in this paper is the development of a hardware text mining IP-Core for use in FPGA-based systems. We describe the pre-processing engine we have developed for the PRESENCE II PCI card, which accelerates the identification of significant words within a document and logs their frequency and position. The performance of this system is then compared with that of an equivalent software implementation built on the Lucene software package.
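
As a software illustration of the pre-processing task named in the abstract (identifying significant words and logging their frequency and position), a minimal sketch follows. It is not the authors' hardware design, and it does not use Lucene's API. The stop-word list, the tokenisation rule and the function names `index_significant_words` and `frequencies` are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list, for illustration only; a real system
# would use a much larger, language-specific list.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "on", "or", "the", "to"})


def index_significant_words(text, stop_words=STOP_WORDS):
    """Map each significant (non-stop) word to its token positions.

    Positions are zero-based offsets into the token stream, counting
    stop words as well, so relative word distances are preserved.
    """
    positions = defaultdict(list)
    # Assumed tokenisation: lower-case runs of ASCII letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for pos, token in enumerate(tokens):
        if token not in stop_words:
            positions[token].append(pos)
    return dict(positions)


def frequencies(index):
    """Derive per-word frequencies from a position index."""
    return {word: len(pos_list) for word, pos_list in index.items()}


if __name__ == "__main__":
    idx = index_significant_words("The cat sat on the mat. The cat slept.")
    print(idx)
    print(frequencies(idx))
```

Storing the positions and deriving the frequency from their count keeps both logged quantities in one structure; whether the hardware engine organises its data this way is not stated in the abstract.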

Author information

Authors and Affiliations

Department of Computer Science, University of York, UK
Michael Freeman & Thimal Jayasooriya

Editor information

Editors and Affiliations

School of Design, Engineering and Computing, Bournemouth University, UK
Bogdan Gabrys
Centre for SMART Systems, School of Environment and Technology, University of Brighton, BN2 4GJ, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes, 5095, SA, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Freeman, M., Jayasooriya, T. (2006). Hardware Support for Language Aware Information Mining. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893011_53

DOI: https://doi.org/10.1007/11893011_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46542-3
Online ISBN: 978-3-540-46544-7
eBook Packages: Computer Science, Computer Science (R0)