Binary LNS-based naïve Bayes inference engine for spam control: noise analysis and FPGA implementation

M.N. Marsono; M. Watheq El-Kharashi; F. Gebali

Published in: IET Computers & Digital Techniques


Author(s): M.N. Marsono ¹ ; M. Watheq El-Kharashi ² ; F. Gebali ¹
- Affiliations: 1: Department of Electrical and Computer Engineering, University of Victoria, Victoria, Canada
  2: Department of Computer and Systems Engineering, Ain Shams University, Cairo, Egypt
Source: Volume 2, Issue 1, January 2008, pp. 56–62
DOI: 10.1049/iet-cdt:20050180, Print ISSN 1751-8601, Online ISSN 1751-861X


A hardware architecture for a naïve Bayes inference engine that classifies e-mail contents for spam control is proposed. The inference engine uses the logarithmic number system (LNS) to simplify naïve Bayes computations. For high-throughput LNS recoding, a non-iterative binary LNS recoding hardware architecture based on a look-up table approach is proposed. A noise model for the inference engine was developed, and the noise bounds were analysed to determine the inference accuracy. The inference engine design was synthesised targeting the Altera Stratix field-programmable gate array (FPGA) device. The synthesis results show that the binary LNS naïve Bayes inference engine can classify more than 117 million features per second with small computation noise, given a stream of a priori and likelihood probabilities as input. The synthesised inference engine was functionally verified against a MATLAB implementation.
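
To illustrate why a logarithmic representation simplifies naïve Bayes, the following minimal Python sketch replaces the product of probabilities with a sum of fixed-point base-2 logarithms and compares the two class scores. It is not the architecture described in the article: the function names `to_lns` and `classify`, the two-class spam/ham setup, and the 8-bit fractional precision (`FRAC_BITS`) are assumptions made for illustration, and the software `log2` call stands in for the look-up table recoding.

```python
import math

FRAC_BITS = 8  # assumed number of fractional bits; not taken from the article


def to_lns(p, frac_bits=FRAC_BITS):
    """Encode a probability as a fixed-point integer holding log2(p).

    A hardware design would read this value from a look-up table;
    here it is computed directly and rounded to the nearest step.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return round(math.log2(p) * (1 << frac_bits))


def classify(prior_spam, prior_ham, likelihoods, frac_bits=FRAC_BITS):
    """Return "spam" or "ham" for a stream of feature likelihoods.

    likelihoods is an iterable of (P(feature|spam), P(feature|ham)) pairs.
    In the log domain each multiplication becomes an integer addition.
    """
    spam_score = to_lns(prior_spam, frac_bits)
    ham_score = to_lns(prior_ham, frac_bits)
    for p_spam, p_ham in likelihoods:
        spam_score += to_lns(p_spam, frac_bits)
        ham_score += to_lns(p_ham, frac_bits)
    return "spam" if spam_score > ham_score else "ham"


if __name__ == "__main__":
    features = [(0.8, 0.1), (0.6, 0.3)]  # made-up likelihood pairs
    print(classify(0.5, 0.5, features))
```

Under this rounding scheme each encoded term differs from the exact logarithm by at most 2^-(FRAC_BITS+1), so the accumulated error in a score grows at most linearly with the number of features. The article's own noise model and bounds are not reproduced here.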

