E-Mail Authorship Attribution for Computer Forensics

  • Chapter
Applications of Data Mining in Computer Security

Part of the book series: Advances in Information Security (ADIS, volume 6)

Abstract

In this chapter, we give a brief overview of the relatively new discipline of computer forensics and describe an investigation of forensic authorship attribution (identification) undertaken on a corpus of multi-author, multi-topic e-mail documents. We use an extended set of e-mail document features, such as structural characteristics and linguistic patterns, together with a Support Vector Machine as the learning algorithm. Experiments on e-mail documents written by different authors on a set of topics gave promising results for multi-topic and multi-author categorisation.
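
As a rough illustration of the kind of feature extraction the abstract describes, the sketch below maps an e-mail body to a small numeric vector of structural and linguistic features. The abstract does not list the chapter's actual features, so every feature here, the FUNCTION_WORDS list, and the function names extract_features and to_vector are assumptions chosen for illustration. The resulting vectors are the sort of input one might pass to a Support Vector Machine implementation; the classifier itself is not shown.

```python
import re
import string

# Hypothetical function-word list for illustration only; not taken from the chapter.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "is", "for")


def extract_features(body: str) -> dict:
    """Map an e-mail body to a small dictionary of illustrative style features."""
    lines = body.splitlines()
    words = re.findall(r"[A-Za-z']+", body.lower())
    n_lines = max(len(lines), 1)   # guards against division by zero
    n_words = max(len(words), 1)
    n_chars = max(len(body), 1)

    features = {
        # Structural characteristics (assumed examples)
        "n_lines": len(lines),
        "blank_line_ratio": sum(1 for ln in lines if not ln.strip()) / n_lines,
        "has_greeting": float(
            bool(re.match(r"\s*(hi|hello|dear)\b", body, re.IGNORECASE))
        ),
        # Linguistic patterns (assumed examples)
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "vocab_richness": len(set(words)) / n_words,
        "punct_ratio": sum(ch in string.punctuation for ch in body) / n_chars,
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features[f"fw_{fw}"] = words.count(fw) / n_words
    return features


def to_vector(features: dict) -> list:
    """Order features by name so every e-mail yields a comparable vector."""
    return [features[name] for name in sorted(features)]


sample = "Hi Bob,\n\nThe report is in the folder.\n\nRegards"
vector = to_vector(extract_features(sample))
```

Sorting the feature names in to_vector keeps the column order stable across e-mails, which any downstream classifier would need.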

Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

de Vel, O., Anderson, A., Corney, M., Mohay, G. (2002). E-Mail Authorship Attribution for Computer Forensics. In: Barbará, D., Jajodia, S. (eds) Applications of Data Mining in Computer Security. Advances in Information Security, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0953-0_9

  • DOI: https://doi.org/10.1007/978-1-4615-0953-0_9

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5321-8

  • Online ISBN: 978-1-4615-0953-0

  • eBook Packages: Springer Book Archive
