Finding and Rating Personal Names on Drives for Forensic Needs

  • Conference paper

Abstract

Personal names found on drives provide forensically valuable information about the users of systems. This work reports on the design and engineering of tools to mine them from disk images, bootstrapping on the output of the Bulk Extractor tool. However, most potential names found are either uninteresting sales and help contacts or are not being used as names. We therefore developed methods to rate the value of name candidates by analyzing the clues that they and their context provide. We used an empirically based approach with statistics from a large corpus, from which we extracted 303 million email addresses and 74 million phone numbers and then found 302 million personal names. We tested three machine-learning approaches, and Naïve Bayes performed best. Cross-modal clues from nearby email addresses improved performance further. This approach eliminated from consideration 71.3% of the addresses found in our corpus with an estimated F-score of 67.4%. That is a potential 3.5-fold reduction in the name workload of most forensic investigations.
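The abstract names Naïve Bayes as the best-performing rater but does not list its features. The following is a minimal sketch of how such a rating can combine clues, using the odds form of Naïve Bayes: posterior odds equal prior odds times the product of the likelihood ratios of the observed clues. The clue names, ratio values, and the prior of 0.3 are hypothetical placeholders, not statistics from the paper. In this sketch, only clues that are present contribute. A full Bernoulli Naïve Bayes would also weigh absent clues.

```python
import math
from typing import Dict, Iterable

# Hypothetical likelihood ratios P(clue | interesting name) / P(clue | uninteresting).
# Illustrative placeholders only; real values would be estimated from corpus statistics.
CLUE_RATIOS: Dict[str, float] = {
    "first_name_in_dictionary": 3.0,
    "capitalized": 1.5,
    "near_sales_or_help_words": 0.2,
    "matches_nearby_email_local_part": 4.0,  # a cross-modal clue
}


def naive_bayes_rating(clues: Iterable[str], prior: float = 0.3) -> float:
    """Return the posterior probability that a name candidate is interesting.

    Works in log-odds: start from the prior log-odds and add log(ratio) for
    each observed clue, assuming the clues are conditionally independent.
    Clues with no known ratio are ignored.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for clue in clues:
        ratio = CLUE_RATIOS.get(clue)
        if ratio is not None:
            log_odds += math.log(ratio)
    # Convert log-odds back to a probability (logistic function).
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    print(naive_bayes_rating(["first_name_in_dictionary",
                              "matches_nearby_email_local_part"]))
    print(naive_bayes_rating(["near_sales_or_help_words"]))
```

Consider a candidate that is a dictionary first name and also matches a nearby address. Its prior odds of 0.3/0.7 are multiplied by 3.0 × 4.0 = 12, giving a posterior of 36/43 ≈ 0.84. A clue associated with sales or help contacts pushes the rating below the prior. Candidates can then be ranked or thresholded on this value.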
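Cross-modal clues from nearby email addresses can be sketched as follows. The sketch assumes Bulk Extractor's tab-separated feature-file layout, in which each line holds an offset, a feature, and its context, and comment lines start with `#`. The 1000-byte proximity window and the matching rules are hypothetical illustrations, not the paper's method. The rules accept `john.smith`, `johnsmith`, and `jsmith` local parts for "John Smith". Offsets that are not plain byte offsets, such as paths into decompressed data, are skipped for simplicity.

```python
import re
from typing import Iterable, List, Tuple


def parse_feature_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Parse feature lines of the form offset<TAB>feature<TAB>context.

    Skips '#' comment lines and any line whose offset is not a plain integer.
    """
    records: List[Tuple[int, str]] = []
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        records.append((int(fields[0]), fields[1]))
    return records


def email_local_tokens(address: str) -> List[str]:
    """Lowercase alphabetic tokens of the local part, e.g.
    'John.Smith_2@example.com' -> ['john', 'smith']."""
    local = address.split("@", 1)[0]
    return [t.lower() for t in re.split(r"[^A-Za-z]+", local) if t]


def nearby_addresses(records: List[Tuple[int, str]], offset: int,
                     window: int = 1000) -> List[str]:
    """Addresses whose byte offset lies within `window` bytes of `offset`
    (the window size is a hypothetical choice)."""
    return [addr for off, addr in records if abs(off - offset) <= window]


def matches_nearby_email(name: str, addresses: Iterable[str]) -> bool:
    """Cross-modal clue: True if the name candidate appears in the local part
    of some nearby address as separate tokens, as first+last, or as initial+last."""
    tokens = [t.lower() for t in re.split(r"[^A-Za-z]+", name) if t]
    if not tokens:
        return False
    first, last = tokens[0], tokens[-1]
    for addr in addresses:
        local = email_local_tokens(addr)
        if all(t in local for t in tokens):  # e.g. john.smith
            return True
        joined = "".join(local)
        if len(tokens) > 1 and joined in (first + last, first[0] + last):
            return True  # e.g. johnsmith or jsmith
    return False


if __name__ == "__main__":
    feature_lines = ["# banner line",
                     "1000\tjohn.smith@example.com\tcontext",
                     "50000\tsales@example.com\tcontext"]
    recs = parse_feature_lines(feature_lines)
    near = nearby_addresses(recs, 1200)
    print(near, matches_nearby_email("John Smith", near))
```

The boolean result could serve directly as one clue in a Naïve Bayes rater. Generic local parts such as `sales` never match a personal name under these rules.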
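The 3.5-fold figure follows from the 71.3% elimination rate. If a fraction $f$ of name candidates is eliminated, an investigator reviews only the remaining $1 - f$, so the workload shrinks by a factor of $1/(1 - f)$:

```latex
\frac{1}{1 - f} = \frac{1}{1 - 0.713} = \frac{1}{0.287} \approx 3.48 \approx 3.5
```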

Acknowledgements

This work was supported in part by the U.S. Navy under the Naval Research Program and is covered by an IRB protocol. The views expressed are those of the author and do not represent the U.S. Government. Daniel Gomez started the implementation, and Janina Green provided images of project-team drives.

Author information

Correspondence to Neil C. Rowe.

Copyright information

© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Rowe, N.C. (2018). Finding and Rating Personal Names on Drives for Forensic Needs. In: Matoušek, P., Schmiedecker, M. (eds) Digital Forensics and Cyber Crime. ICDF2C 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 216. Springer, Cham. https://doi.org/10.1007/978-3-319-73697-6_4

  • DOI: https://doi.org/10.1007/978-3-319-73697-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73696-9

  • Online ISBN: 978-3-319-73697-6

  • eBook Packages: Computer Science, Computer Science (R0)
