ABSTRACT
Access to current application and network data is vital to cybersecurity and networking research. Intrusion detection, steganography, traffic camouflaging, traffic classification, and traffic modeling all benefit from real-world data, which supports training, testing, and evaluation and helps establish ground truth. Currently available network data, especially data containing application-level information, is often outdated and is either private or tailored to specific, narrow research needs. The biggest hurdle to obtaining such content-rich data is the substantial privacy risk of sharing complex, open-ended data. In this paper we present Critter-at-Home, a data-sharing system that addresses these challenges. Critter connects end users who are willing to share their data with researchers, and it strikes a balance between the privacy risk to data contributors and the utility of the data to researchers.
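To make the privacy/utility balance concrete, here is a minimal Python sketch under stated assumptions. It is not Critter's actual mechanism, which the abstract does not describe. It assumes a hypothetical setup in which contributor data stays on the contributor's machine, researchers submit count queries, and any group with fewer than `k` records is suppressed before results are returned (a simple minimum-group-size restriction). `FlowRecord`, `answer_count_query`, and the threshold `k` are illustrative names, not part of Critter.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical per-flow record kept locally by a data contributor.
    protocol: str
    dst_port: int
    bytes_sent: int


def answer_count_query(records, key, k=5):
    """Count records per group, withholding any group with fewer than k members.

    Hypothetical contributor-side handler: raw records never leave the
    contributor; the researcher sees only aggregate counts, and small
    groups (which could single out rare, identifying behavior) are dropped.
    """
    counts = Counter(key(r) for r in records)
    return {group: n for group, n in counts.items() if n >= k}


if __name__ == "__main__":
    records = (
        [FlowRecord("tcp", 443, 1200)] * 6
        + [FlowRecord("tcp", 22, 300)] * 2
        + [FlowRecord("udp", 53, 80)] * 5
    )
    # Group by (protocol, destination port); the 2-record SSH group is suppressed.
    print(answer_count_query(records, key=lambda r: (r.protocol, r.dst_port), k=5))
    # prints {('tcp', 443): 6, ('udp', 53): 5}
```

Raising `k` lowers the chance that a released count exposes a rare, potentially identifying behavior, but it also withholds more of the data a researcher might need; in this sketch, `k` is a single knob for the trade-off the abstract describes.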
Index Terms
- Critter: Content-Rich Traffic Trace Repository