Detecting distributed scans using high-performance query-driven visualization

Published: 11 November 2006
DOI: 10.1145/1188455.1188542

Abstract

Modern forensic analytics applications, such as network traffic analysis, perform high-performance hypothesis testing, knowledge discovery and data mining on very large datasets. One essential strategy for reducing the time these operations take is to select only the data records most relevant to a given computation. In this paper, we present a set of parallel algorithms that demonstrate how an efficient selection mechanism, bitmap indexing, significantly speeds up a common analysis task: computing conditional histograms on very large datasets. We present a thorough study of the performance characteristics of the parallel conditional histogram algorithms. As a case study, we compute conditional histograms for detecting distributed scans hidden in a dataset of approximately 2.5 billion network connection records. We show that these conditional histograms can be computed on an interactive time scale (i.e., in seconds). We also show how to progressively modify the selection criteria to narrow the analysis and find the sources of the distributed scans.
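The core idea of the abstract, using a bitmap index to select records and then counting only the selected ones per bin, can be sketched in a few lines. This is a minimal, single-process illustration under stated assumptions: bitmaps are uncompressed, equality-encoded bit vectors stored as Python integers; it is not FastBit's API and does not model index compression or the parallel algorithms the paper studies. The function names, the port number 5554, the "source network id" column, and the toy records are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, List

Bitmap = int  # bit i set <=> record i is selected (uncompressed bit vector)


def build_bitmap_index(column: List[Hashable]) -> Dict[Hashable, Bitmap]:
    """Equality-encoded bitmap index: one bit vector per distinct value."""
    index: Dict[Hashable, Bitmap] = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def select_in(index: Dict[Hashable, Bitmap], values: Iterable) -> Bitmap:
    """Rows whose value is any of `values`: OR of the matching bitmaps."""
    result = 0
    for v in values:
        result |= index.get(v, 0)
    return result


def select_range(index: Dict[Hashable, Bitmap], lo, hi) -> Bitmap:
    """Rows with lo <= value <= hi, answered from the index alone."""
    return select_in(index, [v for v in index if lo <= v <= hi])


def conditional_histogram(index: Dict[Hashable, Bitmap], selection: Bitmap) -> Dict:
    """Per-value counts of the indexed column, restricted to `selection`.

    Each bin is popcount(bitmap_v AND selection); raw records are never read.
    """
    hist = {}
    for value, bitmap in index.items():
        count = bin(bitmap & selection).count("1")
        if count:
            hist[value] = count
    return dict(sorted(hist.items()))


# Hypothetical toy records: (destination port, source network id, hour).
records = [
    (5554, 1, 0), (80, 2, 0), (5554, 1, 1), (5554, 3, 1),
    (443, 2, 2), (5554, 1, 2), (5554, 1, 3), (80, 3, 3),
]
port_idx = build_bitmap_index([r[0] for r in records])
net_idx = build_bitmap_index([r[1] for r in records])
hour_idx = build_bitmap_index([r[2] for r in records])

# Step 1: hourly connection counts to one suspicious port.
on_port = select_in(port_idx, [5554])
print(conditional_histogram(hour_idx, on_port))   # -> {0: 1, 1: 2, 2: 1, 3: 1}

# Step 2: which source networks contribute to that traffic?
print(conditional_histogram(net_idx, on_port))    # -> {1: 4, 3: 1}

# Step 3: narrow the selection to the dominant source (bitwise AND).
narrowed = on_port & net_idx[1]
print(conditional_histogram(hour_idx, narrowed))  # -> {0: 1, 1: 1, 2: 1, 3: 1}
```

In this sketch each refinement step only combines existing bit vectors with AND/OR and counts set bits, which is what keeps iterative narrowing of the selection cheap; how the paper distributes this work across processors is not represented here.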

Published In

SC '06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing
November 2006
746 pages
ISBN:0769527000
DOI:10.1145/1188455
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data mining
  2. network connection analysis
  3. network security
  4. query-driven visualization
  5. visual analytics

Conference

SC '06

Acceptance Rates

SC '06 Paper Acceptance Rate: 54 of 239 submissions, 23%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
