Real-time creation of bitmap indexes on streaming network data

  • Regular Paper
The VLDB Journal

Abstract

High-speed archival and indexing solutions for streaming traffic are growing in importance for applications such as monitoring, forensic analysis, and auditing. Many large institutions require fast solutions to support expedient analysis of historical network data, particularly in case of security breaches. However, “turning back the clock” is not a trivial task. The first major challenge is that such a technology needs to support data archiving under extremely high insertion rates. Moreover, the archives created have to be stored in a compressed format that is still amenable to indexing and search. These requirements make general-purpose databases unsuitable for this task, so dedicated solutions are required. This work describes a solution for high-speed archival storage, indexing, and data querying on network flow information. We make two important contributions: (a) we propose a novel compressed bitmap index approach that significantly reduces both CPU load and disk consumption, and (b) we introduce an online stream reordering mechanism that further reduces space requirements and improves the time for data retrieval. The reordering methodology is based on the principles of locality-sensitive hashing (LSH) and is also of interest for other bitmap creation techniques. Because of the synergy of these two components, our solution can sustain data insertion rates that reach 500,000–1 million records per second. To put these numbers into perspective, typical commercial network flow solutions can currently process 20,000–60,000 flows per second. In addition, our system offers interactive query response times that enable administrators to perform complex analysis tasks on the fly.
Our technique is directly amenable to parallel execution, allowing its application in domains that are challenged by large volumes of historical measurement data, such as network auditing, traffic behavior analysis, and large-scale data visualization in service provider networks.
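
To make the interplay of the two contributions concrete, the following is a minimal Python sketch, not the encoding or hash function used in the paper (the abstract specifies neither). It assumes a plain equality-encoded bitmap index, uses simple run-length encoding as a stand-in for the compressed bitmap format, and uses a single random-projection hash in the style of p-stable LSH to reorder a buffered chunk of flow records. All function names and parameters (`build_bitmap_index`, `lsh_key`, `reorder_chunk`, `a`, `b`, `w`) are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import defaultdict


def build_bitmap_index(column):
    """Equality-encoded index: one uncompressed bit list per distinct value."""
    index = defaultdict(lambda: [0] * len(column))
    for row, value in enumerate(column):
        index[value][row] = 1
    return dict(index)


def run_length_encode(bits):
    """Collapse a bit list into [bit, run_length] pairs (stand-in compression)."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return runs


def count_runs(column):
    """Total number of runs over all bitmaps of a column (fewer = smaller index)."""
    index = build_bitmap_index(column)
    return sum(len(run_length_encode(bits)) for bits in index.values())


def draw_projection(dim, seed=0):
    """Assumed helper: Gaussian projection vector, as in p-stable LSH schemes."""
    rng = random.Random(seed)
    return tuple(rng.gauss(0.0, 1.0) for _ in range(dim))


def lsh_key(record, a, b, w):
    """Bucket id floor((a . record + b) / w); nearby records tend to share it."""
    dot = sum(ai * xi for ai, xi in zip(a, record))
    return math.floor((dot + b) / w)


def reorder_chunk(chunk, a, b, w):
    """Stable sort of a buffered chunk by LSH bucket, grouping similar records."""
    return sorted(chunk, key=lambda record: lsh_key(record, a, b, w))


# Toy chunk of (source id, destination port) records arriving interleaved.
chunk = [(0, 80), (100, 443), (1, 80), (101, 443), (2, 80), (102, 443)]

# Fixed projection for a reproducible demo; draw_projection(2) would
# normally supply a random one.
a, b, w = (1.0, 0.0), 0.0, 10.0
reordered = reorder_chunk(chunk, a, b, w)

ports_before = [port for _, port in chunk]
ports_after = [port for _, port in reordered]
print(count_runs(ports_before), count_runs(ports_after))  # prints "12 4"
```

In this toy chunk, grouping similar records before indexing cuts the number of runs in the port bitmaps from 12 to 4, which is the intuition behind reordering the stream before bitmap creation. A real system would also have to record the permutation, or reorder every column of the chunk consistently, so that query results can be mapped back to the original records.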

Author information

Corresponding author

Correspondence to Francesco Fusco.

About this article

Cite this article

Fusco, F., Vlachos, M. & Stoecklin, M.P. Real-time creation of bitmap indexes on streaming network data. The VLDB Journal 21, 287–307 (2012). https://doi.org/10.1007/s00778-011-0242-x

  • DOI: https://doi.org/10.1007/s00778-011-0242-x
