Hancock: A Language for Analyzing Transactional Data Streams

Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

Massive transaction streams present a number of opportunities for data mining techniques. Transactions might represent calls on a telephone network, commercial credit card purchases, stock market trades, or HTTP requests to a web server. While historically such data have been collected for billing or security purposes, they are now being used to discover how the transactors, e.g., credit-card numbers or IP addresses, use the associated services. For over six years, we have computed evolving profiles (called signatures) of the transactors in several large data streams. The signature for each transactor captures the salient features of his or her transactions through time. Programs for processing signatures must be highly optimized because of the size of the data stream (several gigabytes per day) and the number of signatures to maintain (hundreds of millions). Originally, we wrote such programs directly in C, but because signature programs often sacrificed readability for performance, they were difficult to verify and maintain. Hancock is a domain-specific language created to express computationally efficient signature programs cleanly. In this chapter, we describe the obstacles to computing signatures from massive streams and explain how Hancock addresses these problems. For expository purposes, we present Hancock using a running example from the telecommunications industry; however, the language itself is general and applies equally well to other data sources.
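As a rough illustration of what maintaining a signature involves, the sketch below keeps one evolving number per transactor (daily call seconds per originating phone number) and folds each day's transactions into it. It is written in Python rather than Hancock or C, and everything in it is an assumption made for illustration: the function names, the `(origin, seconds)` record layout, the sample phone numbers, and the exponential blending rule with `weight=0.25` are not taken from the chapter.

```python
from collections import defaultdict


def daily_totals(calls):
    """Sum call durations per originating number for one day's stream.

    `calls` is an iterable of (origin, seconds) pairs; this record layout
    is a hypothetical stand-in for a real call-detail record.
    """
    totals = defaultdict(float)
    for origin, seconds in calls:
        totals[origin] += seconds
    return totals


def update_signatures(signatures, calls, weight=0.25):
    """Blend one day's totals into the stored per-transactor signatures.

    Assumed update rule (not from the source): an exponentially weighted
    average, new = (1 - weight) * old + weight * today. A transactor seen
    for the first time is initialised to today's total. Transactors with
    no activity today are left unchanged in this sketch.
    """
    for origin, total in daily_totals(calls).items():
        old = signatures.get(origin)
        if old is None:
            signatures[origin] = total
        else:
            signatures[origin] = (1 - weight) * old + weight * total
    return signatures


# Two days of toy data for two hypothetical phone numbers.
signatures = {}
update_signatures(signatures, [("555-0100", 60), ("555-0100", 120), ("555-0101", 30)])
update_signatures(signatures, [("555-0100", 20)])
```

At the scale the abstract describes (hundreds of millions of signatures), an in-memory dictionary like this would not be practical; the sketch only shows the shape of the computation, not how it would be made efficient.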

C. Cortes, K. Fisher, D. Pregibon, A. Rogers and F. Smith, ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 26 Issue 2, March 2004, Pages 301–338. DOI: 10.1145/973097.973100, © 2004 ACM, Reprinted with permission.

Corresponding author

Correspondence to Kathleen Fisher.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Cortes, C., Fisher, K., Pregibon, D., Rogers, A., Smith, F. (2016). Hancock: A Language for Analyzing Transactional Data Streams. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds) Data Stream Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28608-0_19

  • DOI: https://doi.org/10.1007/978-3-540-28608-0_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28607-3

  • Online ISBN: 978-3-540-28608-0

  • eBook Packages: Computer Science, Computer Science (R0)