Irrevocable-choice algorithms for sampling from a stream

Data Mining and Knowledge Discovery

Abstract

The problem of sampling from data streams has attracted significant interest in the last decade. Whichever sampling criterion is considered (uniform sample, maximally diverse sample, etc.), the challenges stem from the relatively small amount of memory available in the face of unbounded streams. In this work we consider an interesting extension of this problem, the framework of which is stimulated by recent improvements in sensing technologies and robotics. In some situations it is possible not only to digitally sense some aspects of the world, but also to physically capture a tangible aspect of that world. Currently deployed examples include devices that can capture water/air samples, and devices that capture individual insects or fish. Such devices create an interesting twist on the stream sampling problem, because in most cases, the decision to take a physical sample is irrevocable. In this work we show how to generalize diversification sampling strategies to the irrevocable-choice setting, demonstrating our ideas on several real-world domains.
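
The distinction between revocable and irrevocable choices can be illustrated with a minimal sketch. It is not the method proposed in the article, whose details are not available in this preview. The first function is standard reservoir sampling, which keeps a uniform sample in bounded memory but does so by evicting earlier choices. The second is a hypothetical greedy rule, assumed here purely for illustration, that accepts a one-dimensional reading only if it is far enough from everything already kept and never gives an accepted item back. The names `irrevocable_diverse_sample` and `min_dist` are assumptions, not terms from the paper.

```python
import random


def reservoir_sample(stream, k, rng):
    """Standard reservoir sampling: a uniform sample of size k.

    Earlier choices can be overwritten, so decisions are revocable.
    """
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item  # a previously kept item is discarded
    return reservoir


def irrevocable_diverse_sample(stream, k, min_dist):
    """Hypothetical illustration: accept an item only if it lies at least
    min_dist from every item already kept; accepted items are never evicted.
    """
    kept = []
    for x in stream:
        if len(kept) == k:
            break  # capacity used up; no later item can be taken
        if all(abs(x - y) >= min_dist for y in kept):
            kept.append(x)  # irrevocable: this slot is now spent
    return kept


readings = [0.0, 0.1, 1.0, 1.05, 2.5, 5.0]
print(irrevocable_diverse_sample(readings, k=3, min_dist=0.5))
print(reservoir_sample(range(100), 5, random.Random(0)))
```

In this toy run the irrevocable rule fills its three slots before the reading 5.0 arrives, so it cannot take the most distant value. A revocable method could have swapped it in, which is the difficulty the abstract describes.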

Author information

Corresponding author

Correspondence to Yan Zhu.

Additional information

Responsible editors: Thomas Gärtner, Mirco Nanni, Andrea Passerini and Celine Robardet.

About this article

Cite this article

Zhu, Y., Keogh, E. Irrevocable-choice algorithms for sampling from a stream. Data Min Knowl Disc 30, 998–1023 (2016). https://doi.org/10.1007/s10618-016-0472-z

  • DOI: https://doi.org/10.1007/s10618-016-0472-z
