DOI: 10.1145/1559845.1559964
tutorial

Large-scale uncertainty management systems: learning and exploiting your data

Published: 29 June 2009

Abstract

The database community has made rapid strides in capturing, representing, and querying uncertain data. Probabilistic databases capture the inherent uncertainty in derived tuples as probability estimates. Data acquisition and stream systems can produce succinct summaries of very large and time-varying datasets. This tutorial addresses the natural next step in harnessing uncertain data: how can we efficiently and quantifiably determine what, how, and how much to learn in order to make good decisions based on the imprecise information available?
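To make "uncertainty in derived tuples as probability estimates" concrete, here is a minimal sketch that assumes the common tuple-independent model: each tuple carries a marginal probability of being present, independent of every other tuple. The abstract does not commit to a particular model, and the `UncertainTuple` schema, the helper names, and the sensor readings are all hypothetical. Under independence, a Boolean "does any tuple match?" query is false only when every matching tuple is absent.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class UncertainTuple:
    """A derived tuple plus the probability that it is present (hypothetical schema)."""
    sensor: str
    reading: float
    p: float  # marginal probability, assumed independent of all other tuples


def prob_exists(tuples, predicate):
    """P(at least one tuple satisfies predicate), assuming tuple independence."""
    matching = [t.p for t in tuples if predicate(t)]
    # The query is false only if every matching tuple is absent.
    return 1.0 - prod(1.0 - p for p in matching)


def expected_count(tuples, predicate):
    """Expected number of satisfying tuples (linearity of expectation)."""
    return sum(t.p for t in tuples if predicate(t))


# Hypothetical data: three uncertain sensor readings.
readings = [
    UncertainTuple("s1", 31.0, 0.5),
    UncertainTuple("s2", 29.5, 0.9),
    UncertainTuple("s3", 35.0, 0.2),
]
hot = lambda t: t.reading > 30.0

print(prob_exists(readings, hot))     # approximately 0.6, i.e. 1 - 0.5 * 0.8
print(expected_count(readings, hot))  # approximately 0.7, i.e. 0.5 + 0.2
```

The expected count needs no independence assumption, while the existence probability does; correlated tuples would call for a richer model such as a probabilistic graphical model.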
The material in this tutorial is drawn from a range of fields, including database systems, control and information theory, operations research, convex optimization, and statistical learning. The tutorial focuses on the natural constraints imposed in a database context and on the demands of imprecise information from an optimization point of view. We look both into the past and into the future: to discuss general tools and techniques that can serve as a guide to database researchers and practitioners, and to enumerate the challenges that lie ahead.
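One classical way to formalize the question of what and how much to learn before committing to a decision is the stochastic multi-armed bandit, which trades exploration (gathering information about options) against exploitation (acting on the current best estimate). The sketch below implements the standard UCB1 index policy purely as an illustration; it is not presented as the tutorial's own method. The setting (three hypothetical data sources whose probes succeed with unknown Bernoulli probabilities), the `ucb1` and `pull` names, and the horizon are assumptions for the example.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run the UCB1 index policy for `horizon` rounds.

    pull(arm) must return a reward in [0, 1]. Returns per-arm pull counts
    and empirical mean rewards.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: sample every arm once
        else:
            # Optimism under uncertainty: empirical mean plus a confidence
            # bonus that shrinks as an arm is sampled more often.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Hypothetical setting: three data sources with unknown success probabilities.
true_p = [0.2, 0.5, 0.7]
rng = random.Random(0)
counts, means = ucb1(
    lambda a: 1.0 if rng.random() < true_p[a] else 0.0, len(true_p), 5000
)
print(counts)  # the bulk of the pulls should go to arm 2, the best source
print([round(m, 2) for m in means])
```

The confidence bonus is what quantifies "how much to learn": a poorly sampled arm keeps a large bonus and is revisited, while an arm that is clearly worse is sampled only on the order of log(t) times.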


Published In

ACM Conferences
SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
June 2009
1168 pages
ISBN:9781605585512
DOI:10.1145/1559845
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. algorithms
  2. measurement
  3. performance tuning

Qualifiers

  • Tutorial

Conference

SIGMOD/PODS '09: International Conference on Management of Data
June 29 - July 2, 2009
Providence, Rhode Island, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Total citations: 0
  • Total downloads: 572
  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 0
Reflects downloads up to 28 Feb 2025.
