tutorial

Large-scale uncertainty management systems: learning and exploiting your data

Authors:

Kamesh Munagala

SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Pages 995 - 998

https://doi.org/10.1145/1559845.1559964

Published: 29 June 2009

Abstract

The database community has made rapid strides in capturing, representing, and querying uncertain data. Probabilistic databases capture the inherent uncertainty in derived tuples as probability estimates. Data acquisition and stream systems can produce succinct summaries of very large and time-varying datasets. This tutorial addresses the natural next step in harnessing uncertain data: how can we efficiently and quantifiably determine what, how, and how much to learn in order to make good decisions based on the imprecise information available?

The material in this tutorial is drawn from a range of fields, including database systems, control and information theory, operations research, convex optimization, and statistical learning. The tutorial focuses on the natural constraints imposed by a database context and on the demands that imprecise information places on optimization. We look both to the past and to the future: we discuss general tools and techniques that can serve as a guide to database researchers and practitioners, and we enumerate the challenges that lie ahead.

Index Terms

Large-scale uncertainty management systems: learning and exploiting your data
1. Information systems
  1. Data management systems
2. Theory of computation
  1. Design and analysis of algorithms
    1. Approximation algorithms analysis
      1. Scheduling algorithms
    2. Online algorithms
      1. Online learning algorithms
        1. Scheduling algorithms
  2. Theory and algorithms for application domains
    1. Machine learning theory
      1. Reinforcement learning
        1. Sequential decision making

Published In

SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

June 2009

1168 pages

ISBN:9781605585512

DOI:10.1145/1559845

Editors: Carsten Binnig, Benoit Dageville

General Chairs: Uğur Çetintemel (Brown University, USA), Stan Zdonik (Brown University, USA)

Program Chair: Donald Kossmann (ETH Zurich, Switzerland)

Copyright © 2009 held by the owner/author(s).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS '09

SIGMOD/PODS '09: International Conference on Management of Data

June 29 - July 2, 2009

Providence, Rhode Island, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
