Abstract
Knowledge discovery, that is, analyzing a given massive data set and deriving or discovering knowledge from it, has become an important subject in several fields, including computer science. Good software is in demand for various knowledge discovery tasks, and such software often requires efficient algorithms for handling huge data sets. Random sampling is one of the key algorithmic methods for processing huge data sets. In this paper, we explain some random sampling techniques for speeding up learning algorithms and making them applicable to large data sets [15], [16], [4], [3]. We also show some algorithms obtained by using these techniques.
A part of this work is supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research on Priority Areas (Discovery Science), 1998–2001.
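The sequential sampling idea behind several of the techniques surveyed here can be illustrated with a minimal sketch. This is not any of the paper's algorithms; the function name, batch size, and the simple Hoeffding-based stopping rule are illustrative assumptions. The sampler keeps drawing random examples until a Hoeffding bound certifies that the running estimate is within `eps` of the true fraction with probability at least 1 − `delta`, so the sample size needed is independent of the data set's size.

```python
import math
import random

def adaptive_estimate(data, predicate, eps=0.05, delta=0.05, batch=100):
    """Estimate the fraction of `data` satisfying `predicate` by random
    sampling.  Sampling stops as soon as Hoeffding's inequality guarantees
    the estimate is within `eps` of the true fraction with probability
    at least 1 - delta, so far fewer than len(data) reads may suffice."""
    hits, n = 0, 0
    while True:
        for _ in range(batch):
            hits += predicate(random.choice(data))
            n += 1
        # Hoeffding: P(|hits/n - p| > eps) <= 2 * exp(-2 * n * eps**2)
        if 2 * math.exp(-2 * n * eps ** 2) <= delta:
            return hits / n, n

random.seed(0)
data = list(range(100000))  # true fraction of multiples of 3 is ~1/3
est, used = adaptive_estimate(data, lambda x: x % 3 == 0)
# `used` is 800 here: the smallest multiple of `batch` with
# 2 * exp(-2 * n * 0.05**2) <= 0.05.
```

This fixed stopping rule is the non-adaptive baseline; the adaptive methods discussed in the paper (e.g., [24]) sharpen it by letting the stopping time depend on the observed estimate as well.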
References
N. Abe and H. Mamitsuka, Query learning strategies using boosting and bagging, in Proc. the 15th Int’l Conf. on Machine Learning (ICML’98), 1–9, 1998.
I. Adler and R. Shamir, A randomized scheme for speeding up algorithms for linear and convex programming with high constraints-to-variable ratio, Math. Programming 61, 39–52, 1993.
J. Balcázar, Y. Dai, and O. Watanabe, Provably fast training algorithms for support vector machines, in Proc. the first IEEE Int’l Conf. on Data Mining, to appear.
J. Balcázar, Y. Dai, and O. Watanabe, Random sampling techniques for training support vector machines: For primal-form maximal-margin classifiers, in Proc. the 12th Int’l Conf. on Algorithmic Learning Theory (ALT’01), to appear.
K.P. Bennett and E.J. Bredensteiner, Duality and geometry in SVM classifiers, in Proc. the 17th Int’l Conf. on Machine Learning (ICML’00), 57–64, 2000.
P.S. Bradley, O.L. Mangasarian, and D.R. Musicant, Optimization methods in massive datasets, in Handbook of Massive Datasets (J. Abello, P.M. Pardalos, and M.G.C. Resende, eds.), Kluwer Academic Pub., to appear.
L. Breiman, Pasting small votes for classification in large databases and on-line, Machine Learning 36, 85–103, 1999.
K.L. Clarkson, Las Vegas algorithms for linear and integer programming, J.ACM 42, 488–499, 1995.
M. Collins, R.E. Schapire, and Y. Singer, Logistic regression, AdaBoost and Bregman Distance, in Proc. the 13th Annual Conf. on Comput. Learning Theory (COLT’00), 158–169, 2000.
C. Cortes and V. Vapnik, Support-vector networks, Machine Learning 20, 273–297, 1995.
N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge Univ. Press, 2000.
P. Dagum, R. Karp, M. Luby, and S. Ross, An optimal algorithm for Monte Carlo estimation, SIAM J. Comput. 29(5), 1484–1496, 2000.
T.G. Dietterich, An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization, Machine Learning 32, 1–22, 1998.
C. Domingo, R. Gavaldà, and O. Watanabe, Practical algorithms for on-line selection, in Proc. the first Intl. Conf. on Discovery Science (DS’98), Lecture Notes in AI 1532, 150–161, 1998.
C. Domingo, R. Gavaldà, and O. Watanabe, Adaptive sampling methods for scaling up knowledge discovery algorithms, in Proc. the 2nd Intl. Conf. on Discovery Science (DS’99), Lecture Notes in AI, 172–183, 1999. (The final version will appear in J. Knowledge Discovery and Data Mining.)
C. Domingo and O. Watanabe, MadaBoost: A modification of AdaBoost, in Proc. the 13th Annual Conf. on Comput. Learning Theory (COLT’00), 180–189, 2000.
C. Domingo and O. Watanabe, Scaling up a boosting-based learner via adaptive sampling, in Proc. of Knowledge Discovery and Data Mining (PAKDD’00), Lecture Notes in AI 1805, 317–328, 2000.
B. Gärtner and E. Welzl, A simple sampling lemma: Analysis and applications in geometric optimization, Discr. Comput. Geometry, to appear. (Also available from http://www.inf.ethz.ch/personal/gaertner/publications.html)
W. Feller, An Introduction to Probability Theory and its Applications (Third Edition), John Wiley & Sons, 1968.
Y. Freund, Boosting a weak learning algorithm by majority, Information and Computation 121(2), 256–285, 1995.
Y. Freund, An adaptive version of the boost by majority algorithm, in Proc. the 12th Annual Conf. on Comput. Learning Theory (COLT’99), 102–113, 1999.
J. Friedman, T. Hastie, and R. Tibshirani, Additive logistic regression: a statistical view of boosting, Technical Report, 1998.
Y. Freund and R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. 55(1), 119–139, 1997.
B.K. Ghosh and P.K. Sen eds., Handbook of Sequential Analysis, Marcel Dekker, 1991.
R. Greiner, PALO: a probabilistic hill-climbing algorithm, Artificial Intelligence 84, 177–204, 1996.
P. Haas and A. Swami, Sequential sampling procedures for query size estimation, IBM Research Report RJ 9101(80915), 1992.
M. Kearns, Efficient noise-tolerant learning from statistical queries, in Proc. the 25th Annual ACM Sympos. on Theory of Comput. (STOC’93), 392–401, 1993.
R.J. Lipton, J.F. Naughton, D.A. Schneider, and S. Seshadri, Efficient sampling strategies for relational database operations, Theoret. Comput. Sci. 116, 195–226, 1993.
R.J. Lipton and J.F. Naughton, Query size estimation by adaptive sampling, J. Comput. and Syst. Sci. 51, 18–25, 1995.
J.F. Lynch, Analysis and application of adaptive sampling, in Proc. the 19th ACM Sympos. on Principles of Database Systems (PODS’99), 260–267, 1999.
O. Maron and A. Moore, Hoeffding races: accelerating model selection search for classification and function approximation, in Proc. Advances in Neural Information Process. Systems (NIPS’94), 59–66, 1994.
J. Platt, Fast training of support vector machines using sequential minimal optimization, in Advances in Kernel Methods — Support Vector Learning (B. Schölkopf, C.J.C. Burges, and A.J. Smola, eds.), MIT Press, 185–208, 1999.
R.E. Schapire, The strength of weak learnability, Machine Learning 5(2), 197–227, 1990.
T. Scheffer and S. Wrobel, A sequential sampling algorithm for a general class of utility criteria, in Proc. the 6th ACM Intl. Conf. on Knowledge Discovery and Data Mining (KDD’00), 2000.
A.J. Smola and B. Schölkopf, A tutorial on support vector regression, NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, Univ. London, 1998.
A. Wald, Sequential Analysis, John Wiley & Sons, 1947.
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Watanabe, O. (2001). How Can Computer Science Contribute to Knowledge Discovery. In: Pacholski, L., Ružička, P. (eds) SOFSEM 2001: Theory and Practice of Informatics. SOFSEM 2001. Lecture Notes in Computer Science, vol 2234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45627-9_11
Print ISBN: 978-3-540-42912-8
Online ISBN: 978-3-540-45627-8