Abstract
Research in the MIDAS project at Stanford explores new ideas in data mining. One early result was a new algorithm for Web search, which led to Google, a search engine that has recently become a commercial venture.
A second area of interest is generalizing techniques such as “a-priori,” which Rakesh Agrawal and his associates at IBM Research in Almaden developed to support “market-basket analysis,” or “association-rule mining.” This problem deals with finding items that customers frequently buy together. We have developed a framework called “query flocks,” in which we can phrase highly complex data-mining queries, including many that are not handled well by commercial SQL systems. We then compile the query flock into a sequence of SQL queries that are simple enough to be optimized by commercial systems.
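As an illustration of the market-basket problem mentioned above, the following is a minimal sketch of the a-priori idea restricted to pairs of items: a pair can only be frequent if both of its members are frequent, so infrequent items are discarded before pairs are counted. The function name `frequent_pairs`, the `min_support` parameter, and the sample baskets are assumptions made for this example; they do not come from the paper, and the sketch does not represent the query-flocks framework.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {pair: count} for item pairs found in at least min_support baskets."""
    # Pass 1: count how many baskets contain each individual item.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, n in item_counts.items() if n >= min_support}

    # Pass 2: count only pairs whose two members are both frequent.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Hypothetical sample data, invented for illustration only.
baskets = [
    ["beer", "diapers", "milk"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["beer", "diapers", "bread"],
    ["milk", "bread", "beer"],
]
result = frequent_pairs(baskets, min_support=3)
print(result)
```

With this sample data, only the pair of beer and diapers reaches the support threshold of three baskets.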
A third interesting challenge is summarizing the knowledge of the Web in a form that resembles conventional relational data. We describe some experiments that have been carried out to exploit the redundancy of the Web and discover the patterns in which facts of a certain kind tend to exist.
Finally, we shall talk about extending the techniques for association-rule mining to extract relationships that are not based on “high support,” that is, on sets of items that appear very frequently in market baskets. Important examples include intelligence gathering, where we want to find terms that are highly correlated in documents but that do not appear in very many documents. The MIDAS group has recently developed techniques that process very large amounts of data and efficiently detect items that are highly correlated but not very frequent. We can even find implications, similar to causal relationships, without requiring high support for the associated items.
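To make the notion of "highly correlated but not very frequent" concrete, here is a small sketch that scores every pair of terms by the overlap of the document sets in which they occur. Several things here are assumptions rather than content from the paper: the use of Jaccard similarity as the correlation measure, the function name `similar_pairs`, and the sample documents. The sketch compares all pairs by brute force, so it only illustrates the goal; it is not the MIDAS group's technique and would not scale to very large data.

```python
from itertools import combinations


def similar_pairs(documents, min_similarity):
    """Return {(term_a, term_b): similarity} for term pairs at or above the threshold."""
    # Map each term to the set of document ids in which it occurs.
    occurrences = {}
    for doc_id, terms in enumerate(documents):
        for term in set(terms):
            occurrences.setdefault(term, set()).add(doc_id)

    # Assumed measure: Jaccard similarity of the two document sets.
    result = {}
    for a, b in combinations(sorted(occurrences), 2):
        docs_a, docs_b = occurrences[a], occurrences[b]
        similarity = len(docs_a & docs_b) / len(docs_a | docs_b)
        if similarity >= min_similarity:
            result[(a, b)] = similarity
    return result


# Hypothetical documents, invented for illustration only.
documents = [
    ["report", "falcon", "harbor"],
    ["report", "weather"],
    ["report", "falcon", "harbor"],
    ["report", "weather", "market"],
    ["report", "market"],
    ["report"],
]
rare = similar_pairs(documents, min_similarity=0.9)
print(rare)
```

In this sample, "falcon" and "harbor" each appear in only two of six documents, so a support threshold would discard them, yet they always occur together and are the only pair that passes the similarity threshold.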
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ullman, J.D. (1999). Some Advances in Data-Mining Techniques. In: Pinter, R.Y., Tsur, S. (eds) Next Generation Information Technologies and Systems. NGITS 1999. Lecture Notes in Computer Science, vol 1649. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48521-X_1
DOI: https://doi.org/10.1007/3-540-48521-X_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66225-9
Online ISBN: 978-3-540-48521-6
eBook Packages: Springer Book Archive