ABSTRACT
We introduce a new data mining problem: mining truth tables in binary datasets. Given a matrix of objects and the properties they satisfy, a truth table identifies a subset of properties that exhibit maximal variability (and hence, complete independence) in occurrence patterns over the underlying objects. This problem is relevant in many domains, e.g., in bioinformatics where we seek to identify and model independent components of combinatorial regulatory pathways, and in social/economic demographics where we desire to determine independent behavioral attributes of populations. We outline a family of levelwise approaches adapted to mining truth tables, algorithmic optimizations, and applications to bioinformatics and political datasets.
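To make the problem statement concrete, the following is a minimal sketch of a levelwise search, not the authors' algorithm. It assumes a set of k properties forms a truth table when every one of the 2^k possible 0/1 combinations occurs in at least `min_count` objects. The names `is_truth_table`, `mine_truth_tables`, and `min_count` are hypothetical and do not come from the paper. Under this assumed definition, every subset of a truth table is itself a truth table, which is what allows Apriori-style candidate pruning.

```python
from collections import Counter
from itertools import combinations


def is_truth_table(rows, cols, min_count=1):
    """Return True if every 0/1 combination over `cols` occurs in at
    least `min_count` rows (assumed definition; rows must hold 0/1 values)."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    return len(counts) == 2 ** len(cols) and min(counts.values()) >= min_count


def mine_truth_tables(rows, min_count=1):
    """Levelwise (Apriori-style) enumeration of all column sets that
    satisfy is_truth_table, returned as sorted tuples of column indices."""
    n_cols = len(rows[0])
    level = [(c,) for c in range(n_cols) if is_truth_table(rows, (c,), min_count)]
    found = []
    while level:
        found.extend(level)
        current = set(level)
        candidates = set()
        # Join two sets that share all but their last column.
        for a in level:
            for b in level:
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    # Prune: every subset one size smaller must already qualify.
                    if all(sub in current for sub in combinations(cand, len(a))):
                        candidates.add(cand)
        level = sorted(c for c in candidates if is_truth_table(rows, c, min_count))
    return found


# Toy dataset (illustrative only): column 2 duplicates column 0.
rows = [
    (0, 0, 0),
    (0, 1, 0),
    (1, 0, 1),
    (1, 1, 1),
]
result = mine_truth_tables(rows)
```

On this toy matrix, columns 0 and 1 together show all four combinations, while columns 0 and 2 never disagree and are therefore rejected at the second level.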