Abstract
Mining group differences is useful in many applications, such as medical research, social network analysis and link discovery. The differences between groups can be measured from either a statistical or a data mining perspective. In this paper, we propose an empirical likelihood (EL) based strategy for building confidence intervals for the differences in means and in distribution functions between two contrasting groups. Our approach takes into account the semi-parametric structure of the groups, and we evaluate it experimentally on both simulated and real-world data. The results demonstrate that the approach is effective in building confidence intervals for group differences such as differences in means and in distribution functions.
This work is partially supported by large Australian ARC grants (DP0449535 and DP0559536), a China NSF Major Research Program grant (60496327), a China NSF grant (60463003), a National Basic Research Program of China grant (2004CB318103), and a National Science Foundation of China grant (60033020).
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, Hj., Qin, Y., Zhu, X., Zhang, J., Zhang, S. (2006). Difference Detection Between Two Contrast Sets. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2006. Lecture Notes in Computer Science, vol 4081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823728_46
DOI: https://doi.org/10.1007/11823728_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37736-8
Online ISBN: 978-3-540-37737-5
eBook Packages: Computer Science, Computer Science (R0)