Difference Detection Between Two Contrast Sets

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4081)

Abstract

Mining group differences is useful in many applications, such as medical research, social network analysis, and link discovery. Differences between groups can be measured from either a statistical or a data mining perspective. In this paper, we propose an empirical likelihood (EL) based strategy for building confidence intervals for the differences in mean and in distribution function between two contrasting groups. Our approach takes the semi-parametric structure of the groups into account, and we evaluate it experimentally on both simulated and real-world data. The results demonstrate that the approach is effective in building confidence intervals for group differences such as differences in mean and in distribution function.
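
As a rough illustration of the empirical likelihood idea only, and not the two-sample semi-parametric procedure proposed in the paper, the sketch below builds a 95% EL confidence interval for the mean of a single group. The function names `el_statistic` and `el_confidence_interval` and the sample values are hypothetical. Calibrating against the chi-square distribution with one degree of freedom is the usual asymptotic choice for one-sample EL and is assumed here.

```python
import math

# 0.95 quantile of the chi-square distribution with 1 degree of freedom
CHI2_1_95 = 3.841458820694124


def el_statistic(xs, mu, iters=200):
    """Return -2 log R(mu), the empirical log-likelihood ratio for the mean."""
    z = [x - mu for x in xs]
    lo_z, hi_z = min(z), max(z)
    if lo_z >= 0 or hi_z <= 0:
        return math.inf  # mu is not strictly inside the range of the data
    # The multiplier lam must keep every weight positive: 1 + lam * z_i > 0.
    lo, hi = -1.0 / hi_z, -1.0 / lo_z
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    # g(lam) = sum z_i / (1 + lam * z_i) is strictly decreasing, so bisect.
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        g = sum(zi / (1.0 + lam * zi) for zi in z)
        if g > 0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


def el_confidence_interval(xs, crit=CHI2_1_95, iters=100):
    """Return (low, high) such that the EL statistic stays at or below crit."""
    mean = sum(xs) / len(xs)

    def boundary(inside, outside):
        # Bisect between a point inside the interval and one outside it.
        for _ in range(iters):
            mid = 0.5 * (inside + outside)
            if el_statistic(xs, mid) <= crit:
                inside = mid
            else:
                outside = mid
        return inside

    return boundary(mean, min(xs)), boundary(mean, max(xs))


# Hypothetical measurements for one group
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9]
low, high = el_confidence_interval(sample)
print(f"95% EL interval for the mean: [{low:.3f}, {high:.3f}]")
```

Unlike a normal-theory interval, the EL interval is not forced to be symmetric around the sample mean, and it always lies inside the range of the observed data.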

This work is partially supported by Australian large ARC grants (DP0449535 and DP0559536), a China NSF major research program (60496327), a China NSF grant (60463003), a National Basic Research Program of China grant (2004CB318103), and a National Science Foundation of China grant (60033020).

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, Hj., Qin, Y., Zhu, X., Zhang, J., Zhang, S. (2006). Difference Detection Between Two Contrast Sets. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2006. Lecture Notes in Computer Science, vol 4081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823728_46

  • DOI: https://doi.org/10.1007/11823728_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37736-8

  • Online ISBN: 978-3-540-37737-5

  • eBook Packages: Computer Science (R0)
