Abstract
This work addresses the problem of mining very large distributed databases. We propose a distributed data mining technique that produces a meta-classifier that is both predictive and descriptive. The meta-classifier consists of a set of classification rules, which can be refined and then validated. The refinement step removes from the meta-classifier the rules that, according to their confidence coefficient, computed by statistical means, would not predict well on new objects. The validation step uses samples to fine-tune the rules in the rule set resulting from the refinement step. This paper focuses on the validation process: we propose two validation techniques, the first very simple and the second based on a Galois lattice. The paper describes both processes in detail and reports experiments demonstrating the viability of our approach.
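To make the refine-then-validate pipeline concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' method: the `Rule` class, the thresholds, the toy data, and the error-rate test in `validate` are all hypothetical. In particular, `validate` only filters rules against a sample, whereas the abstract speaks of fine-tuning them, and the Galois-lattice technique is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Example = Dict[str, object]


@dataclass
class Rule:
    name: str
    condition: Callable[[Example], bool]  # premise of the rule
    label: str                            # class predicted when the premise holds
    confidence: float                     # hypothetical confidence coefficient in [0, 1]


def refine(rules: List[Rule], threshold: float) -> List[Rule]:
    """Refinement (assumed form): keep rules whose confidence reaches the threshold."""
    return [r for r in rules if r.confidence >= threshold]


def validate(rules: List[Rule],
             sample: List[Tuple[Example, str]],
             max_error: float) -> List[Rule]:
    """Validation (assumed form): keep rules whose error rate on the
    sample examples they cover does not exceed max_error."""
    kept = []
    for r in rules:
        covered = [(x, y) for x, y in sample if r.condition(x)]
        if not covered:
            kept.append(r)  # assumption: a rule covering no sample example is left as is
            continue
        errors = sum(1 for _, y in covered if y != r.label)
        if errors / len(covered) <= max_error:
            kept.append(r)
    return kept


# Toy rule set and labelled sample (invented for illustration only).
rules = [
    Rule("r1", lambda x: x["size"] > 5, "malignant", 0.9),
    Rule("r2", lambda x: x["size"] <= 5, "benign", 0.4),
    Rule("r3", lambda x: x["shape"] == "round", "malignant", 0.8),
]
sample = [
    ({"size": 7, "shape": "round"}, "malignant"),
    ({"size": 2, "shape": "round"}, "benign"),
    ({"size": 3, "shape": "round"}, "benign"),
    ({"size": 8, "shape": "oval"}, "malignant"),
]

refined = refine(rules, threshold=0.5)                # drops r2 (low confidence)
validated = validate(refined, sample, max_error=0.3)  # drops r3 (errs on 2 of 3 covered examples)
print([r.name for r in validated])
```

Keeping the two steps as separate functions mirrors the abstract's separation between a statistical filter that needs no data and a sample-based check that does.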
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aoun-Allah, M., Mineau, G. (2008). Rule Validation of a Meta-classifier Through a Galois (Concept) Lattice and Complementary Means. In: Yahia, S.B., Nguifo, E.M., Belohlavek, R. (eds) Concept Lattices and Their Applications. CLA 2006. Lecture Notes in Computer Science, vol 4923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78921-5_8
DOI: https://doi.org/10.1007/978-3-540-78921-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78920-8
Online ISBN: 978-3-540-78921-5
eBook Packages: Computer Science (R0)