Abstract
In this work we present a technique for clustering points into specific convex sets, called homogeneous boxes, whose sides are aligned with the coordinate axes (the isothetic condition). The proposed clustering approach is based on homogeneity conditions rather than on a distance measure and, although it was originally developed in the context of the logical analysis of data, it is now placed within the framework of supervised clustering. First, we introduce the basic concepts of box geometry; then, we consider a generalized clustering algorithm based on a class of graphs called incompatibility graphs. For supervised classification problems, we consider classifiers based on box sets and compare their overall performance with the accuracy of competing methods on a wide range of real data sets. The results show that the proposed method achieves accuracy comparable to that of other supervised learning methods.
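To make the notion of a homogeneous isothetic box concrete, the following is a minimal Python sketch, not the algorithm of the paper. It assumes that a box is the smallest axis-aligned box spanned by a set of points, and that a box is homogeneous when all labelled points falling inside it belong to a single class. The function names (`box_hull`, `contains`, `is_homogeneous`) and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def box_hull(points: List[Point]) -> Box:
    """Smallest axis-aligned (isothetic) box containing all given points."""
    lower = [min(coords) for coords in zip(*points)]
    upper = [max(coords) for coords in zip(*points)]
    return lower, upper


def contains(box: Box, p: Point) -> bool:
    """True if p lies inside the closed box, coordinate by coordinate."""
    lower, upper = box
    return all(lo <= x <= hi for lo, x, hi in zip(lower, p, upper))


def is_homogeneous(box: Box, points: List[Point], labels: List[int]) -> bool:
    """Assumed definition: the box holds points of at most one class."""
    classes_inside = {lab for p, lab in zip(points, labels) if contains(box, p)}
    return len(classes_inside) <= 1


# Toy data (hypothetical): three points of class 1 and one of class 0.
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0), (2.0, 2.0)]
labels = [1, 1, 1, 0]

small_box = box_hull(points[:2])  # spanned by the first two class-1 points
large_box = box_hull(points[:3])  # spanned by all three class-1 points

print(is_homogeneous(small_box, points, labels))
print(is_homogeneous(large_box, points, labels))
```

In this toy example the box spanned by the first two class-1 points contains no point of the other class, whereas enlarging it to cover the third class-1 point also captures the class-0 point at (2, 2), so the larger box is no longer homogeneous under the assumed definition. No distance between points is computed at any step, only coordinate-wise containment.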





Cite this article
Spinelli, V. Supervised box clustering. Adv Data Anal Classif 11, 179–204 (2017). https://doi.org/10.1007/s11634-016-0233-2
DOI: https://doi.org/10.1007/s11634-016-0233-2