Supervised box clustering

  • Regular Article
Advances in Data Analysis and Classification

Abstract

This work presents a technique for clustering points in specific convex sets, called homogeneous boxes, whose sides are aligned with the coordinate axes (the isothetic condition). The clustering approach relies on homogeneity conditions rather than on a distance measure; although it was originally developed in the context of the logical analysis of data, it is placed here within the framework of supervised clustering. We first introduce the basic concepts of box geometry and then consider a generalized clustering algorithm based on a class of graphs called incompatibility graphs. For supervised classification problems, we consider classifiers based on sets of boxes and compare their accuracy with that of competing methods on a wide range of real data sets. The results show that the proposed method achieves accuracy comparable to that of other supervised learning methods.
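
As a rough illustration of the notions named in the abstract, the following Python sketch builds the axis-aligned box spanned by a set of points, checks whether a box is homogeneous (contains points of only one class), and lists pairs of same-class points whose spanned box is not homogeneous. The function names, the toy data, and in particular the pairwise rule used to define an incompatibility edge are assumptions made for this example; the article's formal definitions may differ.

```python
from itertools import combinations

def box_hull(points):
    """Smallest axis-aligned box containing the points, as (lower, upper) corners."""
    lower = tuple(min(coords) for coords in zip(*points))
    upper = tuple(max(coords) for coords in zip(*points))
    return lower, upper

def contains(box, point):
    """True if the point lies inside the closed box."""
    lower, upper = box
    return all(lo <= x <= hi for lo, x, hi in zip(lower, point, upper))

def is_homogeneous(box, data):
    """True if all labeled points of `data` that fall in the box share one label."""
    labels = {label for point, label in data if contains(box, point)}
    return len(labels) <= 1

def incompatibility_edges(data):
    """Index pairs of same-label points whose box hull is not homogeneous.

    Assumption: this pairwise rule is only one plausible reading of
    'incompatibility'; it is not taken from the article.
    """
    edges = []
    for i, j in combinations(range(len(data)), 2):
        (p, label_p), (q, label_q) = data[i], data[j]
        if label_p == label_q and not is_homogeneous(box_hull([p, q]), data):
            edges.append((i, j))
    return edges

# Hypothetical toy data: three "pos" points and one "neg" point in the plane.
data = [((0, 0), "pos"), ((2, 2), "pos"), ((1, 1), "neg"), ((3, 0), "pos")]
print(incompatibility_edges(data))
```

In this toy data set, the box spanned by the first two positive points encloses the negative point, so those two points cannot share a homogeneous box, while the other positive pairs can.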

Corresponding author

Correspondence to Vincenzo Spinelli.

About this article

Cite this article

Spinelli, V. Supervised box clustering. Adv Data Anal Classif 11, 179–204 (2017). https://doi.org/10.1007/s11634-016-0233-2

  • DOI: https://doi.org/10.1007/s11634-016-0233-2
