ADCLUS and INDCLUS: analysis, experimentation, and meta-heuristic algorithm extensions

  • Regular Article
Advances in Data Analysis and Classification

Abstract

The ADCLUS and INDCLUS models, along with their associated fitting techniques, can be used to extract an overlapping clustering structure from similarity data. In this paper, we examine the scalability of these models. We test the SINDCLUS algorithm and an adapted version of the SYMPRES algorithm on medium-sized datasets, and use the results to infer how their scalability and the severity of the local optima problem change as problem size increases. We describe several meta-heuristic approaches to minimizing the INDCLUS and ADCLUS loss functions.
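As a rough illustration of the loss functions mentioned above, the following sketch assumes the standard additive clustering formulation, in which the similarity between items i and j is approximated by the summed weights of the clusters they share plus an additive constant, and INDCLUS fits one shared membership matrix to several similarity matrices with source-specific weights. The function names, the plain list-of-lists representation, and the choice to sum squared residuals over off-diagonal pairs only are assumptions made for this example; the article's exact loss (for instance, any normalization) is not given in the abstract.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def adclus_fitted(P: Sequence[Sequence[int]],
                  weights: Sequence[float],
                  constant: float = 0.0) -> Matrix:
    """Fitted similarities: s_hat[i][j] = sum_k w_k * p_ik * p_jk + c.

    P is an n x K binary membership matrix (rows may contain several 1s,
    which is what makes the clustering overlapping).
    """
    n = len(P)
    return [
        [sum(w * P[i][k] * P[j][k] for k, w in enumerate(weights)) + constant
         for j in range(n)]
        for i in range(n)
    ]


def adclus_loss(S: Sequence[Sequence[float]],
                P: Sequence[Sequence[int]],
                weights: Sequence[float],
                constant: float = 0.0) -> float:
    """Sum of squared residuals over pairs i < j (diagonal ignored; assumption)."""
    fitted = adclus_fitted(P, weights, constant)
    n = len(S)
    return sum((S[i][j] - fitted[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def indclus_loss(S_list: Sequence[Sequence[Sequence[float]]],
                 P: Sequence[Sequence[int]],
                 weights_list: Sequence[Sequence[float]],
                 constants: Sequence[float]) -> float:
    """INDCLUS-style loss: one shared P, separate weights and constant per source."""
    return sum(adclus_loss(S, P, w, c)
               for S, w, c in zip(S_list, weights_list, constants))


# Three items, two overlapping clusters: {0, 1} and {1, 2}.
P_example = [[1, 0], [1, 1], [0, 1]]
S_example = [[0, 3, 1], [3, 0, 4], [1, 4, 0]]
loss_example = adclus_loss(S_example, P_example, weights=[2, 3], constant=1)
```

Because P is binary, minimizing this loss over P is a combinatorial problem, which is why local optima and meta-heuristic search are a concern for larger datasets.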

Author information

Corresponding author

Correspondence to Stephen L. France.

About this article

Cite this article

France, S.L., Chen, W. & Deng, Y. ADCLUS and INDCLUS: analysis, experimentation, and meta-heuristic algorithm extensions. Adv Data Anal Classif 11, 371–393 (2017). https://doi.org/10.1007/s11634-016-0244-z

  • DOI: https://doi.org/10.1007/s11634-016-0244-z
