Hybrid heuristics based on column generation with path-relinking for clustering problems
Introduction
The clustering problem is defined as the process of separating a set of objects into groups such that members of a group are similar to each other (Lorena & Furtado, 2001). The determination of similarity between individuals depends on the metric defined for the distance between objects. External measures, such as the Rand index (Rand, 1971) and the corrected Rand (CRand) index (Hubert & Arabie, 1985), are used to measure the similarity between two partitions of the same data.
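As an illustration of the external measure mentioned above, the following is a minimal sketch of the corrected Rand index in its usual contingency-table form (Hubert & Arabie, 1985). The function name `corrected_rand` and the label-list inputs are assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter
from math import comb


def corrected_rand(labels_true, labels_pred):
    """Corrected (adjusted) Rand index between two partitions.

    Hypothetical helper: each argument is a sequence giving the
    cluster label of every object, in the same object order.
    """
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")

    # Contingency table cells and its row/column totals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of object pairs that share a cell, a row, or a column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)

    if max_index == expected:
        # Only happens when both partitions are a single cluster
        # or both are all singletons, i.e. they are identical.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Same grouping with different label names: perfect agreement.
    print(corrected_rand([0, 0, 1, 1], [1, 1, 0, 0]))
    # Groupings that split every pair differently: below chance level.
    print(corrected_rand([0, 0, 1, 1], [0, 1, 0, 1]))
```

Values close to 1 indicate that the two partitions agree, values near 0 indicate agreement no better than chance, and negative values indicate agreement below chance.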
The difficulty in solving the clustering problem lies in identifying close objects and developing methods that group them. This is not an easy task, because good solutions, i.e., efficient classifications of the objects, are hard to find. The classification of data can be supervised or unsupervised (Abbasi & Younis, 2007). In supervised analysis, the method is trained with known patterns of data and then applied to new samples. In unsupervised analysis, the algorithm seeks the data structures that allow the separation into groups without prior knowledge of the patterns. Although unsupervised classification methods give less accurate results than supervised methods, they are more suitable when no prior information about the groups is available.
The clustering problem examined in this paper is unsupervised, i.e., concerned with the grouping of related objects without class or label information (Nascimento & De Carvalho, 2011). Clustering has been applied in a wide variety of research areas, such as machine learning, artificial intelligence, pattern recognition, spatial data mining, image segmentation, genetics, microbiology, geology and remote sensing, among others (Xu & Wunsch, 2005; Jun et al., 2014). Authors have applied metaheuristics to solve it, such as the greedy randomized adaptive search procedure (Nascimento, Toledo, & de Leon Ferreira de Carvalho, 2010) and genetic algorithms (Agustín-Blas et al., 2012).
This work proposes five hybrid heuristics to solve clustering problems. All of them are based on the application of a column generation technique for solving p-median problems (Senne, Lorena, & Pereira, 2007). The five approaches are: a solution made feasible from the master problem; the column generation solution; a heuristic with path-relinking considering the initial columns of the column generation procedure; a solution of the master problem with path-relinking; and the column generation process with path-relinking. All clustering solutions are evaluated with the external measure CRand, and the computational results are compared to recent methods in the literature.
The paper is organized as follows. Section 2 gives a brief literature review on the clustering problem. Section 3 presents an overview of column generation for p-median problems. Section 4 describes the hybrid heuristics for solving the clustering problem. Section 5 presents the data, the distances and correlations used to calculate the dissimilarity between samples, and the computational results. Section 6 presents some conclusions.
Abbreviated literature review
The clustering problem has been extensively studied. Rand (1971) proposes criteria that isolate aspects of the performance of a method, such as return, sensitivity and stability. These criteria depend on a similarity measure between two different clusterings of the same set of data; the measure essentially considers how each pair of data points is assigned in each clustering.
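The pair-based idea behind this measure can be sketched as follows: every pair of objects is checked in both clusterings, and the pair counts as an agreement when it is kept together in both or separated in both. This is a minimal illustration of the plain (uncorrected) Rand index; the function name `rand_index` and the label-list inputs are assumptions made for this example.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two clusterings agree.

    Hypothetical helper: labels_a[i] and labels_b[i] are the cluster
    labels of object i in the first and second clustering.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    if len(labels_a) < 2:
        raise ValueError("at least two objects are needed to form a pair")

    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        together_in_a = labels_a[i] == labels_a[j]
        together_in_b = labels_b[i] == labels_b[j]
        # Agreement: the pair is together in both or apart in both.
        if together_in_a == together_in_b:
            agreements += 1
        pairs += 1
    return agreements / pairs


if __name__ == "__main__":
    # Identical groupings under different label names agree on every pair.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # Here only the pairs (0, 3) and (1, 2) are treated alike: 2 of 6 pairs.
    print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The quadratic loop over pairs is kept for readability; for large data sets the same count is normally obtained from a contingency table.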
Handl, Knowles, and Kell (2005) show the large number of techniques available for validation of results obtained for the problem,
Column generation for p-median problems
The heuristics for solving clustering problems start from a set of data and, without any prior information about patterns, build groups whose objects have similar characteristics. The groups are obtained by a column generation technique proposed for solving p-median problems. P-median solutions minimize the sum of distances between nodes and their nearest facility (median), and nodes allocated to the same facility are expected to have similar characteristics. Thus, a feasible
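To make the p-median objective concrete, the sketch below assigns every node to its nearest median, sums the distances, and reads the resulting allocation as a clustering. It is only an illustration under stated assumptions: the helper names `assign_to_medians` and `brute_force_p_median` are hypothetical, and the exhaustive search over median sets stands in for the column generation procedure, which is not reproduced here and would be needed for instances of realistic size.

```python
from itertools import combinations


def assign_to_medians(dist, medians):
    """Allocate each node to its nearest median.

    dist is a square matrix (list of lists) of pairwise distances and
    medians is a collection of node indices chosen as facilities.
    Returns (clusters, cost): clusters maps each median to its nodes,
    cost is the p-median objective value for this choice of medians.
    """
    clusters = {m: [] for m in medians}
    cost = 0.0
    for node, row in enumerate(dist):
        nearest = min(medians, key=lambda m: row[m])
        clusters[nearest].append(node)
        cost += row[nearest]
    return clusters, cost


def brute_force_p_median(dist, p):
    """Try every set of p medians (illustration only, tiny inputs)."""
    best = min(
        combinations(range(len(dist)), p),
        key=lambda ms: assign_to_medians(dist, ms)[1],
    )
    clusters, cost = assign_to_medians(dist, best)
    return best, clusters, cost


if __name__ == "__main__":
    # Six points on a line forming two obvious groups.
    points = [0, 1, 2, 10, 11, 12]
    dist = [[abs(a - b) for b in points] for a in points]
    medians, clusters, cost = brute_force_p_median(dist, p=2)
    print(medians)   # indices of the chosen medians
    print(clusters)  # nodes allocated to each median, i.e. the clusters
    print(cost)      # sum of node-to-median distances
```

In this toy instance the middle point of each group is selected as a median, and the two allocations coincide with the two visible groups.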
The hybrid heuristics
The hybrid heuristics can be classified as a combination of metaheuristics and column generation (CG). Two basic possibilities are explored in the literature: applying metaheuristics to the pricing subproblems or directly to the master problem (MP), either to create initial columns or to produce incoming columns for CG (Mauri & Lorena, 2007; Pirkwieser & Raidl, 2010; Filho & Lorena, 2000; Massen et al., 2013). The hybrid heuristics proposed in this work can be seen as a third option where the CG process generates
Computational results
This section presents the data and computational results used to validate the proposed methods. A total of 8 data sets were used, namely: Iris, Yeast, Breast, BreastA, BreastB, Proteins, DLBCLA and DLBCLB. The protein data were obtained at the address http://ranger.uta.edu/chqding/protein. The Yeast, Breast and Iris data were obtained from the UCI repository (Abbasi & Younis, 2007). The BreastA, BreastB, DLBCLA and DLBCLB data come from the cancer program data repository (//www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
Conclusion
Clustering problems appear in various contexts and real life applications. This work presents a contribution to the solution of clustering problems using hybrid heuristics that combine stages of a column generation process to solve p-median problems.
Five different approaches have been proposed, considering solutions from the initial columns, the master problem and the column generation. They are combined with a path-relinking approach and the results are evaluated by the CRand index. The
Acknowledgment
The authors thank FAPES (process 59830042/2012), CNPq (processes 476862/2012-4, 471837/2008-3, 300692/2009-9, 300747/2010-1 and 477148/2011-5) and CAPES.
References (25)
- et al. A survey on clustering algorithms for wireless sensor networks. Computer Communications (2007)
- et al. A new grouping genetic algorithm for clustering problems. Expert Systems with Applications (2012)
- et al. A genetic algorithm with gene rearrangement for k-means clustering. Pattern Recognition (2009)
- et al. Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognition (2008)
- et al. Document clustering method using dimension reduction and support vector clustering to overcome sparseness. Expert Systems with Applications (2014)
- et al. Multi-objective evolutionary biclustering of gene expression data. Pattern Recognition (2006)
- et al. Spectral methods for graph clustering – a survey. European Journal of Operational Research (2011)
- et al. Investigation of a new grasp-based clustering algorithm applied to biological data. Computers & Operations Research (2010)
- et al. Clustering search algorithm for the capacitated centered clustering problem. Computers & Operations Research (2010)
- Filho, G. R., & Lorena, L. A. N. (2000). Constructive genetic algorithm and column generation: An application to graph...
- Uma heurística de geração de colunas para o problema de formação de células de máquinas e partes. Pesquisa Operacional para o Desenvolvimento
- Computational cluster validation in post-genomic data analysis. Bioinformatics