
1 Introduction

Pattern recognition is becoming ever more important, mainly due to the increasing need of different applications to extract meaningful information from their data. The problem becomes harder as data grow fast in both size and complexity. Humans have an innate ability to recognize patterns, but this ability is rather difficult to replicate on computers. Several techniques have been developed to address this issue, the most popular being Artificial Neural Networks (ANNs) [6] and Support Vector Machines (SVMs) [3]. Recently, a new framework for the design of graph-based classifiers, named Optimum-Path Forest (OPF), has been introduced in the scientific community. This framework comprises supervised [10,11,12], semi-supervised [1, 2] and unsupervised [14] learning algorithms. Among its main advantages, some OPF variants are parameterless and make no assumptions about the separability of samples [7].

We refer to OPF as a single classifier in this paper, but it is in fact a framework for the design of graph-based classifiers. This means the user can design his/her own optimum-path forest-driven classifier by configuring three main modules: (i) the adjacency relation, (ii) the methodology to estimate prototypes, and (iii) the path-cost function. Since OPF models pattern recognition as a graph partition task, it requires an adjacency relation to connect nodes (i.e. feature vectors extracted from dataset samples). Further, OPF rules a competition process among prototype samples, which are the most representative samples of each class; therefore, a careful procedure to estimate them is advisable. Finally, in order to conquer samples, prototypes must offer them rewards, which are encoded by the path-cost function. In this paper, we consider the OPF classifier proposed by Papa et al. [11, 12], which employs a fully connected graph and a path-cost function that computes the maximum arc weight along a path. For the sake of clarity, we shall refer to this version simply as OPF.

Although OPF has obtained recognition results comparable to, or even more accurate than, SVMs and ANNs in a number of different applications, and has usually been much faster to train, it can still be time-consuming for very large datasets. Even though OPF is parameterless, its training phase takes \(\theta (n^2)\) operations, where n stands for the number of training samples. Strictly speaking, this is not that bad, since SVMs usually require a considerably higher computational load. However, there is still room for improvement, and that is the main contribution of this paper: to introduce a different data structure that allows OPF parallelization. As a matter of fact, the proposed approach produces results equivalent to those obtained by the original OPF classifier in terms of accuracy, while being up to five times faster on simple personal-computer hardware.

The remainder of this paper is organized as follows. Section 2 reviews OPF theoretical background, and Sect. 3 presents the modifications that led to the new parallel training algorithm. Section 4 discusses the experiments, and Sect. 5 states conclusions and future works.

2 Supervised Classification Based on Optimum-Path Forest

Let \(\mathcal{Z}\) be a dataset whose correct labels are given by a function \(\lambda (x)\), for each sample \(x\in \mathcal{Z}\). Thus, \(\mathcal Z\) can be partitioned into training (\(\mathcal{Z}_1\)), validation (\(\mathcal{Z}_2\)) and testing (\(\mathcal{Z}_3\)) sets. Also, we can derive a graph \(\mathcal{G}_1=(\mathcal{V}_1,\mathcal{A}_1)\) from the training set, where \(\mathcal{A}_1\) stands for an adjacency relation known as the complete graph, i.e. a fully connected graph in which each pair of samples in \(\mathcal{Z}_1\) is connected by an edge. Additionally, each node \(\mathbf v ^1_i\in \mathcal{V}_1\) concerns the feature vector extracted from sample \(x^1_i\in \mathcal{Z}_1\). All arcs are weighted by the distance between their corresponding graph nodes. A similar definition can also be applied to the validation and test sets.

The OPF proposed by Papa et al. [12] comprises two distinct phases: (i) training and (ii) testing. The former step is based upon \(\mathcal{Z}_1\), while the test phase aims at assessing the effectiveness of the classifier learned during the previous phase over the testing set \(\mathcal{Z}_3\). Additionally, a learning algorithm was proposed to improve the quality of the samples in \(\mathcal{Z}_1\) by means of an additional set \(\mathcal{Z}_2\). Roughly speaking, the idea is to train an OPF classifier over \(\mathcal{Z}_1\) and then classify \(\mathcal{Z}_2\). Further, we replace non-prototype samples in \(\mathcal{Z}_1\) by misclassified samples in \(\mathcal{Z}_2\), and the very same process is executed once again (i.e. training over \(\mathcal{Z}_1\) and classification over \(\mathcal{Z}_2\)). The above procedure is repeated until the accuracy between consecutive iterations does not change.

2.1 Training

The training step aims at building the optimum-path forest upon the graph \(\mathcal{G}_1\) derived from \(\mathcal{Z}_1\). Essentially, the forest is the result of a competition process among prototype samples that ends up partitioning \(\mathcal{G}_1\). Let \(\mathcal{S}\subseteq \mathcal{Z}_1\) be a set of prototypes, which can be chosen at random or using some other specific heuristic. Papa et al. [12] proposed to find the set of prototypes that minimizes the classification error over \(\mathcal{Z}_1\), denoted \(\mathcal{S}^*\subseteq \mathcal{Z}_1\). Such a set can be found by computing a Minimum Spanning Tree \(\mathcal M\) from \(\mathcal{G}_1\), and then marking as prototypes each pair of samples \((x_1, x_2)\), adjacent in \(\mathcal M\), such that \(\lambda (x_1)\ne \lambda (x_2)\).
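The MST-based prototype estimation above can be sketched as follows (an illustrative Python sketch under our own naming, not the LibOPF implementation): Prim's algorithm grows the MST, and both endpoints of every tree edge connecting samples with different labels are marked as prototypes.

```python
# Illustrative sketch of prototype estimation via an MST (not LibOPF code):
# both endpoints of any MST edge linking different classes become prototypes.
import math

def find_prototypes(points, labels):
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    in_tree = [False] * n
    cost = [math.inf] * n   # cheapest known connection to the growing tree
    pred = [-1] * n         # MST predecessor of each node
    cost[0] = 0.0
    prototypes = set()
    for _ in range(n):
        # Prim's step: pick the cheapest node not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: cost[i])
        in_tree[u] = True
        if pred[u] != -1 and labels[u] != labels[pred[u]]:
            prototypes.update({u, pred[u]})   # boundary pair -> prototypes
        for v in range(n):
            if not in_tree[v] and dist(u, v) < cost[v]:
                cost[v] = dist(u, v)
                pred[v] = u
    return prototypes
```

For instance, for four collinear points forming two well-separated classes, only the two samples facing the class boundary are selected.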

Further, the competition process takes place in \(\mathcal{Z}_1\), where nodes in \(\mathcal{S}^*\) try to conquer the remaining samples in \(\mathcal{Z}_1\setminus \mathcal{S}^*\). Basically, such a process is based on a reward-compensation procedure, in which the prototype offering the minimum cost is the one that conquers the sample. The reward is computed based on a path-cost function, which should be smooth according to Falcão et al. [5]. Therefore, Papa et al. [12] proposed to use \(f_{max}\) as the path-cost function, defined as follows:

$$\begin{aligned} f_{max}(\langle \mathbf s \rangle ) = {\left\{ \begin{array}{ll} 0 &{} \text {if } \mathbf s \in \mathcal{S}^*,\\ +\infty &{} \text {otherwise,} \end{array}\right. } \ \ \ \ \ f_{max}(\pi _s \cdot (\mathbf s ,\mathbf t )) = \max \{f_{max}(\pi _s ), d(\mathbf s ,\mathbf t )\}, \end{aligned}$$
(1)

where \(\pi _s \cdot (\mathbf s ,\mathbf t )\) stands for the concatenation of path \(\pi _s\) and arc \((\mathbf s ,\mathbf t )\in \mathcal{A}_1\), and \(d(\mathbf s ,\mathbf t )\) denotes the distance between nodes \(\mathbf s \) and \(\mathbf t \). Also, a path \(\pi _s\) is a sequence of adjacent and distinct nodes in \(\mathcal{G}_1\) with terminus at node \(\mathbf s \in \mathcal{Z}_1\).

In short, by computing Eq. 1 for every sample \(\mathbf s \in \mathcal{Z}_1\), we obtain a collection of optimum-path trees (OPTs) rooted at \(\mathcal{S}^*\), which together form an optimum-path forest. If a sample belongs to a given OPT, it is more strongly connected to that tree than to any other in \(\mathcal{G}_1\).
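The competition process can be sketched with a minimal sequential Python snippet (our own illustrative code, not the LibOPF implementation): prototypes start with cost 0, all other samples with infinity, and samples are conquered in nondecreasing order of optimum-path cost under \(f_{max}\).

```python
# Minimal sequential sketch of the OPF training competition (illustrative only).
import math

def opf_train(points, labels, prototypes):
    """Prototypes start with cost 0, everything else with infinity; samples
    leave the priority queue in nondecreasing order of optimum-path cost."""
    n = len(points)
    cost = [0.0 if i in prototypes else math.inf for i in range(n)]
    out = [labels[i] if i in prototypes else None for i in range(n)]
    done = [False] * n
    for _ in range(n):
        # next sample to leave the queue: minimum-cost unprocessed node
        s = min((i for i in range(n) if not done[i]), key=lambda i: cost[i])
        done[s] = True
        for t in range(n):                     # complete graph: try every arc
            if not done[t]:
                c = max(cost[s], math.dist(points[s], points[t]))  # f_max
                if c < cost[t]:
                    cost[t], out[t] = c, out[s]   # s conquers t
    return cost, out
```

The returned costs and propagated labels are exactly what the test phase needs, as described next.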

2.2 Testing

In the testing step, each sample \(\mathbf t \in \mathcal{Z}_3\) is classified individually as follows: \(\mathbf t \) is connected to all training nodes of the optimum-path forest learned in the training phase, and we evaluate the node \(\mathbf s ^*\in \mathcal{Z}_1\) that conquers \(\mathbf t \), i.e. the one that satisfies the following equation:

$$\begin{aligned} C(\mathbf t ) = \min _{\forall \mathbf s \in \mathcal{Z}_1}\{\max \{C(\mathbf s ), d(\mathbf s ,\mathbf t )\}\}, \end{aligned}$$
(2)

where \(C(\mathbf s )\) stands for the optimum-path cost of \(\mathbf s \) computed during training.

The classification step simply assigns the label of \(\mathbf s ^*\) as the label of \(\mathbf t \). Notice that a similar procedure to classify \(\mathcal{Z}_2\) can be employed, too.
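The test phase can be sketched as follows (an illustrative Python snippet with our own naming, not the LibOPF implementation): the test sample is connected to every training node, and it receives the label of the node minimizing the maximum between the node's training cost and the connecting arc weight.

```python
# Illustrative sketch of OPF classification (Eq. 2), not LibOPF code.
import math

def opf_classify(point, train_points, train_costs, train_labels):
    """Connect the test sample to every training node and take the label of
    the node s* minimizing max(C(s), d(s, t))."""
    best_cost, best_label = math.inf, None
    for s, p in enumerate(train_points):
        c = max(train_costs[s], math.dist(p, point))
        if c < best_cost:
            best_cost, best_label = c, train_labels[s]
    return best_label
```

Note that classification visits all training nodes, which motivates the spatial indexing discussed in the conclusions.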

3 Parallel-Driven Optimum-Path Forest Training

In this section, we present the proposed approach based on parallel programming to speed up the naïve OPF training algorithm, hereinafter called POPF. Since standard OPF makes use of a priority queue implemented as a binary heap, it does not support concurrent accesses. Therefore, POPF uses a simpler data structure along with a slightly different (parallel) training process, which is based on three main observations, as discussed below.

The first observation concerns the optimum-path computation process for each \(\mathbf t \in \mathcal{Z}_1\), which is independent of the other samples. On the other hand, costs need to be updated in a data structure so that a new sample can be selected in the next iteration, in order to expand the optimum paths computed so far. For this purpose, LibOPF [13] uses a binary heap, as suggested in [5]. However, such a data structure is not prepared for concurrent updates, i.e. if one attempts to compute \(f_{max}(\mathbf t )\) for each \(\mathbf t \in \mathcal{Z}_1\) in parallel, a mutex would be required for each update in the heap, and this approach would not scale well as the number of threads increases. Furthermore, this data structure introduces a \(\mathcal{O}(\log (n))\) overhead in each update, where \(n=|\mathcal{Z}_1|\).

The second observation concerns the graph, which is fully connected, implying that, at each iteration, all nodes need to be explored. Therefore, the computation of \(f_{max}\) for all \(s \in \mathcal{Z}_1\) takes \(\theta (n^2)\) operations in total. In order to cope with such quadratic complexity, we can implement the priority queue as a standard array, exploring the set of nodes in parallel and performing a parallel linear search. At each iteration, each thread \(\delta _i, \forall i = 1, \ldots , m\), explores a subset \(\mathcal{W}_{(s,i)}\), such that \(\mathcal{W}_s = \mathcal{W}_{(s,1)} \cup \cdots \cup \mathcal{W}_{(s,m)}\) is the set of neighbors of s, thus performing two tasks: (1) updating the cost of each \(t\in \mathcal{W}_{(s,i)}\) according to \(f_{max}\) using arc \((\mathbf s ,\mathbf t )\), and (2) computing the node \(s^{(*,i)} \in \mathcal{W}_{(s,i)}\) with minimum cost in \(\mathcal{W}_{(s,i)}\). Afterwards, the main thread finds the node \(s^*\) with minimum cost among all \(s^{(*,i)},\forall i=1,\cdots ,m\). Such node \(s^*\) will be the first to leave the priority queue in the next iteration. Therefore, by using m threads, the overall time complexity of the training algorithm is reduced to \(\theta (n^2/m)\).
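The per-thread partition and min-reduction can be sketched as follows. This is an illustrative Python snippet with our own naming; the paper's actual implementation uses C with OpenMP, and Python threads are used here only to show the structure of the parallel step.

```python
# Illustrative sketch of the parallel linear search over the array-based
# priority queue (the real implementation is C/OpenMP, not Python threads).
from concurrent.futures import ThreadPoolExecutor
import math

def parallel_argmin(cost, done, m=4):
    """Each of m workers scans one slice of the cost array; the main thread
    then reduces the m partial winners to the global minimum-cost node."""
    n = len(cost)
    chunks = [range(i, n, m) for i in range(m)]   # static partition of nodes

    def local_min(chunk):
        best, best_c = -1, math.inf
        for t in chunk:
            if not done[t] and cost[t] < best_c:
                best, best_c = t, cost[t]
        return best, best_c

    with ThreadPoolExecutor(max_workers=m) as pool:
        partial = list(pool.map(local_min, chunks))
    return min(partial, key=lambda x: x[1])[0]    # reduction on the main thread
```

The cost updates for each slice (task 1 in the text) follow the same partitioning and can run in the same parallel region.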

Finally, the third observation is related to Prim's algorithm, which is used to compute the Minimum Spanning Tree over \(\mathcal{Z}_1\). As a matter of fact, we can use the very same OPF algorithm with a different path-cost function to compute the MST. Therefore, the aforementioned ideas can also be applied to the MST computation, taking advantage of parallelism in all steps of the training process.
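This interchangeability of path-cost functions can be illustrated with a small sketch (our own Python code, not the LibOPF implementation): plugging \(f_{max}\) into a generic Dijkstra-like competition loop yields OPF training, while keeping only the arc weight and discarding the accumulated cost yields Prim's algorithm.

```python
# Illustrative sketch: the same competition loop computes either the OPF
# (path cost f_max) or Prim's MST (path cost = arc weight alone).
import math

def competition(points, seeds, path_cost):
    """Generic loop; path_cost(cost_s, d) extends a path ending at s by an
    arc of weight d. Returns each node's predecessor in the resulting forest."""
    n = len(points)
    cost = [0.0 if i in seeds else math.inf for i in range(n)]
    pred = [-1] * n
    done = [False] * n
    for _ in range(n):
        s = min((i for i in range(n) if not done[i]), key=lambda i: cost[i])
        done[s] = True
        for t in range(n):
            if not done[t]:
                c = path_cost(cost[s], math.dist(points[s], points[t]))
                if c < cost[t]:
                    cost[t], pred[t] = c, s
    return pred

# f_max gives OPF training; ignoring the accumulated cost gives Prim's MST:
opf_pred  = lambda pts: competition(pts, {0}, lambda cs, d: max(cs, d))
prim_pred = lambda pts: competition(pts, {0}, lambda cs, d: d)
```

Since both variants share the same loop structure, the parallel linear search described above accelerates the MST computation as well.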

Algorithm 1 summarizes the ideas presented in this section. Note that even though parallelization takes place only during the search for the best predecessor, it is better to start all threads once at the beginning of the algorithm. The proposed approach was efficiently implemented using OpenMP [4], a well-known API for shared-memory parallel programming. The OpenMP pragmas used in the implementation are included as comments.

Algorithm 1. POPF training algorithm.

4 Experiments and Results

In this section, we present the methodology used to assess the robustness of the proposed approach, as well as the experimental results. Table 1 presents the description of the datasets used in this work, which were taken from the UCI Machine Learning Repository [8]. We intentionally chose datasets with numeric features, to avoid extra pre-processing, and with different orders of magnitude, to better assess the scalability of our approach.

Table 1. Description of the datasets and percentages used for \(\mathcal{Z}_1\), \(\mathcal{Z}_2\) and \(\mathcal{Z}_3\).

We compared POPF against the naïve OPF using a computer equipped with a 3.1 GHz Intel Core i7 processor and 8 GB of RAM, running Linux 3.16. The programs were compiled with GCC 5.0, which implements the OpenMP 4 specification. Also, we varied the number of threads used by POPF according to the maximum concurrency allowed by the processor. For each experiment, we performed a hold-out partition of the dataset over 10 executions, in order to compute mean accuracy and computational load.

Table 2 presents the results regarding execution time, number of learning iterations and classification accuracy for \(\mathcal{Z}_3\) – as defined by Papa et al. in [12] – where POPF-m stands for POPF executed with m threads. Clearly, POPF maintained OPF accuracy for every number of threads, meaning that the classifier obtained through the proposed approach preserves the properties of the original one. Only a slight variation concerning the MiniBooNE dataset can be observed. This is explained by the fact that same-cost samples may be stored in a different order in \(\mathcal{Z}_1'\), which changes the evaluation order during the classification of \(\mathcal{Z}_3\) and may assign a different class when ties occur.

Table 2. Comparison against OPF and POPF with different number of threads.

In Table 2 we also include parallel performance measures: speedup (S) – measuring gain in running time – and efficiency (E) – measuring thread utilization. They are defined as follows [9]:

$$\begin{aligned} S = \frac{T_{s}}{T_{p}} \ \ \ \ \ \text {and} \ \ \ \ \ E = \frac{S}{m}=\frac{T_{s}}{m \cdot T_{p}}, \end{aligned}$$
(3)

where \(T_s\) and \(T_p\) stand for the execution time of traditional and parallel OPF, respectively, and m denotes the number of threads.
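Equation 3 can be worked through with a small example (the numbers below are illustrative, not taken from Table 2):

```python
# Worked example of Eq. 3; the figures are illustrative only.
def speedup(T_s, T_p):
    """Gain in running time of the parallel version over the sequential one."""
    return T_s / T_p

def efficiency(T_s, T_p, m):
    """Thread utilization: speedup divided by the number of threads."""
    return speedup(T_s, T_p) / m

# e.g. a sequential run of 100 s reduced to 20 s with 8 threads gives
# S = 5.0 and E = 0.625, i.e. 62.5% thread utilization.
```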

We can observe that the maximum speedup is obtained using 8 threads, being about five times faster than traditional OPF. Furthermore, the speedup improves as the size of the dataset increases. Another noteworthy observation is that, for the largest dataset, the efficiency obtained with 2 threads was greater than 100%. This confirms that POPF is considerably more efficient than traditional OPF, not only because of the parallel implementation, but also due to its asymptotic improvement. Figure 1 presents charts for S and E.

Regarding the overall parallel efficiency, it is important to stress the efficiency results obtained for both 4 and 8 threads. On one hand, we obtained an efficiency between 77% and 87% considering 4 threads, which is an outstanding result for any parallel implementation. On the other hand, the efficiency considering 8 threads was between 57% and 66%, which is a good thread utilization considering the fact that the processor used has only 4 physical cores (implementing 8 threads with HyperThreading\(^{\textregistered }\) technology).

Fig. 1. Parallel performance measures for POPF learning process: (a) speedup (S); and (b) efficiency (E).

5 Conclusions and Future Work

In this work, we parallelized the OPF training algorithm and demonstrated its efficiency in classification tasks. The new approach is based on three important observations: (i) the optimum-path computation processes for the training samples are independent of each other; (ii) the fully connected training graph allows us to replace the binary heap with a simple array (suitable for parallelization); and (iii) the computation of the MST during the training phase can also be performed in parallel. These changes reduce the asymptotic complexity of the implementation and also make the parallelization feasible.

We have observed that POPF preserves the accuracy of the original algorithm, while performing the learning phase at least five times faster on commodity hardware. Thus, an OPF with hundreds of thousands of nodes can be computed in less than an hour. As such, POPF makes it possible to classify very large datasets under timing restrictions, and it brings closer the possibility of nearly real-time classification of reasonably sized datasets even on a single computer or mobile device.

However, such a real-time implementation still requires improvements in the classification algorithm. Thus, we are considering the use of spatial data structures to index the optimum-path forest obtained during training, so that fewer nodes are considered during classification, thus improving its running time.