DLRankSVM: an efficient distributed algorithm for linear RankSVM

Jin, Jing; Lai, Guoming; Lin, Xiaola; Cai, Xianggao

doi:10.1007/s11227-016-1907-4

DLRankSVM: an efficient distributed algorithm for linear RankSVM

Published: 31 October 2016

Volume 73, pages 2157–2186, (2017)
Cite this article

The Journal of Supercomputing Aims and scope Submit manuscript

Jing Jin¹,
Guoming Lai²,
Xiaola Lin¹ &
…
Xianggao Cai³

We’re sorry, something doesn't seem to be working properly.

Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.

Abstract

Linear RankSVM is one of the widely used methods for learning to rank. The existing methods, such as Trust-Region Newton (TRON) method along with Order-Statistic Tree (OST), can be applied to train the linear RankSVM effectively. However, extremely lengthy training time is unacceptable when using any existing method to handle the large-scale linear RankSVM. To solve this problem, we thus focus on designing an efficient distributed method (named DLRankSVM) to train the huge-scale linear RankSVM on distributed systems. First, to efficiently reduce the communication overheads, we divide the training problem into subproblems in terms of different queries. Second, we propose an efficient heuristic algorithm to address the load balancing issue (which is a NP-complete problem). Third, using OST, we propose an efficient parallel algorithm (named PAV) to compute auxiliary variables at each computational node of the distributed system. Finally, based on PAV and the proposed heuristic algorithm, we develop DLRankSVM under the framework of TRON. The extensive empirical evaluations show that DLRankSVM not only can obtain impressive speedups on both multi-core and distributed systems, but also can perform well in prediction compared with the other state-of-the-art methods. To the best of our knowledge, this is the first research work proposed to train the huge-scale linear RankSVM in a distributed fashion.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

GPU-accelerated parallel algorithms for linear rankSVM

Article 27 August 2015

Parallel Training GBRT Based on KMeans Histogram Approximation for Big Data

Robust distributed estimation and variable selection for massive datasets via rank regression

Article 20 June 2021

Notes

References

Airola A, Pahikkala T, Salakoski T (2011) Training linear ranking svms in linearithmic time using red-black trees. Pattern Recognit Lett 32(9):1328–1336
Article Google Scholar
Arabnia HR (1990) A parallel algorithm for the arbitrary rotation of digitized images using process-and-data-decomposition approach. J Parallel Distrib Comput 10(2):188–192
Article Google Scholar
Arabnia HR (1995) A distributed stereocorrelation algorithm. In: Fourth International Conference on Computer Communications and Networks. Proceedings. IEEE, pp 479–482
Arabnia HR, Bhandarkar SM (1996) Parallel stereocorrelation on a reconfigurable multi-ring network. J Supercomput 10(3):243–269
Article MATH Google Scholar
Arabnia HR, Fang WC, Lee C, Zhang Y (2010) Context-aware middleware and intelligent agents for smart environments. IEEE Intell Syst 25(2):10–11
Article Google Scholar
Arabnia HR, Oliver MA (1986) Fast operations on raster images with simd machine architectures. In: Computer Graphics Forum, vol 5. Wiley Online Library, pp 179–188
Arabnia HR, Oliver MA (1987) Arbitrary rotation of raster images with simd machine architectures. In: Computer Graphics Forum, vol 6. Wiley Online Library, pp 3–11
Bhandarkar SM, Arabnia HR (1995) The hough transform on a reconfigurable multi-ring network. J Parallel Distribut Comput 24(1):107–114
Article Google Scholar
Bhandarkar SM, Arabnia HR (1995) The refine multiprocessortheoretical properties and algorithms. Parallel Comput 21(11):1783–1805
Article Google Scholar
Bhandarkar SM, Arabnia HR, Smith JW (1995) A reconfigurable architecture for image processing and computer vision. Int J Pattern Recognit Artific Intell 9(02):201–229
Article Google Scholar
Bottou L, Lin CJ (2007) Support vector machine solvers. Large scale kernel machines, pp 301–320
Burges C, Shaked T, Renshaw E, Lazier A, Deeds M, Hamilton N, Hullender G (2005) Learning to rank using gradient descent. In: Proceedings of the 22nd International Conference on Machine learning. ACM, pp 89–96
Burges CJ, Svore KM, Bennett PN, Pastusiak A, Wu Q (2011) Learning to rank using an ensemble of lambda-gradient models. In: Yahoo! Learning to Rank Challenge, pp 25–35 (2011)
Cao Z, Qin T, Liu TY, Tsai MF, Li H (2007) Learning to rank: from pairwise approach to listwise approach. In: Proceedings of the 24th International Conference on Machine Learning. ACM, pp 129–136
Chapelle O (2007) Training a support vector machine in the primal. Neural Comput 19(5):1155–1178
Article MathSciNet MATH Google Scholar
Chapelle O, Chang Y (2011) Yahoo! learning to rank challenge overview. In: Yahoo! Learning to Rank Challenge, pp 1–24
Chapelle O, Keerthi SS (2010) Efficient algorithms for ranking with svms. Inf Retrieval 13(3):201–215
Article Google Scholar
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
MATH Google Scholar
Cossock D, Zhang T (2008) Statistical analysis of bayes optimal subset ranking. IEEE Trans Inf Theory 54(11):5140–5154
Article MathSciNet MATH Google Scholar
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Article Google Scholar
Dembo RS, Steihaug T (1983) Truncated-newtono algorithms for large-scale unconstrained optimization. Math Program 26(2):190–212
Article MATH Google Scholar
Fürnkranz J, Hüllermeier E (2003) Pairwise preference learning and ranking. In: Machine Learning: ECML. Springer, pp 145–156
Herbrich R, Graepel T, Obermayer K (1999) Large margin rank boundaries for ordinal regression. Advances in neural information processing systems, pp 115–132
Ho CH, Lin CJ (2012) Large-scale linear support vector regression. J Mach Learn Res 13(1):3323–3348
MathSciNet MATH Google Scholar
Jafri R, Ali SA, Arabnia HR (2013) Computer vision-based object recognition for the visually impaired using visual tags. In: Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), p 1
Jafri R, Ali SA, Arabnia HR, Fatima S (2014) Computer vision-based object recognition for the visually impaired in an indoors environment: a survey. Vis Comput 30(11):1197–1222
Article Google Scholar
Jafri R, Arabnia HR (2008) Fusion of face and gait for automatic human recognition. In: Fifth International Conference on Information Technology: New Generations, 2008. ITNG 2008. IEEE, pp 167–173
Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of ir techniques. ACM Trans Inf Syst (TOIS) 20(4):422–446
Article Google Scholar
Jin J, Cai X, Lai G, Lin X (2015) Gpu-accelerated parallel algorithms for linear ranksvm. J Supercomput 71(11):4141–4171
Article Google Scholar
Jin J, Lin X (2014) Efficient parallel algorithms for linear ranksvm on gpu. In: Network and Parallel Computing. Springer, pp 181–194
Joachims T (2002) Optimizing search engines using clickthrough data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 133–142
Joachims T (2006) Training linear svms in linear time. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 217–226
Kuo TM, Lee CP, Lin CJ (2014) Large-scale Kernel Rank SVM. In: Proceedings of the 2014 SIAM International Conference on Data Mining, pp 812–820
Lee CP, Lin CJ (2014) Large-scale linear ranksvm. Neural Comput 26(4):781–817
Article MathSciNet Google Scholar
Li P, Wu Q, Burges CJ (2007) Mcrank: Learning to rank using multiple classification and gradient boosting. In: Advances in Neural Information Processing Systems, pp 897–904
Lin CJ, Moré JJ (1999) Newton’s method for large bound-constrained optimization problems. SIAM J Optim 9(4):1100–1127
Article MathSciNet MATH Google Scholar
Lin CJ, Weng RC, Keerthi SS (2008) Trust region newton method for logistic regression. J Mach Learn Res 9:627–650
MathSciNet MATH Google Scholar
Lin CY, Tsai CH, Lee CP, Lin CJ (2014) Large-scale logistic regression and linear support vector machines using spark. In: 2014 IEEE International Conference on Big Data (Big Data). IEEE, pp 519–528
Liu B, Chen J, Wang X (2015) Application of learning to rank to protein remote homology detection. Bioinformatics, p btv413
Luper D, Cameron D, Miller J, Arabnia HR (2007) Spatial and temporal target association through semantic analysis and gps data mining. In: IKE, vol 7. Citeseer, pp 25–28
Paisitkriangkrai S, Shen C, van den Hengel A (2015) Learning to rank in person re-identification with metric ensembles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1846–1855
Rabenseifner R, Hager G, Jost G (2009) Hybrid mpi/openmp parallel programming on clusters of multi-core smp nodes. In: 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing. IEEE, pp 427–436
Rahbarinia B, Pedram MM, Arabnia HR, Alavi Z (2010) A multi-objective scheme to hide sequential patterns. In: The 2nd International Conference on Computer and Automation Engineering (ICCAE), vol 1. IEEE, pp 153–158
Severyn A, Moschitti A (2015) Learning to rank short text pairs with convolutional deep neural networks. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, pp 373–382
Snir M (1998) MPI—the Complete Reference: the MPI core, vol 1. MIT Press
Taylor M, Guiver J, Robertson S, Minka T (2008) Softrank: optimizing non-smooth rank metrics. In: Proceedings of the 2008 International Conference on Web Search and Data Mining. ACM, pp 77–86
Teo CH, Vishwanthan S, Smola AJ, Le QV (2010) Bundle methods for regularized risk minimization. J Mach Learn Res 11:311–365
MathSciNet MATH Google Scholar
Ter Mors A, Valk J, Witteveen C, Arabnia H, Mun Y (2004) Coordinating autonomous planners. In: IC-AI, p 795
Trotman A (2005) Learning to rank. Inf Retrieval 8(3):359–381
Article Google Scholar
Wani MA, Arabnia HR (2003) Parallel edge-region-based segmentation algorithm targeted at reconfigurable multiring network. J Supercomput 25(1):43–62
Article MATH Google Scholar
Xia F, Liu TY, Wang J, Zhang W, Li H (2008) Listwise approach to learning to rank: theory and algorithm. In: Proceedings of the 25th International Conference on Machine Learning. ACM, pp 1192–1199
Xu J, Li H (2007) Adarank: a boosting algorithm for information retrieval. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, pp 391–398
Yu H, Kim Y, Hwang S (2009) Rv-svm: An efficient method for learning ranking svm. In: Advances in Knowledge Discovery and Data Mining. Springer, pp 426–438
Yu J, Tao D, Wang M, Rui Y (2015) Learning to rank using user clicks and visual features for image retrieval. IEEE Trans Cybern 45(4):767–779
Article Google Scholar
Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, p 10
Zheng Z, Zha H, Zhang T, Chapelle O, Chen K, Sun G (2008) A general boosting method and its application to learning ranking functions for web search. In: Advances in Neural Information Processing Systems, pp 1697–1704
Zhuang Y, Chin WS, Juan YC, Lin CJ (2014) Distributed newton method for regularized logistic regression. Department of Computer Science and Information Engineering, National Taiwan University, Tech. Rep

Download references

Acknowledgments

This research is supported in part by the National Natural Science Foundation of China under Grants No. 61309028 and No. 61472454, the National Social Science Foundation of China under Grants No. 12&ZD222, and the Project of Department of Education of Guangdong Province under Grants No. 2013KJCX0128. The authors thank the anonymous reviewers for their constructive comments and suggestions.

Author information

Authors and Affiliations

School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510275, China
Jing Jin & Xiaola Lin
Computer Engineering Technical College, Guangdong Institute of Science and Technology, Zhuhai, 519090, China
Guoming Lai
School of Information Management, Sun Yat-sen University, Guangzhou, 510275, China
Xianggao Cai

Authors

Jing Jin
View author publications
You can also search for this author inPubMed Google Scholar
Guoming Lai
View author publications
You can also search for this author inPubMed Google Scholar
Xiaola Lin
View author publications
You can also search for this author inPubMed Google Scholar
Xianggao Cai
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Xianggao Cai.

Appendices

Appendix A: Proof of Theorem 1

Proof

As described in Algorithm 1, any $l^{(s)}$ associated with the first $z-1$ computational nodes should satisfy $|dr(l^{(s)})|\le { error}$. This means that for the first $z-1$ computational nodes, all corresponding $l^{(s)}$ should meet the condition of $l^{(s)}\le (1+{ error})\frac{l}{z}$. Furthermore, the number of tuples $(y_i,q_i,\mathbf {x}_i)$ allocated to the zth computational node should be greater than zero. Therefore, we can get the following inequality:

$$\begin{aligned} l-\sum \limits _{s=1}^{z-1}l^{(s)}\ge l-(z-1)(1+{ error})\frac{l}{z}>0. \end{aligned}$$

(37)

According to above inequality, we have:

$$\begin{aligned} 0\le { error}<\frac{1}{z-1} \end{aligned}$$

(38)

which proves Theorem 1. $\square $

Appendix B: Proof of Theorem 2

Proof

Without lose of generality, we assume that $X_s=[\ldots , X^{(Q_t)},\ldots ]^T$ represents the set of the training instances allocated to the sth computational node. Then, according to (19) and (27), the computational formulas with respect to $A^{(s)T}_wA^{(s)}_wX_s\mathbf {v} , A^{(s)T}_wA^{(s)}_wX_s\mathbf {w} , A^{(s)T}_w\mathbf {e}^{(s)}_w$, and $p^{(s)}_w$ can be, respectively, denoted by:

$$\begin{aligned} A^{(s)T}_wA^{(s)}_wX_s\mathbf {v}= & {} \begin{bmatrix} \vdots \\ \left( \alpha _{ti}^+(\mathbf {w})+\alpha _{ti}^-(\mathbf {w})\right) \mathbf {x}^T_{ti}\mathbf {v}-\left( \beta _{ti}^+(\mathbf {w},\mathbf {v})+\beta _{ti}^-(\mathbf {w},\mathbf {v})\right) \\ \vdots \\ \end{bmatrix} \end{aligned}$$

(39)

$$\begin{aligned} A^{(s)T}_wA^{(s)}_wX_s\mathbf {w}= & {} \begin{bmatrix} \vdots \\ \left( \alpha _{ti}^+(\mathbf {w})+\alpha _{ti}^-(\mathbf {w})\right) \mathbf {w}^T\mathbf {x}_{ti}-\left( \gamma _{ti}^+(\mathbf {w})+\gamma _{ti}^-(\mathbf {w})\right) \\ \vdots \\ \end{bmatrix} \end{aligned}$$

(40)

$$\begin{aligned} A^{(s)T}_w\mathbf {e}^{(s)}_w= & {} \begin{bmatrix} \vdots \\ \alpha _{ti}^+(\mathbf {w})-\alpha _{ti}^-(\mathbf {w})\\ \vdots \\ \end{bmatrix} \end{aligned}$$

(41)

$$\begin{aligned} p^{(s)}_w= & {} \sum \limits _{t}{p^t_w}=\sum \limits _{t}\sum \limits _{i=1}^{l_t}\alpha _{ti}^+(\mathbf {w})=\sum \limits _{t}\sum \limits _{i=1}^{l_t}\alpha _{ti}^-(\mathbf {w}). \end{aligned}$$

(42)

Notice that, on the one hand, each computational nodes possesses the same global $\mathbf {w}$ and $\mathbf {v}$ at each same outer or CG iteration of TRON. On the other hand, the discussion in Sect. 3.3 shows that if $\mathbf {x}_i$ is $\mathbf {x}_{tj}$ (or $y_i$ is $y_{tj}$), then $\alpha ^+_i(\mathbf {w}) , \alpha ^-_i(\mathbf {w}) , \beta ^+_i(\mathbf {w},\mathbf {v}) , \beta ^-_i(\mathbf {w},\mathbf {v}) , \gamma ^+_i(\mathbf {w})$, and $\gamma ^-_i(\mathbf {w})$ must be equal to $\alpha ^+_{tj}(\mathbf {w})$, $\alpha ^-_{tj}(\mathbf {w}), \beta ^+_{tj}(\mathbf {w},\mathbf {v}),\beta ^-_{tj}(\mathbf {w},\mathbf {v}), \gamma ^+_{tj}(\mathbf {w})$, and $\gamma ^-_{tj}(\mathbf {w})$, respectively. As a result, using (40)–(42), $\sum _{s=1}^{z}f^{(s)}(\mathbf {w})$ can be indicated as follows:

$$\begin{aligned} \sum \limits _{s=1}^{z}f^{(s)}(\mathbf {w})= & {} \frac{z}{2}\mathbf {w}^T\mathbf {w}+\sum \limits _{s=1}^{z}C\left( \mathbf {w}X_s\left( A^{(s)T}_wA^{(s)}_wX_s\mathbf {w}-2A^{(s)T}_w\mathbf {e}^{(s)}_w\right) +p^{(s)}_w\right) \nonumber \\= & {} \frac{z}{2}\mathbf {w}^T\mathbf {w}+C\sum \limits _{t=1}^{m}\left( \sum \limits _{i=1}^{l_t}\mathbf {w}^T\mathbf {x}_{ti}\left( \left( \alpha _{ti}^+(\mathbf {w})+\alpha _{ti}^-(\mathbf {w})\right) \mathbf {w}^T\mathbf {x}_{ti}\right) \right) \nonumber \\&-C\sum \limits _{t=1}^{m}\left( \sum \limits _{i=1}^{l_t}\mathbf {w}^T\mathbf {x}_{ti}\left( \left( \gamma _{ti}^+(\mathbf {w})+\gamma _{ti}^-(\mathbf {w})\right) \right) \right) \nonumber \\&-2C\sum \limits _{t=1}^{m}\left( \sum \limits _{i=1}^{l_t}\mathbf {w}^T\mathbf {x}_{ti}\left( \alpha _{ti}^+(\mathbf {w})-\alpha _{ti}^-(\mathbf {w}\right) \right) +C\sum \limits _{t=1}^{m}\left( p^t_w\right) \nonumber \\= & {} \frac{z}{2}\mathbf {w}^T\mathbf {w}+C\sum \limits _{i=1}^{l}\mathbf {w}^T\mathbf {x}_{i}\left( \left( \alpha _{i}^+(\mathbf {w})+\alpha _{i}^-(\mathbf {w})\right) \mathbf {w}^T\mathbf {x}_{i}\right) \nonumber \\&-C\sum \limits _{i=1}^{l}\mathbf {w}^T\mathbf {x}_{i}\left( \gamma _{i}^+(\mathbf {w})+\gamma _{i}^-(\mathbf {w})\right) \nonumber \\&-2C\sum \limits _{i=1}^{l}\mathbf {w}^T\mathbf {x}_{i}\left( \alpha _{i}^+(\mathbf {w})-\alpha _{i}^-(\mathbf {w})\right) +Cp_w\nonumber \\= & {} \frac{z-1}{2}\mathbf {w}^T\mathbf {w}+f(\mathbf {w}). \end{aligned}$$

(43)

Consequently,

$$\begin{aligned} f(\mathbf {w})=\sum \limits _{s=1}^{z}f^{(s)}(\mathbf {w})-\frac{z-1}{2}\mathbf {w}^T\mathbf {w}. \end{aligned}$$

To facilitate the proof of the other two equations, we assume that $\mathbf {x}_i$ and $\mathbf {x}_{ti}$ are, respectively, indicated as below:

$$\begin{aligned} \mathbf {x}_{ti}= & {} \left[ x^1_{ti},\ldots ,x^n_{ti}\right] \\ \mathbf {x}_i= & {} \left[ x^1_i,\ldots ,x^n_i\right] . \end{aligned}$$

Then, by plugging (40) and (41) into $\sum _{s=1}^{z}\nabla f^{(s)}(\mathbf {w})$, we have

$$\begin{aligned}&\sum \limits _{s=1}^{z}\nabla f^{(s)}(\mathbf {w})=z\mathbf {w}+2C\sum \limits _{s=1}^{z}X^T_s\left( A^{(s)T}_wA^{(s)}_wX_s\mathbf {w}-A^{(s)T}_w\mathbf {e}^{(s)}_w\right) \nonumber \\&\quad =z\mathbf {w}+2C\nonumber \\&\quad \begin{bmatrix} \sum \limits _{t=1}^{m}\sum \limits _{i=1}^{l_t}x^1_{ti}\left( \left( \alpha _{ti}^+(\mathbf {w})+\alpha _{ti}^-(\mathbf {w})\right) \mathbf {w}^T\mathbf {x}_{ti}-\left( \gamma _{ti}^+(\mathbf {w})+\gamma _{ti}^-(\mathbf {w})-\alpha _{ti}^+(\mathbf {w})+\alpha _{ti}^-(\mathbf {w})\right) \right) \\ \vdots \\ \sum \limits _{t=1}^{m}\sum \limits _{i=1}^{l_t}x^n_{ti}\left( \left( \alpha _{ti}^+(\mathbf {w})+\alpha _{ti}^-(\mathbf {w})\right) \mathbf {w}^T\mathbf {x}_{ti}-\left( \gamma _{ti}^+(\mathbf {w})+\gamma _{ti}^-(\mathbf {w})-\alpha _{ti}^+(\mathbf {w})+\alpha _{ti}^-(\mathbf {w})\right) \right) \\ \end{bmatrix}\nonumber \\&\quad =z\mathbf {w}+2C\nonumber \\&\quad \begin{bmatrix} \sum \limits _{i=1}^{l}x^1_i\left( \left( \alpha _{i}^+(\mathbf {w})+\alpha _{i}^-(\mathbf {w})\right) \mathbf {w}^T\mathbf {x}_{i}-\left( \gamma _{i}^+(\mathbf {w})+\gamma _{i}^-(\mathbf {w})-\alpha _{i}^+(\mathbf {w})+\alpha _{i}^-(\mathbf {w})\right) \right) \\ \vdots \\ \sum \limits _{i=1}^{t}x^n_i\left( \left( \alpha _{i}^+(\mathbf {w})+\alpha _{i}^-(\mathbf {w})\right) \mathbf {w}^T\mathbf {x}_{i}-\left( \gamma _{i}^+(\mathbf {w})+\gamma _{i}^-(\mathbf {w})-\alpha _{i}^+(\mathbf {w})+\alpha _{i}^-(\mathbf {w})\right) \right) \\ \end{bmatrix}\nonumber \\&\quad =z\mathbf {w}+2CX^T\left( A_w^TA_wX\mathbf {w}-A_w^T\mathbf {e}_w\right) \nonumber \\&\quad =(z-1)\mathbf {w}+\nabla f(\mathbf {w}). \end{aligned}$$

(44)

Therefore, it proves that:

$$\begin{aligned} \nabla f(\mathbf {w})=\sum \limits _{s=1}^{z}\nabla f^{(s)}(\mathbf {w})-(z-1)\mathbf {w}. \end{aligned}$$

Similarly, by plugging (39) in to $\sum _{s=1}^{z}\nabla ^2 f^{(s)}(\mathbf {w})\mathbf {v}$, we can get that

$$\begin{aligned} \sum \limits _{s=1}^{z}\nabla ^2 f^{(s)}(\mathbf {w})\mathbf {v}= & {} z\mathbf {v}+2C\sum \limits _{s=1}^{p}X^T_s\left( A^{(s)T}_wA^{(s)}_wX_s\mathbf {v}\right) \nonumber \\= & {} z\mathbf {v}+2C\nonumber \\&\begin{bmatrix} \sum \limits _{t=1}^{m}\sum \limits _{i=1}^{l_t}x^1_{ti}\left( \alpha _{ti}^+(\mathbf {w})+\alpha _{ti}^-(\mathbf {w})\right) \mathbf {x}^T_{ti}\mathbf {v}-\left( \beta _{ti}^+(\mathbf {w},\mathbf {v})+\beta _{ti}^-(\mathbf {w},\mathbf {v})\right) \\ \vdots \\ \sum \limits _{t=1}^{m}\sum \limits _{i=1}^{l_t}x^n_{ti}\left( \alpha _{ti}^+(\mathbf {w})+\alpha _{ti}^-(\mathbf {w})\right) \mathbf {x}^T_{ti}\mathbf {v}-\left( \beta _{ti}^+(\mathbf {w},\mathbf {v})+\beta _{ti}^-(\mathbf {w},\mathbf {v})\right) \\ \end{bmatrix}\nonumber \\= & {} z\mathbf {v}+2C\nonumber \\&\begin{bmatrix} \sum \limits _{i=1}^{l}x^1_i\left( \alpha _{i}^+(\mathbf {w})+\alpha _{i}^-(\mathbf {w})\right) \mathbf {x}^T_{i}\mathbf {v}-\left( \beta _{i}^+(\mathbf {w},\mathbf {v})+\beta _{i}^-(\mathbf {w},\mathbf {v})\right) \\ \vdots \\ \sum \limits _{i=1}^{l}x^n_i\left( \alpha _{i}^+(\mathbf {w})+\alpha _{i}^-(\mathbf {w})\right) \mathbf {x}^T_{i}\mathbf {v}-\left( \beta _{i}^+(\mathbf {w},\mathbf {v})+\beta _{i}^-(\mathbf {w},\mathbf {v})\right) \\ \end{bmatrix}\nonumber \\= & {} z\mathbf {v}+2CX^T\left( A_w^TA_wX\mathbf {v}\right) \nonumber \\= & {} (z-1)\mathbf {v}+\nabla ^2 f(\mathbf {w})\mathbf {v} \end{aligned}$$

(45)

which proves that

$$\begin{aligned} \nabla ^2 f(\mathbf {w})\mathbf {v}=\sum \limits _{s=1}^{z}\nabla ^2 f^{(s)}(\mathbf {w})\mathbf {v}-(z-1)\mathbf {v}. \end{aligned}$$

Consequently, Theorem 2 is proved. $\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jin, J., Lai, G., Lin, X. et al. DLRankSVM: an efficient distributed algorithm for linear RankSVM. J Supercomput 73, 2157–2186 (2017). https://doi.org/10.1007/s11227-016-1907-4

Download citation

Published: 31 October 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s11227-016-1907-4

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

DLRankSVM: an efficient distributed algorithm for linear RankSVM

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

GPU-accelerated parallel algorithms for linear rankSVM

Parallel Training GBRT Based on KMeans Histogram Approximation for Big Data

Robust distributed estimation and variable selection for massive datasets via rank regression

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Appendices

Appendix A: Proof of Theorem 1

Proof

Appendix B: Proof of Theorem 2

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now