Abstract:
Ridge regression is a regularization technique that can be used together with other regression algorithms to model highly correlated data. Like many other traditional techniques, the big-data version of ridge regression requires a large number of iterations over the dataset to converge. Because the dataset cannot be stored in memory all at once, it is split into subsets that fit in RAM for training. This strategy is time-consuming, however, because every subset must be read from the hard drive into memory over and over. To overcome the memory barrier, we proposed using working sufficient statistics to solve the problem [1]. The parameters of the working sufficient matrix are small enough to remain in RAM at all times, and they can be updated one row at a time, which allows online computation. This strategy requires only one pass over the dataset. While our previous work proved its theoretical correctness, it was not clear how the algorithm would work in practice. In this study, we aim to validate the algorithm proposed in our earlier work and to evaluate its performance improvement. Three sets of experiments were conducted using large datasets published by FAA and BTS to examine the computation time, the memory requirement, and the accuracy of the output. Results showed that our exact ridge regression algorithm offered several benefits: faster computation, minimal memory requirements, and more accurate estimates.
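The single-pass idea can be illustrated with a minimal sketch. It assumes the textbook sufficient statistics for ridge regression, the accumulated matrices X^T X and X^T y, which may differ from the working sufficient statistics defined in [1]. The class name `OnlineRidge`, the penalty `lam`, and the synthetic data are hypothetical and used only for illustration; no intercept is fitted.

```python
import numpy as np


class OnlineRidge:
    """Hypothetical one-pass ridge solver built on X^T X and X^T y."""

    def __init__(self, n_features, lam=1.0):
        self.lam = lam
        # Both accumulators depend only on the number of features,
        # not on the number of rows, so they stay small in memory.
        self.xtx = np.zeros((n_features, n_features))
        self.xty = np.zeros(n_features)

    def update(self, x, y):
        """Fold a single row (x, y) into the accumulated statistics."""
        x = np.asarray(x, dtype=float)
        self.xtx += np.outer(x, x)
        self.xty += y * x

    def solve(self):
        """Solve (X^T X + lam * I) beta = X^T y for beta."""
        p = self.xtx.shape[0]
        return np.linalg.solve(self.xtx + self.lam * np.eye(p), self.xty)


# Synthetic data standing in for a dataset streamed from disk.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 0.01 * rng.normal(size=500)

model = OnlineRidge(n_features=3, lam=0.1)
for row, target in zip(X, y):  # one pass, one row at a time
    model.update(row, target)
beta_hat = model.solve()
```

Under these assumptions the streamed estimate matches the batch closed-form ridge solution up to floating-point rounding, since both solve the same linear system; only the order in which rows are summed differs.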
Date of Conference: 10-13 December 2018
Date Added to IEEE Xplore: 24 January 2019