Elsevier

Applied Soft Computing

Volume 74, January 2019, Pages 747-759
Hierarchical extreme learning machine based image denoising network for visual Internet of Things

https://doi.org/10.1016/j.asoc.2018.08.046

Highlights

  • We address the problem of heavy-noise removal in the visual Internet of Things using a hierarchical extreme learning machine. The proposed framework comprises a sparse auto-encoder, a supervised regression, and a non-local aggregation step.

  • We provide an effective patch-to-patch image denoising network that is robust to various noise levels under both clipped and unclipped noise models. The key advantage of this denoising network is fast training.

  • Experimental studies on images, including both hand-written digits and natural scenes, show that our method achieves excellent performance in both quality and efficiency. The resulting denoising quality can improve the compression ratio for data interactions in the visual Internet of Things.

Abstract

In the visual Internet of Things (VIoT), imaging sensors must strike a balance between limited bandwidth and useful information when images contain heavy noise. In this paper, we address the problem of removing heavy noise and propose a novel hierarchical extreme learning machine-based image denoising network, which comprises a sparse auto-encoder and a supervised regression. Because a hierarchical extreme learning machine trains quickly, an effective image denoising system that is robust to various noise levels can be trained more efficiently than denoising methods based on deep neural networks. Our proposed framework also contains a non-local aggregation procedure that fine-tunes noise reduction according to structural similarity. Denoised images can be compressed at a dramatically higher ratio than noisy images. Therefore, the method can achieve a low communication cost for data interactions in the VIoT. Experimental studies on images, including both hand-written digits and natural scenes, have demonstrated that the proposed technique achieves excellent performance in suppressing heavy noise. Furthermore, it greatly reduces training time and outperforms other state-of-the-art approaches in terms of the peak signal-to-noise ratio (PSNR) or the structural similarity index (SSIM).

Introduction

In today’s digital world, smart cameras are ubiquitous in areas such as security surveillance, automotive systems, and industry. Cameras are now widely connected to the Internet and managed through the infrastructure of the visual Internet of Things (VIoT). In recent years, the quality of imaging sensors has improved significantly, and high-resolution cameras have become mainstream. However, the constant pursuit of more pixels on a small chip results in a low signal-to-noise ratio (SNR), which is related to the number of photons incident on the chip per unit area. Some applications, such as low-light surveillance and dehazing enhancement [1], [2], [3], require a higher gain in signal amplification, and the noise level increases in proportion to the amplification factor. On the one hand, extreme conditions such as low illumination or heavy noise substantially degrade image quality, making object perception and recognition difficult for both human observers and computer vision. Thus, image noise limits applications over the VIoT [4]. On the other hand, transmission bandwidth is a critical constraint for IoT applications [5], and a noisy image creates a compression-ratio bottleneck for network communication between sensor nodes and servers. Noise seriously degrades the performance of popular image compression methods because it introduces a large amount of contaminated, useless information into images [6]. Therefore, image denoising is an important factor in the quality of many imaging sensor nodes and the performance of VIoT systems.
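The effect of noise on compression can be illustrated with a small experiment. The sketch below is only an illustration under stated assumptions: it uses a synthetic gradient image and lossless zlib compression as a stand-in for the image codecs used in practice, and the noise level σ = 50 on a 0-255 scale is chosen for illustration only.

```python
import zlib
import numpy as np

def lossless_ratio(img_u8: np.ndarray) -> float:
    """Raw byte size divided by zlib-compressed size (higher means more compressible)."""
    raw = img_u8.tobytes()
    return len(raw) / len(zlib.compress(raw, 9))

rng = np.random.default_rng(0)

# Hypothetical smooth test image: a 128x128 horizontal gradient on a 0-255 scale.
clean = np.tile(np.linspace(0.0, 255.0, 128), (128, 1))

# Additive white Gaussian noise, saturated to the valid 8-bit range (assumed model).
sigma = 50.0
noisy = np.clip(clean + rng.normal(0.0, sigma, clean.shape), 0.0, 255.0)

ratio_clean = lossless_ratio(clean.astype(np.uint8))
ratio_noisy = lossless_ratio(noisy.astype(np.uint8))
print(f"clean: {ratio_clean:.1f}x, noisy: {ratio_noisy:.1f}x")
```

The noisy image compresses far less than the clean one because the noise adds pixel-level variation that the coder cannot predict.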

Diverse denoising methods have been proposed over the past decade to reduce noise in low-quality images. Traditional denoising methods include non-local means (NLM) [7], block matching with 3D filtering (BM3D) [8], [9], [10], [11], global denoising [12], and low-rank models including sparse representation denoising (K-SVD) [13], [14] and robust principal component analysis (R-PCA) [15]. For example, BM3D, a milestone among the traditional denoising methods, is based on the assumptions that the noise is additive white Gaussian (AWG) [16] and that noisy natural images contain similar patches. Although methods of this kind are well engineered, these assumptions weaken their generality. More recently, deep neural networks have achieved strong performance in various computer vision and image processing tasks, including image denoising, with algorithms such as plain multi-layer perceptron denoising (MLPD) [17], deep class aware denoising (DCAD) [18], deep convolutional neural network-based denoising [19], and deep Gaussian conditional random field network (DGCRF) denoising [20]. Most parametric denoising frameworks can be abstracted as multi-layer networks with certain connections. By learning a mapping from contaminated images to noise-free images on large-scale training datasets, deep neural networks have become the current state-of-the-art image denoising paradigm. However, training deep neural networks requires a large amount of data and is therefore often time-consuming. The extreme learning machine (ELM) [21], by contrast, has the intrinsic property that it can be trained quickly. Many improvements to the typical ELM have been proposed to suit specific applications, such as the hierarchical ELM (HELM) [22] and the online sequential ELM (OS-ELM) [23].
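To indicate why an ELM trains quickly, the following minimal sketch shows the commonly described single-hidden-layer form: the hidden weights are drawn at random and left fixed, and only the output weights are obtained from a closed-form regularized least-squares solve. The function names, the tanh activation, the regularization value, and the toy regression target are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def elm_fit(X, T, n_hidden, reg=1e-3, seed=0):
    """Fit a basic ELM: random fixed hidden layer, ridge-regression output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)                 # random biases (never trained)
    H = np.tanh(X @ W + b)                        # hidden-layer outputs
    # Closed-form solution: beta = (H^T H + reg * I)^-1 H^T T
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression problem (hypothetical data, for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
T = np.sin(3.0 * X[:, :1]) + X[:, 1:] ** 2

W, b, beta = elm_fit(X, T, n_hidden=100)
mse = float(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
print(f"training MSE: {mse:.5f}")
```

Because training reduces to one linear solve, no iterative back-propagation is required, which is the property the text refers to.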

To overcome the deficiency of existing methods in removing heavy and/or varying noise, and to improve the efficiency of model training, we propose a new denoising framework (Fig. 1) that consists of three modules: (1) patch decomposition and pre-processing; (2) patch-to-patch denoising based on a hierarchical extreme learning machine (HELM); and (3) non-local aggregation. In the first module, a noisy image is partitioned into overlapping patches with a fixed size and stride, and other necessary pre-processing is performed. The second module, HELM, contains an auto-encoder and a supervised regression and is applied to image denoising. In the last module, fine denoising is performed by aggregating non-local patches according to their structural similarity. Our approach can handle heavy noise and achieves competitive performance compared to other methods. We designed a multi-channel embedded image-processing device that can run our proposed denoising algorithm for VIoT-based surveillance systems. The architecture of the VIoT-based surveillance system is illustrated in Fig. 2. Raw images captured by distributed cameras are transmitted to the multi-channel embedded image-processing device via standard interfaces. The on-board denoising algorithms remove noise from the raw images according to a remote user’s commands. The resulting denoised images are then sent to a cloud server, where they can be retrieved by end users.
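The first and last modules can be sketched as follows, under simplifying assumptions. The patch size, stride, and function names are hypothetical, and the aggregation shown is a plain uniform average of overlapping patches; it is not the similarity-weighted non-local aggregation of the proposed framework, whose details are not given in this section.

```python
import numpy as np

def extract_patches(img, size, stride):
    """Split a 2-D image into overlapping size x size patches, flattened to rows."""
    rows, cols = img.shape
    coords = [(r, c)
              for r in range(0, rows - size + 1, stride)
              for c in range(0, cols - size + 1, stride)]
    patches = np.stack([img[r:r + size, c:c + size].ravel() for r, c in coords])
    return patches, coords

def aggregate_patches(patches, coords, shape, size):
    """Put patches back in place and average wherever they overlap."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for p, (r, c) in zip(patches, coords):
        acc[r:r + size, c:c + size] += p.reshape(size, size)
        cnt[r:r + size, c:c + size] += 1
    return acc / np.maximum(cnt, 1)   # guard against pixels no patch covers

rng = np.random.default_rng(0)
img = rng.random((32, 32))                       # hypothetical test image
patches, coords = extract_patches(img, size=8, stride=4)

# A patch-to-patch denoiser would map each row of `patches` to a cleaned patch here.
recon = aggregate_patches(patches, coords, img.shape, size=8)
print(patches.shape, bool(np.allclose(recon, img)))
```

With no denoiser applied between the two steps, averaging the overlapping patches reproduces the input image, which is a convenient sanity check for the bookkeeping.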

The rest of this paper is organized as follows. In Section 2, we briefly review previous research related to image denoising. Section 3 describes in detail the framework of the HELM-based denoising network with non-local aggregation. In Section 4, we introduce the embedded VIoT system for video surveillance. In Section 5, we introduce the databases and experimental settings. In Section 6, we present experimental results and discuss comparisons with other methods. Finally, we discuss the results and future research directions and draw conclusions in Sections 7 and 8.

Related work

In this section, we classify existing image denoising methods into traditional and neural network methods and review them separately. Then we introduce related research on embedded applications of image processing and machine learning for the IoT.

HELM-based image denoising

In this section, we briefly review the theory of ELM and then describe details of the proposed framework, including data preprocessing, HELM-based auto-encoder learning, and the non-local aggregation procedure.

VIoT-based embedded image processing system

To build a VIoT platform that can capture and analyze streaming video or images in situ and connect the end device to the Internet, we designed a multi-channel video surveillance system using embedded image processing technology. This system can provide on-line processing and intelligent analysis, such as image denoising, image enhancement, and object detection and tracking, and can support Gigabit (1000 Mbit/s) Ethernet communication. The on-board system contains three serial digital interface (SDI) channels

Database and experimental setting

For the patch-to-patch denoising network, the first auto-encoder hidden layer contained 900 nodes, and the output hidden layer contained 5000 hidden nodes.
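The two-stage structure described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the auto-encoder stage here uses ridge regression in place of a sparsity-inducing solver, the tanh activation and regularization value are assumptions, and the demonstration runs on synthetic 16-element patches with much smaller layers than the 900 and 5000 nodes reported in the text (kept as defaults of the hypothetical `helm_fit`).

```python
import numpy as np

def ridge(H, T, reg):
    """Closed-form regularized least squares: (H^T H + reg * I)^-1 H^T T."""
    return np.linalg.solve(H.T @ H + reg * np.eye(H.shape[1]), H.T @ T)

def helm_fit(X_noisy, X_clean, n_ae=900, n_out=5000, reg=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    d = X_noisy.shape[1]
    # Stage 1: ELM auto-encoder. Random projection, then weights that reconstruct the input.
    A = rng.normal(size=(d, n_ae)) / np.sqrt(d)
    beta_ae = ridge(np.tanh(X_noisy @ A), X_noisy, reg)    # shape (n_ae, d)
    F = np.tanh(X_noisy @ beta_ae.T)                       # learned features, (n, n_ae)
    # Stage 2: supervised ELM regression from features to clean patches.
    W = rng.normal(size=(n_ae, n_out)) / np.sqrt(n_ae)
    b = rng.normal(size=n_out)
    beta_out = ridge(np.tanh(F @ W + b), X_clean, reg)     # shape (n_out, d)
    return beta_ae, W, b, beta_out

def helm_denoise(X_noisy, model):
    beta_ae, W, b, beta_out = model
    F = np.tanh(X_noisy @ beta_ae.T)
    return np.tanh(F @ W + b) @ beta_out

# Synthetic "patches": smooth signals from three basis vectors (hypothetical data).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 16)
basis = np.stack([np.ones_like(t), t, np.cos(np.pi * t)])

def make_patches(n):
    clean = rng.uniform(0.0, 0.5, size=(n, 3)) @ basis
    return clean, clean + rng.normal(0.0, 0.2, size=clean.shape)

train_clean, train_noisy = make_patches(2000)
test_clean, test_noisy = make_patches(500)

model = helm_fit(train_noisy, train_clean, n_ae=32, n_out=200)
denoised = helm_denoise(test_noisy, model)

mse_noisy = float(np.mean((test_noisy - test_clean) ** 2))
mse_denoised = float(np.mean((denoised - test_clean) ** 2))
print(f"noisy MSE: {mse_noisy:.4f}, denoised MSE: {mse_denoised:.4f}")
```

Both stages are trained with a single linear solve each, so the whole network is fitted without gradient-based iteration.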

We evaluated our methods by investigating two different tasks: denoising of hand-written digits and natural images, using public datasets. The first experiment implemented our patch-based denoising network using a dataset of hand-written digits, and the second experiment trained and applied our denoising framework to the natural image

Handwritten digits denoising on MNIST dataset

Our patch-based denoising network was trained in approximately 2.5 min on an i7-5500U 2.7 GHz CPU. We subsequently tested this network on the images of hand-written digits in the MNIST dataset. Our results, along with the results from the three conventional methods, are shown in Fig. 7. It can be seen that our method outperformed all three methods in all cases, except for two isolated points at σ = 50 and σ = 70 in the unclipped model. The MLPD approach exhibited the best performance when
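For reference, the noise models and the PSNR metric mentioned above can be sketched as follows. The sketch assumes that pixel values lie on a 0-255 scale, that the noise is additive white Gaussian with standard deviation σ, and that the clipped model saturates noisy values to the valid range; these definitions and the flat test image are assumptions made for illustration, not details confirmed by this section.

```python
import numpy as np

def add_awgn(img, sigma, clipped, rng):
    """Add Gaussian noise; optionally saturate to [0, 255] (assumed 'clipped' model)."""
    noisy = img + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 255.0) if clipped else noisy

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = float(np.mean((ref - est) ** 2))
    return float("inf") if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)

img = np.full((64, 64), 128.0)   # hypothetical flat mid-gray test image

results = {}
for sigma in (50, 70):
    # Same seed for both models so that they share one noise realization.
    unclipped = add_awgn(img, sigma, clipped=False, rng=np.random.default_rng(0))
    clipped = add_awgn(img, sigma, clipped=True, rng=np.random.default_rng(0))
    results[sigma] = (psnr(img, unclipped), psnr(img, clipped))
    print(f"sigma={sigma}: unclipped {results[sigma][0]:.2f} dB, "
          f"clipped {results[sigma][1]:.2f} dB")
```

For the unclipped model the PSNR of the noisy input is close to 10·log10(255²/σ²); clipping can only move pixels toward the valid range, so it never lowers the PSNR of the input when the reference itself lies in that range.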

Discussion

The trained weights of the auto-encoder layer can be visualized as patches, as shown in Fig. 13. These weights are applied to the input patches through a dot product and can therefore be interpreted as filtering kernels that extract useful information, such as structural features, from the noisy image. The outputs of this hidden layer serve as filtered elements, which the following hidden layer and the output layer combine into a denoised patch.
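The filter interpretation can be checked numerically. In the sketch below, random numbers stand in for trained weights and the 5×5 patch size and four hidden nodes are hypothetical; it only shows that the dot product between a flattened patch and one weight column equals the response of the corresponding kernel at that patch location.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5                                    # hypothetical patch side length
n_hidden = 4                             # hypothetical number of hidden nodes
image = rng.random((12, 12))
W = rng.normal(size=(p * p, n_hidden))   # stand-in for trained weights, one column per node

# View each weight column as a p x p kernel, as when visualizing weights as patches.
kernels = W.T.reshape(n_hidden, p, p)

# Take one patch and compare the two ways of computing the pre-activation.
r, c = 3, 6
patch = image[r:r + p, c:c + p]
dot_response = patch.ravel() @ W                                   # shape (n_hidden,)
filter_response = np.array([(patch * k).sum() for k in kernels])   # kernel applied at (r, c)

print(bool(np.allclose(dot_response, filter_response)))
```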

The weights of the hidden layer and the

Conclusion

To address the problem of removing heavy noise in the visual Internet of Things, we proposed a novel denoising method based on the hierarchical extreme learning machine. The framework of our method consists of a patch-to-patch image-denoising network and non-local aggregation. Fast training is a key feature of our method compared to other approaches; for example, our denoising network can be well trained within several minutes on a single CPU. Experimental results show that our method can

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China [Grant No. 61571026]; the National Key Project of Research and Development Plan, China [No. 2016YFE0108100]; and the National Institutes of Health, the United States [Grant No. R01CA165255 and R21CA172864].

References (59)

  • Li Q. et al., Road vehicle monitoring system based on intelligent visual internet of things, J. Sensors (2015)
  • D.K. Yadav, K. Singh, S. Kumari, Challenging issues of video surveillance system using Internet of Things in cloud...
  • Bessis N. et al., Big Data and Internet of Things: A Roadmap for Smart Environments (2014)
  • Pence W.D. et al., Lossless astronomical image compression and the effects of noise, Publ. Astron. Soc. Pac. (2009)
  • Buades A. et al., Nonlocal image and movie denoising, Int. J. Comput. Vis. (2008)
  • Dabov K. et al., Image denoising with block-matching and 3D filtering, Proc. SPIE - Int. Soc. Opt. Eng. (2006)
  • Dabov K. et al., Image denoising by sparse 3-D transform-domain collaborative filtering, IEEE Trans. Image Process. (2007)
  • Dabov K. et al., Image restoration by sparse 3D transform-domain collaborative filtering (2008)
  • K. Dabov, A. Foi, K. Egiazarian, Video denoising by sparse 3d transform-domain collaborative filtering, in: Signal...
  • Talebi H. et al., Global image denoising, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. (2014)
  • Elad M. et al., Image denoising via sparse and redundant representations over learned dictionaries, IEEE Trans. Image Process. (2006)
  • Mairal J. et al., Learning multiscale sparse representations for image and video restoration, SIAM J. Multiscale Model. Simul. (2008)
  • D.D. Muresan, T.K. Parks, Adaptive principal components and image denoising, in: Proc. Int. Conf. Image Processing,...
  • Katkovnik V. et al., From local kernel to nonlocal multiple-model image denoising, Int. J. Comput. Vis. (2010)
  • Burger H.C. et al., Image denoising: Can plain neural networks compete with BM3D?
  • T. Remez, O. Litany, R. Giryes, A.M. Bronstein, Deep class aware denoising,...
  • Li C. et al., Research on image denoising based on deep convolutional neural network, Comput. Eng. (2017)
  • Vemulapalli R. et al., Deep Gaussian conditional random field network: A model-based deep network for discriminative denoising
  • Tang J. et al., Extreme learning machine for multilayer perceptron, IEEE Trans. Neural Netw. Learn. Syst. (2017)