Original papers
Evaluating fidelity of lossy compression on spatiotemporal data from an IoT enabled smart farm

https://doi.org/10.1016/j.compag.2018.08.045

Abstract

As the volume of data collected by various IoT sensors used in smart farm applications increases, storing and processing big data for agricultural applications becomes a huge challenge. The insight of this paper is that lossy compression suits IoT data because, compared with lossless compression, it can significantly reduce the data volume when the spatiotemporal characteristics of IoT sensor data are properly exploited. However, lossy compression risks compressing the data too aggressively and thus losing data fidelity, which might affect the quality of the data and potential analytics outcomes. To understand the impact of lossy compression on IoT data management and analytics, we evaluated four classification algorithms on agricultural sensor data reconstructed at various energy concentrations. Specifically, we applied three transformation-based lossy compression mechanisms to five real-world weather datasets collected at different sampling granularities from IoT weather stations. Our experimental results indicate that there is a strong positive correlation between the concentrated energy of the transformed coefficients and the compression ratio as well as the data quality. While we observed a general trend where much higher compression ratios can be achieved at the cost of a decrease in quality, we also observed that the impact on the classification accuracy varies among the datasets and algorithms we evaluated. Lastly, we show that the sampling granularity also influences the data fidelity in terms of the prediction performance and compression ratio.

Introduction

The advent of IoT has revolutionized the knowledge discovery paradigm in various domains (Ludena and Ahrary, 2013; Al-Fuqaha et al., 2015). Actionable knowledge can be extracted from a continuous stream of raw data collected from IoT devices. This paper is particularly interested in IoT-enabled smart farming. In short, smart farming with data analytics capabilities can provide more precise forecasts and thus could potentially improve crop yields as well as reduce production costs by eliminating the use of non-essential pesticides or fertilizers.

Recent years have witnessed a plethora of IoT solutions beneficial to agricultural domains. In the agriculture industry, advanced decision support systems built on IoT technologies are increasingly gaining attention because they enable precision farming. After processing the collected data, these systems provide forecast services to farmers and growers, who can then make smarter decisions. The three major features that can affect weather-based predictions are as follows:

  • A smart greenhouse is a facility that helps in the steady production of high-quality plants all year round by artificially controlling the cultivation environment. Different kinds of plants require different conditions (e.g., temperature and humidity) for their growth. Providing a proper growth environment makes it possible to control the plant growth rate (e.g., to either promote or prevent flowering), thereby bringing huge economic benefits to farmers and growers. An IoT-enabled greenhouse control system collects information for managing plant growth and controls the facilities to promote optimal growth environments.

  • Frost and freeze damage to flowers and buds at or near the bloom stage could result in significant crop failures (Jaradat et al., 2008; Matzneller et al., 2016). For example, Chung et al. (2004) forecasted frost using global climate and weather data. With an accurate frost forecast, growers could prevent frost damage proactively, e.g., by moving a frost fan around the crop.

  • Plant pathogens and pests including insects, mites, weeds and fungi can negatively affect crop productivity and profitability. Tripathy et al. (2013) reported on the interrelationship of weather, crops, and pests. Crop pests are also sensitive to particular weather conditions such as humidity.

One of the key challenges in enabling IoT smart farming is how to efficiently manage the big data collected from various sensors (Ukil et al., 2015). One solution for handling a large volume of data is to apply data compression techniques so that the storage and communication overheads are reduced (Bose et al., 2016; Hübbe et al., 2013). Various compression algorithms have been applied to satisfy different application needs, and many of them are lossy, with examples being ZFP, SZ, ISABEL, and wavelet-based methods (Li et al., 2017a; Li et al., 2017b). Lossy compression (Chou and Piegl, 1992) can help reduce the data size significantly, but the error rates, and thus the loss of data quality, are not easy to bound. Several recent approaches have proposed techniques to bound the error introduced by lossy compression methods (Sustika and Sugiarto, 2016; Abo-Zahhad et al., 2015; Tao et al., 2017). Tao et al. (2017) focus on scientific applications, whose data often exhibit fairly sharp or spiky changes in small data regions. Sustika and Sugiarto (2016) and Abo-Zahhad et al. (2015) exploit sparse data patterns.

Nevertheless, as reported in several prior studies such as analyses of turbulent flow data (Li et al., 2015) and climate data (Baker et al., 2014), data reconstructed from lossy compression still allows meaningful analysis to be carried out. In Baker et al. (2014), the reconstructed data were able to reveal the same mean climate as the original data because climate data with compression rates of up to 5:1 can be reconstructed to be statistically indistinguishable from the original. However, lossy compression techniques are subject to data fidelity issues (Li et al., 2017b). Data fidelity often depends on the specific application domain because the acceptable information loss varies among the variables of interest (Baker et al., 2014).

In our previous papers (Moon et al., 2017a; Moon et al., 2017b), we showed that transformation-based lossy compression is useful for minimizing data reconstruction errors and for keeping those errors within a tolerable range. Furthermore, we studied the effect of lossy data compression on data fidelity before and after applying IoT analytics. However, the relationship between data fidelity and the data collection (sampling) frequency has not yet been verified.

To manage IoT data efficiently and reliably, we collect, compress, and store climate data, and then reconstruct them for later analysis. We evaluate the fidelity of weather sensor data reconstructed by lossy compression algorithms based on three transformations, namely, the Discrete Cosine Transform (DCT) (Razzaque et al., 2013), Fast Walsh-Hadamard Transform (FWHT) (Fino and Algazi, 1976), and Discrete Wavelet Transform (DWT) (Abo-Zahhad et al., 2015). Our objective is to evaluate the impact of lossy compression and restoration on data reliability. Our experimental results using five sensor datasets show that lossy data compression can achieve 30×–100× compression ratios with marginal information loss. We collected weather sensor data at two sampling granularities (every minute and every hour) to evaluate how the sampling rate affects the amount of data reduction and the quality of the data analysis.

Our compression mechanism is also simple in that it does not require complex quantization methods. In our comparison of the four classification algorithms for predicting frost, we observed that the prediction accuracy using compressed data containing only 90% of the total energy from the transformed coefficients did not drop much compared with that using the original data. In most cases, the frost prediction performance based on the reconstructed data is comparable with the performance based on the original data. Interestingly, in some cases, the prediction performance improves when the reconstructed data are used. These results clearly demonstrate that lossy compression leads to efficient management of big IoT data by reducing the data storage and transmission time while still maintaining the data quality.
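
The energy criterion mentioned above (keeping, for example, 90% of the total energy of the transformed coefficients) can be illustrated with a minimal Python sketch. It assumes that coefficients are retained in order of decreasing magnitude until the requested share of the total energy (the sum of squared coefficients) is reached, and that the compression ratio is the original coefficient count divided by the retained count, ignoring the cost of storing indices. This excerpt does not spell out the exact selection rule, so both are assumptions; the function names and sample values are hypothetical.

```python
def keep_energy_fraction(coeffs, fraction=0.90):
    """Keep the fewest largest-magnitude coefficients whose energy reaches `fraction`.

    Returns a dict {index: value} of the retained coefficients.
    Assumption: energy is the sum of squared coefficient values.
    """
    total = sum(c * c for c in coeffs)
    if total == 0.0:
        return {}
    # Visit coefficients from largest to smallest magnitude.
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept, energy = {}, 0.0
    for i in order:
        kept[i] = coeffs[i]
        energy += coeffs[i] * coeffs[i]
        if energy >= fraction * total:
            break
    return kept


def compression_ratio(n_original, kept):
    """Original coefficient count over retained count (index overhead ignored)."""
    return n_original / len(kept) if kept else float("inf")


# Hypothetical transformed coefficients with most energy in the first term.
coeffs = [10.0, -3.0, 1.0, 0.5, 0.2, 0.1, 0.05, 0.01]
kept = keep_energy_fraction(coeffs, 0.90)
print(kept, compression_ratio(len(coeffs), kept))
```

Raising `fraction` retains more coefficients and lowers the compression ratio, which is the trade-off between data volume and fidelity discussed in this paper.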

Section snippets

Design of the transform-based lossy compression

Many of the lossy compression techniques exploit the fact that, while individual data values in the dataset might show some randomness, their overall patterns are spatiotemporally smooth. Because of this, compression techniques in conjunction with data transformation can be more effective because the transformed data usually reveal the correlation of the data explicitly. For example, let us consider the temperature data (shown in Fig. 1a) which is one of the datasets we evaluated in this paper.
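
To make the idea of transform-based energy compaction concrete, the following sketch applies an orthonormal DCT to a smooth, synthetic daily temperature cycle, keeps only the four largest coefficients, and reconstructs the signal. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the signal is synthetic (not the Fig. 1a data), the direct O(n²) transform is written out for self-containment, and all names are hypothetical.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a sequence (direct O(n^2) form, for illustration)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II above (the transpose of its matrix)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


# Synthetic hourly temperatures: a smooth daily cycle around 15 degrees.
N = 24
signal = [15.0 + 5.0 * math.sin(2.0 * math.pi * t / N) for t in range(N)]

coeffs = dct(signal)
# Keep the four largest-magnitude coefficients and zero out the rest.
top = set(sorted(range(N), key=lambda k: abs(coeffs[k]), reverse=True)[:4])
sparse = [coeffs[k] if k in top else 0.0 for k in range(N)]
restored = idct(sparse)

energy_kept = sum(c * c for c in sparse) / sum(c * c for c in coeffs)
rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(signal, restored)) / N)
print(f"energy kept: {energy_kept:.4f}, RMSE: {rmse:.3f}")
```

Because the transform is orthonormal, the energy of the coefficients equals the energy of the signal, so the retained energy fraction directly indicates how much of the signal survives compression.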

Datasets

We evaluated the effectiveness of the compression algorithms discussed in Section 2.1 on climate data. In our evaluation, we used a real-world dataset from the wireless climate stations located in a small orchard in Youngcheon, South Korea. We chose the following five most important variables, namely temperature, humidity, solar radiation, wind direction and wind speed, from the climate data collected during October 2015 at the deployed weather station. The data were continuously monitored and

Conclusion

Emerging IoT-based smart farming produces a large volume of diverse data, which needs to be stored efficiently and reliably. In this paper, we evaluated the effectiveness of data compression on five of the most important variables in real climate/weather data as an exemplar of IoT applications. Specifically, we compared the performance of the predictive analytics on the reconstructed data using the DCT, FWHT, and DWT to evaluate the feasibility of applying lossy compression to IoT big data. Our

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 1751143. This work was supported by the National Research Council of Science & Technology (NST) grant by the Korea government (MSIP) (No. CRC-15-01-KIST).

References (26)

  • Jaradat, M.A.K., et al., 2008. Smoke modified environment for crop frost protection: a fuzzy logic approach. Comput. Electron. Agric.
  • Abo-Zahhad, M.M., et al., 2015. Compressive sensing algorithms for signal processing applications: a survey. Int. J. Commun., Network Syst. Sci.
  • Al-Fuqaha, A., et al., 2015. Internet of things: a survey on enabling technologies, protocols, and applications. IEEE Commun. Surveys Tutorials
  • Baker, A.H., Xu, H., Dennis, J.M., Levy, M.N., Nychka, D., Mickelson, S.A., 2014. A methodology for evaluating the...
  • Bicer, T., Yin, J., Chiu, D., Agrawal, G., Schuchardt, K., 2013. Integrating online compression to accelerate...
  • Bose, T., Bandyopadhyay, S., Kumar, S., Bhattacharyya, A., Pal, A., June 2016. Signal characteristics on sensor data...
  • Chaturvedi, R., et al., 2013. A survey on compression techniques for ECG signals. Int. J. Adv. Res. Comput. Commun. Eng.
  • Chou, J.J., et al., 1992. Data reduction using cubic rational B-splines. IEEE Comput. Graph. Appl.
  • Chung, U., et al., 2004. Site-specific frost warning based on topoclimatic estimation of daily minimum temperature. Korean J. Agric. For. Meteorol.
  • Fino, B.J., et al., 1976. Unified matrix treatment of the fast Walsh-Hadamard transform. IEEE Trans. Comput.
  • Han, J.H., et al., 2009. Frostfall forecasting in the Naju pear production area based on discriminant analysis of climatic. Korean J. Agric. For. Meteorol.
  • Hübbe, N., Wegener, A., Ling, Y., Ludwig, T., June 2013. Evaluating lossy compression on climate data. In:...
  • Johnson, J.D., 1975. Diseases of Peaches and Plums....