A machine code-based genetic programming for suspended sediment concentration estimation

https://doi.org/10.1016/j.advengsoft.2010.06.001

Abstract

Correct estimation of the suspended sediment concentration carried by a river is very important for many water resources projects. This paper proposes the application of linear genetic programming (LGP), an extension of the genetic programming (GP) technique, to suspended sediment concentration estimation. The accuracy of LGP is compared with that of adaptive neuro-fuzzy, neural network and rating curve models. Daily streamflow and suspended sediment concentration data from two stations operated by the US Geological Survey (USGS), Rio Valenciano Station and Quebrada Blanca Station, are used as case studies. The root mean square error (RMSE) and determination coefficient (R²) statistics are used to evaluate the accuracy of the models. Comparison of the results indicates that LGP performs better than the neuro-fuzzy, neural network and rating curve models. In the test period, the LGP models (RMSE = 44.4 mg/l, R² = 0.910 for Rio Valenciano; RMSE = 13.9 mg/l, R² = 0.952 for Quebrada Blanca) estimate daily suspended sediment concentrations more accurately than the most accurate neuro-fuzzy models (RMSE = 52.0 mg/l, R² = 0.876 and RMSE = 17.9 mg/l, R² = 0.929, respectively).
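The two accuracy statistics named in the abstract can be illustrated with a minimal Python sketch. The paper does not spell out its formulas here, so the sketch assumes the common definition R² = 1 − SS_res/SS_tot (the squared correlation coefficient is another convention in use). The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, estimated):
    """Root mean square error, in the units of the data (mg/l for sediment)."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def r_squared(observed, estimated):
    """Determination coefficient, ASSUMED here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative concentrations (mg/l), not data from the paper.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 39.0]
print(rmse(observed, estimated), r_squared(observed, estimated))
```

A lower RMSE and an R² closer to 1 both indicate a closer match between estimated and observed concentrations, which is the sense in which the models are ranked above.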

Introduction

Accurate estimation of the suspended sediment concentration carried by a river is important with respect to channel navigability, reservoir filling, hydroelectric-equipment longevity, river aesthetics, fish habitat and scientific interests. All surface water reservoirs are designed with a volume known as “the dead storage” to accommodate the sediment that will accumulate over a specified period called the economic life. Under-estimation of sediment yield results in insufficient reservoir capacity. Appropriate reservoir design and operation therefore require that sediment yield be determined accurately. In environmental engineering, the estimation of river sediment load has additional significance if the particles also transport pollutants.

The popularity of neuro-fuzzy (NF) and artificial neural network (ANN) techniques is increasing in various areas because of their capability for solving complex problems that might otherwise not have a tractable solution. According to recent studies, NF and ANN offer promising results in the field of water resources and hydrology, such as rainfall–runoff modeling [13], [25], streamflow estimation [12], [24], [26], [29], [30], [31], reservoir inflow forecasting [6], [35], predicting hydromechanic parameters [19], [20], and suspended sediment estimation [14], [27], [28], [42]. However, the ANN and NF methods have some disadvantages. The ANN is a black box model, and its network structure is hard to determine; it is usually found using a trial-and-error approach, i.e. sensitivity analysis [2], [26], [27]. Its training algorithm also risks getting stuck in local minima. The NF models combine the linguistic representation of a fuzzy system with the learning ability of the ANN. Therefore, they can be trained to perform an input/output mapping, just as with an ANN, but with the additional benefit of being able to provide the set of rules on which the model is based. However, the NF is also a black box modeling approach to which fuzzy rules have been added: the structure of the model is known, but it does not provide a transparent equation of the phenomenon. By contrast, the LGP models proposed in this study are explicit mathematical formulations.

GP can be successfully applied to areas where the interrelationships among the relevant variables are poorly understood; finding the size and shape of the ultimate solution is hard and a major part of the problem; conventional mathematical analysis does not, or cannot, provide analytical solutions; an approximate solution is acceptable; small improvements in performance are routinely measured (or easily measurable) and highly prized; and there is a large amount of data, in computer readable form, that requires examination, classification, and integration [9].

GP has been successfully applied and verified in the field of water resources engineering [8], [15], [17], [20], [21], [22], [39], [41]. However, there are only a few records of LGP applications. More recently, Guven [23] modeled the time series of daily flow rate in rivers using LGP, and Guven et al. [18] presented LGP as an alternative tool for predicting scour depth around a circular pile due to waves in medium dense silt and sand beds. Only a few studies have used the GP approach for sediment modeling. Babovic [7] used experimental flume data of others and expressed a new GP-based formulation for bed concentration of suspended sediment. Kizhisseri et al. [32] used the GP methodology to explore a better correlation between the temporal pattern of the fluid field and sediment transport by utilizing two datasets, one from numerical model results and the other from Sandy Duck field data. Aytek and Kisi [3] obtained an explicit formulation of the daily suspended sediment–discharge relationship by applying the GP technique to daily streamflow and suspended sediment data from two stations on the Tongue River in Montana. Azamathulla et al. [5] used a genetic programming approach for predicting sediment concentrations of Malaysian rivers. To the knowledge of the authors, LGP has not yet been applied to the estimation of suspended sediment concentration.

The main aim of this study is to develop an explicit formulation based on LGP for accurate estimation of suspended sediment concentration. The accuracy of LGP is compared with that of the adaptive neuro-fuzzy, neural network and rating curve models employed in the previous work of Kisi [28]. This is the first study that compares the accuracy of LGP with that of neuro-fuzzy and neural network models in the hydrological context.

Section snippets

Linear genetic programming (LGP)

Genetic programming (GP) automatically creates computer programs to perform a selected task using Darwinian natural selection. GP is a robust, dynamic and fast-growing discipline, and it has been applied to diverse and difficult problems with great success [16], [33]. In this study, we use the LGP variant of GP, which operates directly on machine code. The main characteristic of LGP in comparison to tree-based GP is that expressions of a functional programming language (like LISP) are substituted
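The difference from tree-based GP can be illustrated with a minimal Python sketch of a linear program: a flat sequence of register instructions executed in order. This is only an illustration under stated assumptions, not the implementation used in the study. It is interpreted in Python rather than run as machine code, and the instruction format, register count, operator set, protected division and variation operators are all hypothetical choices.

```python
import random

# Each instruction is (op, dst, src_a, src_b): r[dst] = r[src_a] op r[src_b].
# The operator set and the protected division are assumptions for this sketch.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}


def run_program(program, inputs, n_registers=4):
    """Execute instructions in order; inputs are loaded into the first
    registers and register 0 holds the output at the end."""
    registers = [0.0] * n_registers
    for i, value in enumerate(inputs):
        registers[i] = value
    for op, dst, a, b in program:
        registers[dst] = OPS[op](registers[a], registers[b])
    return registers[0]


def random_instruction(n_registers, rng):
    """Draw one random instruction over the available registers."""
    return (rng.choice(sorted(OPS)), rng.randrange(n_registers),
            rng.randrange(n_registers), rng.randrange(n_registers))


def mutate(program, rng, n_registers=4):
    """Return a copy with one randomly chosen instruction replaced."""
    child = list(program)
    child[rng.randrange(len(child))] = random_instruction(n_registers, rng)
    return child


def crossover(parent_a, parent_b, rng):
    """One-point crossover on instruction sequences (each parent needs
    at least two instructions)."""
    cut_a = rng.randrange(1, len(parent_a))
    cut_b = rng.randrange(1, len(parent_b))
    return parent_a[:cut_a] + parent_b[cut_b:]


# Hand-written program computing r0 = r0 * r0 + r1.
program = [("*", 2, 0, 0), ("+", 0, 2, 1)]
print(run_program(program, [3.0, 2.0]))  # 3*3 + 2 = 11.0

rng = random.Random(0)
child = crossover(mutate(program, rng), program, rng)
```

Because an evolved program of this kind is just a short list of arithmetic instructions, it can be unrolled into a closed-form expression, which is consistent with the paper's emphasis on obtaining an explicit formulation.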

Case study

The streamflow and suspended sediment concentration data of Rio Valenciano Station near Juncos (USGS Station No: 50056400, Latitude 18°12′58″, Longitude 65°55′34″) and Quebrada Blanca Station at Jagual (USGS Station No: 50051150, Latitude 18°09′40″, Longitude 65°58′58″) operated by the US Geological Survey (USGS) are used in the study. The drainage areas at these sites are 43.57 km² and 8.63 km², respectively. The gauge datums are 98 and 130 m above sea level for the Rio Valenciano and

Application and results

Several input combinations are tried using LGP to estimate suspended sediment concentrations of the Rio Valenciano Station. The input combinations used in this application are (1) Q_t; (2) Q_t and Q_{t-1}; (3) Q_t and S_{t-1}; (4) Q_t, Q_{t-1} and S_{t-1}. Here, Q_t and S_t denote the flow and sediment at time t. The LGP models are tested for each input combination and the results are compared with those of the neuro-fuzzy (NF), ANN and sediment rating curve (SRC) models obtained from the previous study [28]
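The four input combinations listed above can be assembled from the daily series as in the following minimal Python sketch. The function name `build_inputs` is hypothetical, and the sketch assumes that the first day is dropped for every combination (since its lag-1 values are missing) so that all four models see the same samples; the paper does not state how this was handled. The sample values are made up.

```python
def build_inputs(flow, sediment, combination):
    """Return (X, y) for input combination 1-4, where y is S_t.

    Combination 1: Q_t
    Combination 2: Q_t, Q_{t-1}
    Combination 3: Q_t, S_{t-1}
    Combination 4: Q_t, Q_{t-1}, S_{t-1}
    """
    if combination not in (1, 2, 3, 4):
        raise ValueError("combination must be 1, 2, 3 or 4")
    rows, targets = [], []
    # Start at t = 1 so that the lag-1 values always exist (assumption).
    for t in range(1, len(flow)):
        row = [flow[t]]
        if combination in (2, 4):
            row.append(flow[t - 1])
        if combination in (3, 4):
            row.append(sediment[t - 1])
        rows.append(row)
        targets.append(sediment[t])
    return rows, targets


# Made-up daily values, not data from the USGS stations.
flow = [1.0, 2.0, 3.0]
sediment = [10.0, 20.0, 30.0]
X, y = build_inputs(flow, sediment, 4)
print(X, y)
```

Each row of `X` is one day's input vector and the matching entry of `y` is the concentration to be estimated for that day.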

Conclusions

The ability of the LGP approach in suspended sediment concentration estimation has been investigated in the present study. The accuracy of LGP was compared with that of the adaptive neuro-fuzzy, neural network and rating curve models obtained from the previous study [28]. The daily streamflow and suspended sediment data from two stations, Rio Valenciano and Quebrada Blanca, in the USA were used for the model simulations. The comparison results indicated that the LGP model performed better than

Acknowledgements

The data used in this study were downloaded from the web server of the USGS. The author wishes to thank the staff of the USGS who are associated with data observation, processing, and management of USGS Web sites.

References (42)

  • Babovic V. Data mining and knowledge discovery in sediment transport. Comput-Aided Civ Infrastruct Eng; 2000.
  • Babovic V, Keijzer M. Declarative and preferential bias in GEP-based scientific discovery. Genet Program Evolv Mach...
  • Banzhaf W, Nordin P, Keller RE, Francone FD. Genetic programming: an introduction. San Francisco (CA): Morgan Kaufmann;...
  • Brameier M, et al. A comparison of linear genetic programming and neural networks in medical data mining. IEEE Trans Evol Comput; 2001.
  • Brameier M. On linear genetic programming. Ph.D. thesis. University of Dortmund;...
  • Chang FJ, et al. Real-time recurrent learning neural network for stream-flow forecasting. Hydrol Process; 2002.
  • Chiang YM, et al. Integrating hydrometeorological information for rainfall–runoff modelling by artificial neural networks. Hydrol Process; 2009.
  • Dorado J, et al. Prediction and modelling of the rainfall–runoff transformation of a typical urban basin using ANN and GP. Appl Artif Intell; 2003.
  • Giustolisi O. Using genetic programming to determine Chezy resistance coefficient in corrugated channels. J Hydroinform; 2004.
  • Guven A, et al. Prediction of pressure fluctuations on stilling basins. Can J Civil Eng; 2006.
  • Guven A, et al. Prediction of scour downstream of grade-control structures using neural networks. J Hydraul Eng; 2008.

Cited by (38)

    • Two decades on the artificial intelligence models advancement for modeling river sediment concentration: State-of-the-art

      2020, Journal of Hydrology
      Citation Excerpt :

      GA provided better results than RM for the estimation of 365 days data set. Kisi and Guven (2010) developed a machine code-based GP for SSC forecasting. The obtained result of the GP was compared with those of the adaptive NF, ANN and rating curve models.

    • An integrated framework of genetic network programming and multi-layer perceptron neural network for prediction of daily stock return: An application in Tehran stock exchange market

      2019, Applied Soft Computing Journal
      Citation Excerpt :

      By aggregating these components, we can achieve the rules to reach the goal, which here is the expected return. GNP is applicable in miscellaneous fields such as portfolio optimization [17], data mining [18], and assessment and prediction of many real world applications in the literature, including estimation of suspended sediment concentration [19], prediction of surface finish of the turning process [20], assessment of soil-fiber composite [21], etc. Consequently, the present paper aims to forecast daily stock return by aggregating technical indicators, using GNP programming and MLP.

    • Evaluating the apparent shear stress in prismatic compound channels using the Genetic Algorithm based on Multi-Layer Perceptron: A comparative study

      2018, Applied Mathematics and Computation
      Citation Excerpt :

      Since determining the velocity gradient without detailed measurements is difficult and Cfa has high uncertainty with different geometry and roughness values [13], other methods of estimating apparent shear stress without a need for these parameters are required. In the past decade, soft computing and artificial intelligence methods have been used successfully for simulating complex hydraulic engineering problems [14–23]. On the topic of shear stress simulation, Cobaner et al. [24] used an ANN model to estimate the shear force carried by walls in smooth rectangular channels and ducts.
