D-Piper, a modified piper diagram to represent big sets of hydrochemical analyses

https://doi.org/10.1016/j.envsoft.2021.104979

Highlights

  • The Piper diagram is not useful for representing big hydrochemical data sets.

  • D-Piper, an open-source Python code, enables hydrochemical analysis of big data.

  • D-Piper is based on spatial point density and data distribution characteristics.

Abstract

The Piper diagram is the most widely used chart in groundwater hydrochemical studies to represent the chemical facies of a set of water samples. Most modifications of the original Piper diagram share a common problem: the chart is not suitable for representing big data sets. When there are more than 40 or 50 analyses, the symbols overlap and the data distribution becomes obscured. To overcome this limitation, the D-Piper diagram displays the spatial point density instead of individual points. With this modification, both visualization and interpretation improve as the number of points increases. In addition, several representation methods that account for the distributional characteristics of the data allow the user to unravel hidden hydrochemical structures. The D-Piper diagram outperforms other point density-based solutions, particularly in the representation of low-density areas. Furthermore, the D-Piper implementation is coded in Python and is freely available.

Introduction

In hydrogeology, graphical tools for representing water analyses allow both interpreting hydrochemical processes and comparing samples from different sources. Most of these tools represent the concentrations of the major components of the water or their relationships. The main graphs used for this purpose are the Stiff (1951), Piper (1944), Durov (1948) and Chadha (1999) diagrams, the Collins (1923) bar diagrams and, among general-purpose charts, pie charts and X-Y scatterplots. In practice, the most widely used hydrochemical diagram is that proposed by Piper (1944), followed to a lesser extent by the Stiff and Schoeller diagrams (Schoeller, 1964).

Almost all of these diagrams were designed more than fifty years ago, and, although they are still very useful, none were intended to represent hundreds or thousands of analyses on the same graph.

The idea of interpreting water analyses using triangular diagrams arose in the early 1940s, when Hill (1940) and Langelier and Ludwig (1942) independently arrived at the same method of graphic representation. Later, Piper proposed his namesake diagram (Piper, 1944). Since then, different proposals and variations of the initial diagram have appeared, such as the aforementioned Durov diagram (Durov, 1948), the extended Durov diagram (Burdon and Mazloum, 1958), taken up and improved by Lloyd (1965), and the Chadha diagram (1999).

The usefulness of the Piper diagram in geochemistry and hydrochemistry is not limited to classifying and identifying facies and comparing and grouping samples; it also allows the analysis of water mixing processes and some geochemical reactions.

Although the Piper diagram was devised with a hydrochemical and hydrogeological approach, applications of the original version or some variant can also be found in many other fields. For example, Teng et al. (2016) suggested using the Piper diagram as a visualization tool for the design and optimization of chemical engineering processes such as the production of cinnamaldehyde; Ray and Mukherjee (2008) represented a combination of major ions in rectangular coordinates reproducing similar patterns to those of the Piper diagram, easily implementable in Excel; Shelton et al. (2018) proposed a modification of the Piper diagram using compositional data analysis. However, these diagrams are not yet widely used because either they do not provide complementary information to the original or their interpretation is too complex.

The Piper diagram (Fig. 1) shows the major cations and anions of water analyses, expressed as percentages of equivalents per million (% epm), in three panels, two triangular and one rhomboidal, each built on a mesh of equal-sized triangular cells. Cations (Ca²⁺, Mg²⁺, Na⁺ + K⁺) and anions (SO₄²⁻, CO₃²⁻ + HCO₃⁻, Cl⁻) are represented in the triangular panels and then projected onto the central rhomboidal panel for cationic-anionic facies identification.
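The conversion and projection described above can be sketched in a few lines of Python. This is a minimal illustration and not the D-Piper code: the function names, the `gap` between the two triangles, and the vertex layout (Ca at the lower left, Na+K at the lower right and Mg at the top of the cation triangle; HCO3+CO3, Cl and SO4 in the same positions of the anion triangle) are assumptions based on the conventional Piper layout. Fractions are returned on a 0 to 1 scale; multiply by 100 for percentages.

```python
import math

# Molar mass (g/mol) and absolute ionic charge of each major ion.
IONS = {
    "Ca": (40.078, 2), "Mg": (24.305, 2), "Na": (22.990, 1), "K": (39.098, 1),
    "Cl": (35.453, 1), "SO4": (96.06, 2), "HCO3": (61.017, 1), "CO3": (60.009, 2),
}


def piper_fractions(sample_mg_l):
    """Return (cation, anion) fractions of total meq/L for one analysis in mg/L."""
    # meq/L = mg/L * charge / molar mass; ions absent from the input count as 0.
    meq = {ion: sample_mg_l.get(ion, 0.0) * z / mass
           for ion, (mass, z) in IONS.items()}
    cations = {"Ca": meq["Ca"], "Mg": meq["Mg"], "Na+K": meq["Na"] + meq["K"]}
    anions = {"HCO3+CO3": meq["HCO3"] + meq["CO3"],
              "SO4": meq["SO4"], "Cl": meq["Cl"]}
    fractions = []
    for group in (cations, anions):
        total = sum(group.values())
        if total <= 0:
            raise ValueError("analysis needs at least one cation and one anion")
        fractions.append({ion: value / total for ion, value in group.items()})
    return fractions[0], fractions[1]


def piper_xy(cations, anions, gap=0.2):
    """Cartesian positions in both triangles and in the diamond (unit side length)."""
    root3 = math.sqrt(3)
    # Cation triangle: Ca at (0, 0), Na+K at (1, 0), Mg at (0.5, sqrt(3)/2).
    xc = cations["Na+K"] + 0.5 * cations["Mg"]
    yc = root3 / 2 * cations["Mg"]
    # Anion triangle, shifted right by 1 + gap (hypothetical spacing).
    xa = 1 + gap + anions["Cl"] + 0.5 * anions["SO4"]
    ya = root3 / 2 * anions["SO4"]
    # Diamond point: intersection of the line of slope +sqrt(3) through the
    # cation point with the line of slope -sqrt(3) through the anion point.
    xd = (xc + xa) / 2 + (ya - yc) / (2 * root3)
    yd = yc + root3 * (xd - xc)
    return {"cation": (xc, yc), "anion": (xa, ya), "diamond": (xd, yd)}
```

A call such as `piper_xy(*piper_fractions({"Ca": 80, "Mg": 24, "Na": 23, "HCO3": 305, "SO4": 48, "Cl": 35}))`, with made-up concentrations, returns the three plotting positions for one sample. Projecting along these two directions keeps the Na+K fraction and the HCO3+CO3 fraction unchanged, so the diamond position depends only on those two values.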

Fig. 2a shows a typical example of the use of the Piper diagram, in which the hydrochemical facies of three groups of water of different origins are compared. In this example, each group can be easily differentiated based on its anionic facies, whereas the cationic facies are similar in the three groups. Fig. 2b shows what the graph looks like when a large set of analyses is represented. This graph reveals the main drawback of the Piper diagram: only general patterns, such as the dominant calcium-magnesium bicarbonate facies, are observable. No other structures in the data can be detected because individual samples overlap, and it is impossible even to get an idea of the number of analyses represented.

On the other hand, due to the notable reduction in the cost of analytical techniques and the ease of storing a large number of analyses in an accessible way, it is now easy to access databases in which there are hundreds or even thousands of water analyses. The representation of these analyses in the Piper diagram is often meaningless since point overlap hides most of the samples (Fig. 2b).

Recently, Russionello and Lautz (2020) proposed a new implementation of the Piper diagram (PIED Piper) in Matlab that allows displaying big data sets using translucent points, contours and heatmaps of data density, and convex hulls for groupings. Although it is an excellent tool for improving the visualization and interpretation of large data sets, it has an important limitation: it offers no representations based on the statistical distribution of the data. Incorporating information on the distributional properties of the data is essential to achieve suitable representations (Conolly and Lake, 2006). Otherwise, relevant information on the hydrochemical characteristics could be missed. For an optimal display, the class intervals in density graphs should be selected on the basis of this information.
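The effect of the class-interval choice can be illustrated with a small, hypothetical set of cell counts. The sketch below compares equal-width intervals with quantile-based intervals; both methods and all function names are chosen here for illustration only and are not necessarily among those offered by D-Piper. The counts are invented and deliberately skewed, as density counts often are.

```python
from bisect import bisect_right


def equal_interval_breaks(values, k):
    """k classes of equal width between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * m for m in range(k + 1)]


def quantile_breaks(values, k):
    """Breaks at the m/k quantiles, interpolating linearly between sorted values."""
    s = sorted(values)
    breaks = []
    for m in range(k + 1):
        pos = m * (len(s) - 1) / k
        low = int(pos)
        high = min(low + 1, len(s) - 1)
        breaks.append(s[low] + (s[high] - s[low]) * (pos - low))
    return breaks


def classify(value, breaks):
    """Class index of value; a value equal to an inner break goes to the upper class."""
    return bisect_right(breaks[1:-1], value)


# Invented points-per-cell counts with a long right tail.
counts = [1, 1, 2, 2, 3, 4, 6, 9, 40, 400]

equal_classes = [classify(c, equal_interval_breaks(counts, 4)) for c in counts]
quantile_classes = [classify(c, quantile_breaks(counts, 4)) for c in counts]
print(equal_classes)     # [0, 0, 0, 0, 0, 0, 0, 0, 0, 3]
print(quantile_classes)  # [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
```

With equal-width intervals, nine of the ten cells fall into the lowest class, so all low-density structure would be drawn in a single colour. The quantile-based intervals spread the same cells over all four classes.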

Given the above-mentioned limitations of the Piper diagram, the objective of this study is to design a modification of the diagram that allows representing any number of samples while keeping the maximum amount of information about the structure of the original data. As in Russionello and Lautz (2020), this objective is met by displaying point density inside each of the triangular panels and in the rhomboidal panel, rather than representing each of the analyses independently. Moreover, the implementation provides several display methods that account for the distributional characteristics of the data.
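As a rough sketch of what counting point density in the rhomboidal panel can look like, the code below assigns each sample to one triangular cell of a regular mesh and tallies the samples per cell. It is not taken from D-Piper. It assumes each sample is given as a pair (u, v), where u is the Na+K fraction of the cations and v is the Cl+SO4 fraction of the anions, both between 0 and 1; the function names and the number of divisions `n` are hypothetical.

```python
from collections import Counter


def diamond_cell(u, v, n):
    """Return (i, j, half) identifying the triangular cell that contains (u, v).

    The diamond is divided into n x n rhombi indexed by (i, j). Each rhombus
    is split along its short diagonal (the line fu == fv) into two triangles,
    which are equilateral when the diamond is drawn with 60-degree corners at
    the top and bottom. half is 0 for the triangle with fv <= fu, 1 otherwise.
    """
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("u and v must lie in [0, 1]")
    # Points on the outer boundary (u == 1 or v == 1) go to the last rhombus.
    i = min(int(u * n), n - 1)
    j = min(int(v * n), n - 1)
    fu, fv = u * n - i, v * n - j  # position inside the rhombus
    half = 1 if fv > fu else 0
    return i, j, half


def diamond_density(points, n):
    """Count how many (u, v) points fall in each triangular cell."""
    return Counter(diamond_cell(u, v, n) for u, v in points)


# Four invented samples on a mesh with 2 divisions per side.
density = diamond_density([(0.1, 0.05), (0.1, 0.2), (0.9, 0.9), (1.0, 1.0)], n=2)
print(density[(0, 0, 0)], density[(0, 0, 1)], density[(1, 1, 0)])  # 1 1 2
```

Once the counts are available, each cell can be coloured by its class instead of plotting the individual symbols, so the figure stays readable however many analyses are represented.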


Methods

The new density D-Piper V.1 diagram is implemented in Python 3.7.6 using the standard libraries NumPy 1.17.4 (Harris et al., 2020), Pandas 1.1.3 (McKinney, 2010) and Matplotlib 3.2.2 (Hunter, 2007). The library jenkspy 0.2.0 used to calculate the interval cutoff values with the Jenks method can be downloaded from the website https://github.com/mthh/jenkspy. The data sets shown in this work come from the Water database of the Geological Survey of Spain (http://info.igme.es/BDAguas/) comprising

Results and discussion

Fig. 3a–f shows examples of the application of the D-Piper diagram with some of the main graphical options and the different methods to display point density. The graphs allow visualizing the most abundant facies in the data set and their distribution. The data structure appears clearly in areas where the standard Piper diagram only depicted a masked surface (Fig. 2b). Except for the user-defined interval (Fig. 3f), all the D-Piper representations have the same number of intervals for

Conclusions

Hydrochemical databases contain an increasing amount of information, sometimes in the order of hundreds or thousands of analyses. The traditional Piper diagram is unsuitable to display these big data sets. The D-Piper code presented in this paper effectively displays point density in large sets of water analyses. It makes it easy to visualize the structure of the hydrochemical facies by allowing the user to choose among a wide variety of representation methods depending on the data

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research has been funded by the CLIGRO project of the Spanish National Plan for Scientific and Technical Research (CGL2016-77473-C3-1-R); by the National System of Youth Guarantee (PEJ2018-002477), co-funded under the Youth Employment Operational Program with financial resources from YEI and ESF; and by the IGME HIDROCAMBIO project (Ref 2616).

References (24)

  • O. García-Menéndez (2018). Evaluación multiparamétrica de un esquema MAR (Managed Aquifer Recharge) en un acuífero costero salinizado (Plana de Castellón, España).

  • C.R. Harris et al. (2020). Array programming with NumPy. Nature.