1 Introduction

Since the beginning of modern aviation, researchers have sought to understand the aerodynamics of free-flying animals, because flapping flight is a highly efficient propulsion mechanism at low Reynolds numbers. The ability to fly at low Reynolds numbers is important in many respects, and a large number of studies have been carried out in this field; nevertheless, the aerodynamics of the problem are not yet fully understood. Especially for future planetary exploration, small drones that can cover long distances with minimal energy requirements will be of particular importance, and birds can serve as a role model for such probes. The shape of bird wings in free flight is therefore of particular interest, because most of the aerodynamic properties are determined by it.

Within the “Schwerpunktprogramm 1207” of the German research funding organization (DFG), studies on barn owls in gliding flight were performed [1, 4, 11]. Stationary cameras and moving camera frames were compared to analyze the advantages and disadvantages of the experimental methods. The data was evaluated with Optical Flow [4], Stereo Matching [11] and a laser light section method [1]. In areas of low texture, the Stereo Matching algorithms produced holes as a result of missing image information. The evaluation with correlation windows resulted in a poor resolution, and the point patterns projected onto the wing became blurred during a wing beat amplitude. Further investigations on a flapping barn owl by Wolf [12] have shown that a moving camera with a coordinate system resting on the bird can simplify the evaluation of the data. A stationary camera, on the other hand, allows the moving bird to be masked from the resting background. In addition, the behavior of a bird flying with a moving camera system can differ from its natural reactions.

In order to measure the wing-surface structures, methods are needed that produce reliable results and do not harm the animal. For this reason, methods of Computer Vision (CV) are used to reconstruct objects from two cameras into three-dimensional point clouds without harming or influencing the bird. After a complete calibration with a 3D two-plane calibration target, our method triangulates point clouds (X, Y, Z) in three-dimensional space from unique point correspondences on the camera sensors [6]. On the one hand, the distance between the two camera centers (baseline) is fixed and should be as short as possible so that the displacement field between the images can be determined reliably. On the other hand, a short baseline decreases the spatial resolution of the point position in the depth direction of the measurement volume, so a compromise has to be found. An increase of the baseline reduces the measurement uncertainty in the Z-direction but also leads to occlusions, in which case the search for point correspondences becomes ambiguous. In real experimental setups, the camera distance can vary between a few centimeters and several meters. This paper presents a way to obtain unique disparity maps from which 3D point clouds of the considered falcon wing are determined.

2 Measurement Setup

The atmospheric wind tunnel of the University of Bundeswehr Munich was used for the measurements of the flapping flight of a Lanner falcon. The wind tunnel has a cross-section of \(1.85\,\mathrm {m}\times 1.85\,\mathrm {m}\), a \(22\,\mathrm {m}\) long measurement domain and can produce velocities between \(2\,\frac{\mathrm {m}}{\mathrm {s}}\) and \(45\,\frac{\mathrm {m}}{\mathrm {s}}\). The data was acquired with four Phantom v12 high-speed cameras from Vision Research. The sensor of these cameras generates images with up to \(1280\times 800\,\mathrm {px}\) and can record \(6200\,\frac{\mathrm {frames}}{\mathrm {s}}\) at full resolution. To resolve the motion of a wing up- and downstroke precisely over time, a frame rate of \(1000\,\frac{\mathrm {frames}}{\mathrm {s}}\) was selected for the recordings. The measurement object, a Lanner falcon with a wing span of about \(0.9\,\mathrm {m}\), flies through the wind tunnel upstream against a uniform flow. As a result, the time of flight in the measurement domain increases and more than one wing beat can be recorded. In our experiments in this wind tunnel, baselines of \(700-900\,\mathrm {mm}\) occurred. The high-speed cameras were equipped with Zeiss Distagon \(35\,\mathrm {mm}\) lenses at a working distance between camera and measurement object of approximately \(1\,\mathrm {m}\). Figure 1 shows the wind tunnel with the four mounted cameras and the flow direction in a schematic representation.

Fig. 1. Two stereo camera systems for upper and lower side view

3 Calibration

The calibration of the stereoscopic camera system was done in two steps: First, a three-dimensional two-plane calibration target was mounted in the measurement domain. Next, the target was moved by rotation and translation within the observation area of both cameras. The arbitrary movement of the calibration pattern requires a calibration with a pinhole camera model. The intrinsic camera parameters A were calculated for every camera; they include the focal length f in mm, the pixel size of the camera sensor s in \(\,\frac{\mathrm {mm}}{\mathrm {px}}\) and the principal point \((u_0,v_0)\) in px [8]. To obtain this information, the detected sensor coordinates of the calibration points were assigned to their physical positions, which establishes a mathematical connection between sensor positions (x, y) and world coordinates (X, Y, Z). Additionally, the extrinsic parameters K were fitted for the first camera view of the calibration pattern. They include the rotation R, described by three Euler angles that rotate the coordinate system, and a translation t, which moves the coordinate system from the origin of the world coordinate system to the camera sensor. For every other view, the rotation and translation were calculated as well. The back-projection error was finally minimized with a least-squares algorithm. The relative position of each camera with respect to the calibration target is defined by the extrinsic parameters. Mathematically, the camera matrix P is obtained by combining the intrinsic and extrinsic parameters of every camera (Fig. 2).

$$\begin{aligned} P=A\cdot K=A\cdot \left[ R|t\right] \end{aligned}$$
(1)
$$\begin{aligned} A=\left[ \begin{array}{ccc} \frac{f}{s}&{}0&{}u_0\\ 0&{}\frac{-f}{s}&{}-v_0+h\\ 0&{}0&{}1\end{array}\right] \end{aligned}$$
(2)
$$\begin{aligned} R=R(r_x,r_y,r_z)=R_z\cdot R_y\cdot R_x \end{aligned}$$
(3)
$$\begin{aligned} t=\left( \begin{array}{c} t_x\\ t_y\\ t_z\end{array}\right) \end{aligned}$$
(4)
Fig. 2. Transformation between world and camera coordinate system

With the camera matrix P from Eq. (1), world coordinates can be converted into sensor positions:

$$\begin{aligned} \left( \begin{array}{c} x\\ y\\ 1\end{array}\right) =P\cdot \left( \begin{array}{c} X\\ Y\\ Z\\ 1 \end{array}\right) \end{aligned}$$
(5)
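As a brief illustration of Eqs. (1)–(5), the following NumPy sketch composes a camera matrix from hypothetical intrinsic and extrinsic parameters and projects a homogeneous world point onto the sensor. All numerical values are illustrative, not the calibrated ones, and the interpretation of h in Eq. (2) as the image height in px is our assumption:

```python
import numpy as np

# Hypothetical intrinsic parameters (Eq. 2): focal length f in mm, pixel size s in mm/px,
# principal point (u0, v0) in px, image height h in px (assumed interpretation of h).
f, s, u0, v0, h = 35.0, 0.02, 640.0, 400.0, 800.0
A = np.array([[f / s, 0.0,    u0     ],
              [0.0,   -f / s, -v0 + h],
              [0.0,   0.0,    1.0    ]])

# Hypothetical extrinsic parameters (Eqs. 3, 4): rotation about the z-axis and a translation.
rz = np.deg2rad(5.0)
R = np.array([[np.cos(rz), -np.sin(rz), 0.0],
              [np.sin(rz),  np.cos(rz), 0.0],
              [0.0,         0.0,        1.0]])
t = np.array([[0.05], [0.0], [1.0]])

P = A @ np.hstack([R, t])            # Eq. (1): P = A [R | t]

X = np.array([0.1, 0.2, 0.5, 1.0])   # homogeneous world point (X, Y, Z, 1)
x = P @ X                            # Eq. (5)
u, v = x[:2] / x[2]                  # dehomogenise to pixel coordinates
print(u, v)
```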

Homogeneous coordinates are used to reduce the complexity of the transformation: a three-dimensional point is projected into the two-dimensional image frame with a single matrix multiplication. Otherwise, a multiplication and a summation would have to be performed successively, which increases the computational load. Lens distortions caused by the curvature of the camera lenses can be corrected with the radial distortion coefficients \((k_1,k_2)\). Barrel distortion has no further influence on our experimental setup.

$$\begin{aligned} \left( \begin{array}{c} x_u\\ y_u\end{array}\right) =f\left( \left( \begin{array}{c} x_d\\ y_d\end{array}\right) ,s,k_1,k_2\right) =\left( \begin{array}{c} x_d\\ y_d\end{array}\right) \left( 1+\frac{\sqrt{x_{d}^{2}+y_{d}^{2}}k_1}{500^{2}s}+\frac{(x_{d}^{2}+y_{d}^{2})k_2}{500^{3}s^2}\right) \end{aligned}$$
(6)

In Eq. (6), distorted pixel coordinates \((x_d,y_d)\) are corrected with the radial distortion coefficients to obtain undistorted pixel positions \((x_u,y_u)\) [8]. Another important relation is the fundamental matrix F, which describes the projective geometry between two cameras. It is determined from corresponding sensor point pairs of the calibration target in the two cameras. For every pair of corresponding points \(\mathbf x =(x,y)\) and \(\mathbf x '=(x',y')\) in the two images, the following equation has to be fulfilled:

$$\begin{aligned} \mathbf x '^{\top }F\mathbf x =0 \end{aligned}$$
(7)
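The fundamental matrix can, for example, be estimated from such point pairs with OpenCV. The sketch below is only illustrative: it replaces the detected calibration points by synthetically projected 3D points (camera matrices, baseline and point positions are assumed values) and then checks the constraint of Eq. (7):

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the detected calibration points: random 3D points projected
# into two hypothetical cameras (intrinsics and a ~0.8 m baseline are assumed values).
A  = np.array([[1750.0, 0.0, 640.0], [0.0, 1750.0, 400.0], [0.0, 0.0, 1.0]])
P1 = A @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = A @ np.hstack([np.eye(3), np.array([[-0.8], [0.0], [0.0]])])

Xw = np.hstack([rng.uniform(-0.3, 0.3, (30, 2)), rng.uniform(0.8, 1.4, (30, 1))])
Xh = np.hstack([Xw, np.ones((30, 1))])
pl = Xh @ P1.T; pts_left  = pl[:, :2] / pl[:, 2:]
pr = Xh @ P2.T; pts_right = pr[:, :2] / pr[:, 2:]

F, mask = cv2.findFundamentalMat(pts_left, pts_right, cv2.FM_LMEDS)

# Eq. (7): corresponding homogeneous points should satisfy x'^T F x ~ 0.
x, xp = np.append(pts_left[0], 1.0), np.append(pts_right[0], 1.0)
print(xp @ F @ x)
```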

4 Epipolar Geometry

The multiplication of the fundamental matrix F with a point \(\mathbf x \) on one sensor can be interpreted geometrically as a line on the other camera sensor (see Fig. 3).

$$\begin{aligned} l'=F\mathbf x \end{aligned}$$
(8)

The search for corresponding points therefore simplifies from a search over the whole sensor to a search along a line. All epipolar lines on one sensor intersect in one common point, the epipole. With the help of point pairs from the calibration it is possible to build transformation matrices T. They transform the epipole from any position in the camera coordinate system to a point at infinity, which leads to horizontal epipolar lines. This reduces the correspondence search to a one-dimensional problem. If the transformation matrix is applied to every pixel, the image is rectified [5] (see Fig. 4).
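Continuing the sketch above (reusing pts_left, pts_right and F, which are therefore assumed inputs here), OpenCV's uncalibrated rectification computes such transformations: two homographies that move the epipoles to infinity, after which both images can be warped so that the epipolar lines become horizontal. The image size corresponds to the Phantom v12 sensor; the images themselves are dummy placeholders:

```python
import cv2
import numpy as np

# pts_left, pts_right, F: point pairs and fundamental matrix from the previous sketch;
# img_left, img_right: the two camera images (dummy arrays here, 1280 x 800 px sensor).
h, w = 800, 1280
img_left  = np.zeros((h, w), np.uint8)
img_right = np.zeros((h, w), np.uint8)

ok, H1, H2 = cv2.stereoRectifyUncalibrated(pts_left, pts_right, F, (w, h))

# Applying the homographies to every pixel rectifies the image pair.
rect_left  = cv2.warpPerspective(img_left,  H1, (w, h))
rect_right = cv2.warpPerspective(img_right, H2, (w, h))
```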

Fig. 3. Epipolar plane between two cameras

Fig. 4. Transformation of epipole to infinity

5 Calculation of the Displacement Field

Owing to the long baseline, the rectified image pairs show disparities of the order of \(\gg O(1)\,\mathrm {px}\). State-of-the-art algorithms do not produce reliable results for displacements of this size: the disparity space grows with the disparity range, which increases the required memory, and ambiguous results can occur. Therefore the determination of the complete displacement field V is split into two consecutive steps. In the first step, a coarse displacement field is defined based on manually chosen points. These points are marked at prominent positions in the left camera image, after which the corresponding points are found in the right image. With the help of n point correspondences \(c_i\,(1\le i\le n)\), a thin-plate spline is set up with the following equation [2]:

$$\begin{aligned} f(x,y)=a_1+a_{x}x+a_{y}y+\sum _{i=1}^{n}w_{i}U(|c_{i}-(x,y)|) \end{aligned}$$
(9)

with the auxiliary function

$$\begin{aligned} U(r)=r^2\log {r^2} \end{aligned}$$
(10)

where r is the geometrical distance between two points

$$\begin{aligned} r=|(x,y)_{1}-(x,y)_{2}| \end{aligned}$$
(11)
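A minimal NumPy implementation of Eqs. (9)–(11) could look as follows; the variable names and the example control points are assumptions, while in the experiment the control points are the manually chosen correspondences and the values are their disparities:

```python
import numpy as np

def tps_kernel(r):
    """U(r) = r^2 log(r^2), Eq. (10), with U(0) = 0."""
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz] ** 2)
    return out

def fit_spline(ctrl, vals):
    """Solve for the weights w_i and the affine part (a_1, a_x, a_y) of Eq. (9).
    ctrl: (n, 2) control point positions c_i; vals: (n,) disparities at these points."""
    n = len(ctrl)
    r = np.linalg.norm(ctrl[:, None, :] - ctrl[None, :, :], axis=-1)   # Eq. (11)
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = tps_kernel(r)
    A[:n, n:] = np.hstack([np.ones((n, 1)), ctrl])
    A[n:, :n] = A[:n, n:].T
    sol = np.linalg.solve(A, np.concatenate([vals, np.zeros(3)]))
    return sol[:n], sol[n:]

def eval_spline(ctrl, w, a, xy):
    """Evaluate f(x, y) of Eq. (9) at the pixel positions xy, shape (m, 2)."""
    r = np.linalg.norm(xy[:, None, :] - ctrl[None, :, :], axis=-1)
    return a[0] + xy @ a[1:] + tps_kernel(r) @ w

# Hypothetical control points and disparities:
ctrl = np.array([[100.0, 120.0], [900.0, 150.0], [500.0, 600.0], [1200.0, 700.0]])
vals = np.array([48.0, 61.0, 55.0, 52.0])
w, a = fit_spline(ctrl, vals)
print(eval_spline(ctrl, w, a, np.array([[640.0, 400.0]])))  # coarse disparity at one pixel
```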

A coarse displacement field \(V_m\) is then produced for every point within the image. Owing to the rectification, all disparities lie along the x-axis of the image. If the left camera image is warped with the disparity map \(V_m\), both camera images are nearly superimposed and the remaining disparities reduce to the order of one or two pixels. These can be determined with a Semi-Global Block Matching algorithm [7]. This method creates sparse disparity maps \(V_f\): a disparity is only saved at positions where a left-right and right-left plausibility check is successful. The computation is done with the OpenCV library on an Nvidia GTX 660 GPU, which reduces the calculation time to under 10 s for an image pair with \(1280\times 800\,\mathrm {px}\). To fill the holes in \(V_f\), the Optical Flow [3] between the same image pair provides estimates for the empty areas. The chosen Optical Flow approach minimizes an energy function over the whole image in order to obtain a dense and smooth displacement field.
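A possible OpenCV sketch of this matching step is shown below. The images are dummy placeholders (in the experiment they are the warped left image and the rectified right image), and the matcher parameters are illustrative rather than the ones taken from [9]; the disp12MaxDiff parameter realizes a left-right consistency check similar to the plausibility check described above:

```python
import cv2
import numpy as np

# left_warped: left image already warped with the coarse field V_m; right: rectified
# right image (dummy placeholders here, 1280 x 800 px).
left_warped = np.zeros((800, 1280), np.uint8)
right       = np.zeros((800, 1280), np.uint8)

sgbm = cv2.StereoSGBM_create(minDisparity=-16, numDisparities=32, blockSize=5,
                             uniquenessRatio=10, speckleWindowSize=100,
                             speckleRange=2, disp12MaxDiff=1)
disp = sgbm.compute(left_warped, right).astype(np.float32) / 16.0   # fixed point -> px

# Pixels that fail the consistency checks are flagged below minDisparity and are
# treated as holes, to be filled with Optical Flow estimates later.
holes = disp < sgbm.getMinDisparity()
```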

$$\begin{aligned} E(\mathbf w )&=\int _{\varOmega }\varPsi (|I(\mathbf x +\mathbf w )-I(\mathbf x )|^2)\mathrm {d}\mathbf{x }+\gamma \int _{\varOmega }\varPsi (|\nabla I(\mathbf x +\mathbf w )-\nabla I(\mathbf x )|^2)\mathrm {d}\mathbf{x }\nonumber \\&\quad +\alpha \int _{\varOmega }\varPsi (|\nabla u|^2+|\nabla v|^2)\mathrm {d}\mathbf{x } \end{aligned}$$
(12)

with the following definitions:

$$\begin{aligned} \mathbf w =\left( \begin{array}{c} u\\ v\\ 1 \end{array}\right) \end{aligned}$$
(13)
$$\begin{aligned} \varPsi (s^2)=\sqrt{s^2+\epsilon ^2}\text { with }\epsilon =0.001 \end{aligned}$$
(14)

The first term of Eq. (12) describes the assumption of a constant intensity distribution on a surface, the second term describes the existence of constant gradients along edges and the last one enforces the smoothness of the velocity (displacement) field. A sensitivity analysis with two synthetic images and a disparity of \(50\,\mathrm {px}\) between them was performed to choose the parameters \(\alpha \) (regularization parameter) and \(\gamma \) (weight parameter).

$$\begin{aligned} \alpha =0.0197\text {, }\gamma =100 \end{aligned}$$
(15)

Furthermore, a Gaussian pyramid with a scale factor of 0.8 was used to pursue a coarse-to-fine strategy in the algorithm. Areas of very low texture, for which the Block Matching algorithm does not produce reliable results, can be filled with values from the Optical Flow. The parameters of the Semi-Global Block Matching originate from the OpenCV documentation [9].
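As an illustration of this hole-filling step, the sketch below uses OpenCV's dense Farnebäck flow as a readily available stand-in for the variational method of Eq. (12) described in [3]; the images and the previously computed disp/holes arrays are assumed inputs, and only the horizontal flow component is relevant for a rectified pair:

```python
import cv2
import numpy as np

# left_warped, right: the same (assumed) image pair as used for the block matching.
left_warped = np.zeros((800, 1280), np.uint8)
right       = np.zeros((800, 1280), np.uint8)

flow = cv2.calcOpticalFlowFarneback(left_warped, right, None,
                                    pyr_scale=0.8, levels=5, winsize=21,
                                    iterations=3, poly_n=7, poly_sigma=1.5, flags=0)

# Fill the holes of the sparse map V_f with the x-component of the dense flow
# (disp and holes come from the block-matching sketch above):
# disp[holes] = flow[..., 0][holes]
```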

The complete displacement field is defined as the sum of the individual disparity maps:

$$\begin{aligned} V=V_f+V_m \end{aligned}$$
(16)

As explained above, Optical Flow and Semi-Global Block Matching apply spatial smoothing, because a smooth object surface cannot exhibit large jumps in the displacement field. To further reduce imperfections, the disparity maps are also smoothed temporally: the displacement fields before and after the considered time step are transformed with Optical Flow to the same temporal position and subsequently averaged.
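A minimal sketch of this temporal smoothing, assuming the neighbouring disparity maps and the optical flows from the current frame to those frames are already available (all variable names are hypothetical):

```python
import cv2
import numpy as np

def warp_to_current(field, flow):
    """Warp a disparity map from a neighbouring time step to the considered one,
    using the dense flow that maps pixels of the current frame to that frame."""
    h, w = field.shape
    gx, gy = np.meshgrid(np.arange(w, dtype=np.float32), np.arange(h, dtype=np.float32))
    return cv2.remap(field, gx + flow[..., 0], gy + flow[..., 1], cv2.INTER_LINEAR)

# V_prev, V_next: displacement fields at t-1 and t+1; flow_to_prev, flow_to_next:
# optical flow from the current frame to these frames; V: field at the current step.
# V_smooth = (warp_to_current(V_prev, flow_to_prev) + V +
#             warp_to_current(V_next, flow_to_next)) / 3.0
```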

6 Triangulation and Masking

For every point x in the left camera image, adding the displacement to this coordinate yields the corresponding point x' in the right camera image. With these point pairs x and x' and the camera matrices \(P_i\) from the calibration, the world coordinates (X, Y, Z) can be triangulated using the optimal triangulation method [6] (see Fig. 5). Every point in the camera image that has a corresponding point in the other image can be reconstructed, but in our experiments only the moving falcon is of interest. Therefore, the objects of interest are masked by thresholding the Optical Flow: only areas with velocities higher than a threshold are included in the mask. For the measurement of the Lanner falcon, the bird is thus separated from the stationary background of the wind tunnel.
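A sketch of the triangulation with OpenCV's linear routine is given below; note that cv2.triangulatePoints is a DLT-based triangulation, not the optimal method of [6], and that the camera matrices and point positions are synthetic placeholders:

```python
import cv2
import numpy as np

# Hypothetical camera matrices (in the experiment these are the calibrated P_i):
A  = np.array([[1750.0, 0.0, 640.0], [0.0, 1750.0, 400.0], [0.0, 0.0, 1.0]])
P1 = A @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = A @ np.hstack([np.eye(3), np.array([[-0.8], [0.0], [0.0]])])

# Synthetic wing points, projected into both cameras to mimic x and x' = x + V(x):
Xw = np.array([[0.10, 0.05, 1.00], [-0.05, 0.12, 1.10]])
Xh = np.hstack([Xw, np.ones((2, 1))])
pl = Xh @ P1.T; x_left  = (pl[:, :2] / pl[:, 2:]).T        # 2 x n pixel coordinates
pr = Xh @ P2.T; x_right = (pr[:, :2] / pr[:, 2:]).T

Xrec = cv2.triangulatePoints(P1, P2, x_left, x_right)       # 4 x n homogeneous
XYZ  = (Xrec[:3] / Xrec[3]).T                               # recovers Xw

# Masking (assumed dense flow over time of the left camera; illustrative threshold):
# mask = np.linalg.norm(flow_time, axis=2) > 2.0            # keep only the moving bird
```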

Fig. 5. Schematics of the triangulation with corresponding points

7 Sequence of a Wing Flap

The procedure to generate a 3D point cloud from two cameras at a time step \(t_0\) can be applied to additional time steps (\(t_n\) with \(n>1\)) with the same operations. The most time-consuming and non-automatic part of the reconstruction, the determination of the manual displacement field \(V_m\), is bypassed with Optical Flow: correspondences from the first image are transported with the velocity fields to the subsequent time steps, so that the repeated selection of points is no longer necessary. In Fig. 6, the downstroke of a wing is portrayed; the three reconstructions are taken from images with a time lag of \(\varDelta t=20\,\mathrm {ms}\).
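The following sketch illustrates how manually chosen correspondences could be transported to the next time step with a dense flow field (nearest-neighbour sampling of the flow; all variables are placeholders):

```python
import numpy as np

# pts_t0: manually chosen points at t0; flow_t0_t1: dense optical flow from the left
# image at t0 to the one at t1 (dummy placeholders here, 1280 x 800 px images).
pts_t0     = np.array([[512.0, 300.0], [700.0, 360.0]], np.float32)
flow_t0_t1 = np.zeros((800, 1280, 2), np.float32)

ix = pts_t0[:, 0].round().astype(int).clip(0, 1279)
iy = pts_t0[:, 1].round().astype(int).clip(0, 799)
pts_t1 = pts_t0 + flow_t0_t1[iy, ix]     # advected points, no manual re-selection needed
```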

Fig. 6. Sequence of a wing downstroke (\(\varDelta t=20\,\mathrm {ms}\))

Fig. 7. Upper and lower side of the reconstruction from different view points

8 Upper and Lower Wing Side

At least four cameras are needed to reconstruct the upper and lower side of a flapping bird wing. The calibration with a three-dimensional two-plane pattern makes it possible to calibrate cameras that see either the front or the back side of the target, so that all cameras share the same world coordinate system. The upper and lower side can then be determined completely independently of each other, and in a last step both point clouds are combined. In Fig. 7, the reconstruction of both sides is displayed. The contour and the shape of the bird body and wing are clearly recognizable.

9 Error Analysis

To estimate the uncertainty of the stereo reconstruction, the triangulated calibration point positions were compared with the known world coordinates of these points. The mean absolute difference in all three dimensions is less than \(1\,\mathrm {mm}\) (\(\overline{|\varDelta X|}=0.67\,\mathrm {mm}\), \(\overline{|\varDelta Y|}=0.46\,\mathrm {mm}\), \(\overline{|\varDelta Z|}=0.47\,\mathrm {mm}\)). As a reference, the calibrated volume extends over \(385\,\mathrm {mm}\times 400\,\mathrm {mm}\times 340\,\mathrm {mm}\). Furthermore, the standard deviations (\(\sigma _X=0.11\,\mathrm {mm}\), \(\sigma _Y=0.12\,\mathrm {mm}\), \(\sigma _Z=0.19\,\mathrm {mm}\)) show the same order of magnitude for the in-plane components \(\sigma _X\) and \(\sigma _Y\). The out-of-plane component is slightly larger than the values mentioned before, because an angle of around \(43^{\circ }\) between the cameras was measured in this configuration; the uncertainty \(\sigma _Z\) is therefore expected to exceed the in-plane deviation. The mean absolute difference reveals one problem of the reconstruction: thin objects like single feathers at the trailing edge of the wing are strongly affected by the reconstruction error, because the thickness of the wing at this position is just one order of magnitude larger than the error. The combination of upper and lower side therefore leads to an overlap at these positions, which at the moment is corrected manually using physical constraints (the leading and trailing edge of the upper and lower side cannot intersect). In Sect. 6 it is explained that two corresponding points on two camera sensors are needed to reconstruct a point in world space. The uncertainty in the Z-direction drops due to the long baseline between the cameras in the measurement setup, but both cameras then no longer see exactly the same parts of the object. As a result, occlusions occur which produce ambiguities when building the displacement field. Hence, wrong disparities, especially in the area of the trailing edge and the wing tip, result in fragments after the triangulation which do not belong to the surface. These outliers are visible in Fig. 6. Using a third camera with a view of the wing tip should reduce the ambiguity. In addition, noise can be removed with statistical methods: for the k (in our experiment \(k=100\)) nearest neighbors of each point, the mean distance \(\delta \) is calculated and the standard deviation \(\sigma _\delta \) of this distribution is determined. Points with distances outside of \(\delta \pm \sigma _\delta \) are trimmed [10].
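The described trimming can be sketched with a k-d tree, e.g. from SciPy (a minimal sketch; in the experiment the input is the triangulated point cloud and \(k=100\)):

```python
import numpy as np
from scipy.spatial import cKDTree

def trim_outliers(points, k=100):
    """Remove points whose mean distance to their k nearest neighbours lies outside
    delta +/- sigma_delta of the distribution over all points [10].
    points: (n, 3) array of triangulated XYZ positions."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)        # first neighbour is the point itself
    mean_dist = dists[:, 1:].mean(axis=1)
    delta, sigma = mean_dist.mean(), mean_dist.std()
    return points[np.abs(mean_dist - delta) <= sigma]
```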

10 Conclusion

This paper presented a stereoscopic method to reconstruct the surface of a bird wing recorded in free flight in a wind tunnel. The procedure allows for the determination of three-dimensional point clouds for a given time sequence with minimal manual effort. From the determined surfaces of the upper and lower wing side, the shape of the wing cross-section at different span positions can be predicted in order to calculate aerodynamic properties according to airfoil theory. Structures like reverse flow breaks can also be visualized with this method. Moreover, the approach is applicable to other objects or animals with highly textured surfaces.