Regular Articles
Volume: 1 | Article ID: jpi0109
Color and Quality Enhancement of Videoconferencing Whiteboards
DOI: 10.2352/J.Percept.Imaging.2018.1.1.010504 | Published Online: January 2018
Abstract

Whiteboards are commonly used as a medium for instant illustration of ideas during activities such as presentations, lectures, and meetings held through videoconferencing systems. However, the acquisition of whiteboard contents is hindered by issues inherent to the camera technologies and the glossy whiteboard surfaces, along with environmental issues such as room lighting and camera positioning. The contents of whiteboards are often hardly visible due to low luminance contrast and related color degradation problems. This article presents work aimed at extracting the whiteboard image and then enhancing its perceptual quality and legibility. Two different methods based on color balancing and color warping are introduced to improve the global and local luminance contrast as well as the color saturation of the contents. The methods are implemented based on general models of the videoconferencing environment in order to avoid color shifts and unnatural results. Our evaluations, through psycho-visual experiments, reveal the significance of the proposed method's improvements over state-of-the-art methods in terms of visual quality and visibility.

  Cite this article 

Carlos Andres Arango Duque, Mekides Assefa Abebe, Muhammad Shahid, Jon Yngve Hardeberg, "Color and Quality Enhancement of Videoconferencing Whiteboards," in Journal of Perceptual Imaging, 2018, pp 010504-1 - 010504-13, https://doi.org/10.2352/J.Percept.Imaging.2018.1.1.010504

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2018
  Article timeline 
  • received September 2017
  • accepted July 2018
  • Published January 2018

Preprint submitted to:
jpi
Journal of Perceptual Imaging
J. Percept. Imaging
2575-8144
Society for Imaging Science and Technology
1. Introduction
Over the years, the popularity of videoconferencing systems and online content sharing for educational and commercial purposes has greatly increased. Activities such as lectures, brainstorming sessions, and business meetings now routinely take place over the Internet. The use of whiteboards for dissemination of information and interactive discussions is also very common. In many scenarios, the video of the sessions as well as the handwritten content on the whiteboards are archived for later use.
The nature of today’s videoconferencing technologies allows users to use their preferred digital camera technology and conferencing environments. Therefore, the reproduction qualities of the corresponding whiteboard contents differ depending on the different capacities and settings of the digital cameras, the camera viewpoint, and the room lighting conditions. Most of the videoconferencing systems include low dynamic range digital camera technologies. The illumination condition of the video conferencing rooms is also usually not controlled professionally.
As a result, images acquired in such videoconferencing situations tend to have limited luminance contrast ratios, which in turn limit the visibility of details, mainly in the poorly exposed regions [1, 2]. Whiteboard contents are additionally more prone to lighting and environmental issues due to the glossy nature of their surfaces. Depending on the illumination direction and homogeneity as well as the observers' viewpoint, different shading and specular reflections arise and lead to further contrast reduction, color clipping, and loss of details [3].
The reduced contrast range of the whiteboard environment, subsequently, blurs and reduces the saturation of pen-stroke content [4, 5]. However, a strong color saturation is found to be a determining factor for the appeal and naturalness of natural images [6, 7]. Therefore, further processing is required to enhance the overall appearance as well as the visibility of whiteboard contents.
Consequently, over the years, a variety of approaches for improving the quality of videoconferencing whiteboard contents have been proposed. Most of the approaches were intended for whiteboard content extraction and enhancement of its visual representation [8–11]. They mainly targeted specific types of whiteboard content, chiefly handwritten text. Apart from content detection and recovery, other perceptual problems such as the truncated global luminance contrast of the whiteboards as well as the local contrast and color saturation of pen-stroke content were not well investigated [12, 13]. Additional processing artifacts, desaturated colors, and visually non-discernible contents were observed in the results of some state-of-the-art enhancement methods. A more detailed discussion of these approaches is presented in the following section.
In this article, we propose a new and efficient post-processing framework for detecting whiteboard contents and enhancing their visual appearance. The framework includes novel ways of estimating the illumination, the whiteboard background, and the colors of the contents, as well as perceptual quality enhancement methods. The perceptual appeal and visibility of the whiteboard and its contents are improved by increasing their luminance contrast and color saturation. Unlike the prior methods, the proposed framework provides a general model of the different lightness and color variations in the whiteboard environment. The proposed models are also easily adaptable to other related imaging applications.
The performance of the proposed method is evaluated in relation to the state-of-the-art methods. The evaluation is performed visually and subjectively through several psycho-visual experiments. The experimental results verified that the proposed framework produces more appealing, colorimetrically accurate, and visually readable results.
2. Related Work
Several whiteboard enhancement methods for streaming videos have been introduced. The majority of the methods employed segmentation-based text recognition, multiple frame based content recovery, and color enhancement approaches.
The methods with segmentation-based text recognition approaches mainly consider the detection and extraction of texts from whiteboard images. Different geometric properties based classifications [8], connected components based segmentations [9], and Hidden Markov Model (HMM) based approaches [10] were utilized for identification, segmentation, and extraction of handwritten content. HMM was also used for further classification of handwritten content into different classes such as text, lines, circles, and arrows [11].
The multiple frame based content recovery approaches, on the other hand, rely on a sequence of whiteboard images, from consecutive frames of a video, to recover occluded contents and remove redundant data [14, 15]. Similarly, whiteboard capturing systems which can classify pixels into whiteboard background, pen-strokes, and foreground objects are also introduced [16, 17]. These methods additionally extract the pen-stroke content and process the background color of the whiteboard to be completely white.
However, the whiteboard images generated by most of the conventional video conferencing systems have very low image qualities. The contents of such whiteboard images are, most of the time, hard to visually discern much less computationally detect and extract. This type of quality degradation usually occurs due to the low capacities of the systems’ camera technologies and the uncontrolled illumination conditions of the videoconferencing rooms. Furthermore, the glossy material of the whiteboard surfaces usually creates different shading and over-exposure problems depending on the illumination direction and observers’ or camera viewpoints [1, 2].
All these problems of the camera, illumination conditions, and glossiness together contribute to the reduction of both the global and local contrast of the whiteboard regions. The reduction of luminance contrast and other specularity related problems [4, 5, 18, 19] then lead to color desaturation and blurring of fine whiteboard contents such as pen-strokes.
For this reason, for applications like videoconferencing, enhancing the visibility and perceptual quality of whiteboards along with their pen-stroke content is essential. Even so, only a limited number of color enhancement approaches have been proposed so far [12, 13]. The method proposed by Zhang et al. [12] is designed for detection, rectification, and enhancement of whiteboards. They used their estimated whiteboard background color for global white balancing of the entire whiteboard. The method introduced by Gormish et al. [13], on the other hand, extracts and enhances the pen-stroke content. A local median filter is used to estimate the whiteboard background and extract pen-stroke content. The color saturation of the pen-stroke content is then manipulated based on their proposed sigmoidal function.
Despite the appealing results of the Gormish et al. [13] method, the dependency of their enhancement method on their segmentation results may sometimes lead to invisible results and processing artifacts. Their sigmoidal saturation enhancement curve also does not accurately model the perceptual saturation attribute. Therefore, it sometimes generates over-saturated and unnaturally colored results. In addition, the white balancing method of Zhang et al. [12] increases the global lightness of the whiteboard background and reduces the saturation of colored contents. Their processing often renders contents invisible.
In this regard, we propose a more efficient color enhancement framework which better preserves the color appearance and enhances the visibility of the whiteboard contents. The enhancement process of the proposed method applies techniques such as local and global contrast enhancement and color warping. All the processing in our method is performed according to the characterization of the camera technology, the illumination condition, and the whiteboard background model. Accordingly, many of the problems of the prior enhancement methods are avoided. The complete framework of the proposed method is discussed as follows.
3. Method
As described in the previous sections, in videoconferencing applications, the appearance of the whiteboard contents is usually degraded due to the poor qualities of the applied camera technologies and environmental lighting condition. Therefore, most videoconferencing systems require different post-processing stages to enhance the appearance as well as the overall qualities of whiteboard contents.
To this end, we have developed a new framework which includes system calibration as well as image quality enhancement methods. In the framework, the whiteboard region is first cropped and rectified from the input video frame, based on a perspective homographic transformation [20–22]. The detection of the whiteboard region can be performed manually or using an automatic detection algorithm. Next, a system calibration, which is the modeling of the illumination axis and the whiteboard background, is performed. The resulting models are then used to enhance the rectified whiteboard image into a more visually appealing and readable image. The full workflow of the proposed framework is given in Figure 1 and its major components, the system calibration and quality enhancement modules, are discussed in the following sections.
Figure 1. Flow chart of the proposed method.
3.1 System Calibration
The system calibration module of the proposed framework is used to estimate the whiteboard background luminance as well as the scene’s color information. The estimation is performed according to the linear mathematical model of shading and illumination axis estimation methods proposed by Tomazevic et al. [23] and Park et al. [24], respectively.
3.1.1 Whiteboard Background Luminance Estimation
In the linear shading model, the actual image captured by a camera I(x, y) is defined as in Eq. (1), where the shading-free image and the shading multiplicative and additive components are denoted as U(x, y), SM(x, y) and SA(x, y), respectively. Since whiteboards are physically built to be uniformly white, the additive component can be removed from Eq. (1).
$$I(x,y) = U(x,y)\,S_M(x,y) + S_A(x,y). \tag{1}$$
According to Zhang et al. [25], if the rectified whiteboard is divided into small rectangular cells, then a significant number of pixels in each cell belong to the background. Additionally the luminance of these background pixels, most of the time, would have the highest values and would be uniformly distributed in each cell. Therefore, the background luminance of the whiteboard could be computed as the collection of the luminances of the brightest pixels of all cells in the rectified image. However, such models usually lead to black hole artifacts whenever there is an object occluding the whiteboard.
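The brightest-pixel estimate described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `cell_background_luminance`, the cell size of 16 pixels, and the use of a high percentile as a noise-tolerant stand-in for "the brightest pixels" are hypothetical choices, not values taken from the article.

```python
import numpy as np

def cell_background_luminance(lum, cell=16, percentile=90):
    """Per-cell background luminance from the brightest pixels of each cell.

    lum: 2D array of luminance values of the rectified whiteboard.
    cell, percentile: illustrative parameters (assumptions).
    """
    h, w = lum.shape
    rows, cols = h // cell, w // cell
    # Crop so the image divides evenly, then gather the pixels of each cell.
    blocks = lum[:rows * cell, :cols * cell].reshape(rows, cell, cols, cell)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(rows, cols, cell * cell)
    # Background pixels dominate each cell, so a high percentile picks them.
    return np.percentile(blocks, percentile, axis=2)
```

A thin pen stroke inside a cell does not affect the estimate, whereas a cell that is fully covered by an occluding object returns the luminance of the object, which is the black hole artifact mentioned above.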
To avoid such artifacts, we model the luminance values of the entire whiteboard by fitting a polynomial surface in the luminance space. The surface is fitted by minimizing the sum of squared errors of the function f, as given in Eqs. (2) and (3).
$$F = \min \sum_i f_i^2 \tag{2}$$
where
$$f_i = a y_i^3 + b x_i^3 + c y_i^2 x_i^2 + d y_i x_i^2 + e y_i^2 x_i + f y_i^2 + g x_i^2 + h y_i x_i + i y_i + j x_i - z_i \tag{3}$$
and, for the 3D points {(x_i, y_i, z_i) | i = 1, …, n}, x_i and y_i are the whiteboard cell coordinates and z_i their respective luminance values.
The proposed third degree polynomial surface is able to accurately approximate the whiteboard luminance values as long as there are no outliers. However, in the presence of specular highlights, occlusions, and drawings, the least-squares minimization technique leads to biased results. An example of such results is given in Figure 2(b). To avoid errors due to these types of outliers, we use a more robust technique called Least Median of Squares (LMedS). This method aims to minimize the median rather than the sum of squared errors:
$$F = \min \operatorname*{med}_{i=1,\dots,n} f_i^2. \tag{4}$$
However, this problem has no closed-form solution, so we apply a Monte Carlo type minimization technique proposed by Rousseeuw et al. [26]. The procedure goes as follows:
1. Choose a sub-sample of p points from the whiteboard background (p = 10 for a 3rd degree polynomial surface).
2. Estimate the surface using Eq. (2).
3. Determine the median of squared residuals
$$M_j = \operatorname*{med}_{i=1,\dots,n} f_i^2(p_j, z_j). \tag{5}$$
4. Repeat the previous steps N times and retain the estimate p_j for which M_j is minimal,
where N is:
$$N = \frac{\log(1-P)}{\log\left(1-(1-\varepsilon)^p\right)} \tag{6}$$
where P is the probability that at least one of the N sub-samples is a good set of points to model the surface and ε is the fraction of outliers in the sample. However, [27] pointed out that the efficiency of LMedS suffers in the presence of Gaussian noise. To compensate for this drawback, the next step is to carry out a weighted least-squares procedure. First we calculate the so-called robust standard deviation:
$$\hat{\alpha} = 1.4826\,\sqrt{M_j} \tag{7}$$
where M_j is the minimal median of squared residuals (calculated in Eq. (5)) and the constant 1.4826 is a coefficient to achieve the same efficiency as a least-squares method in the presence of only Gaussian noise and no outliers. Finally we fit the surface using the weighted least-squares method and the robust standard deviation α̂, thus:
$$F = \min_{p} \sum_i w_i f_i^2 \tag{8}$$
where
$$w_i = \begin{cases} 1, & f_i^2 \le (2.5\,\hat{\alpha})^2 \\ 0, & \text{otherwise.} \end{cases} \tag{9}$$
The described new median based weighted minimization method is able to ignore outliers as well as the effect of Gaussian noise from the input image. The surface fitting results, provided in Fig. 2(c), show the accuracy of the new method for fitting whiteboard luminance surfaces even in the presence of specular highlights.
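The LMedS search followed by the weighted least-squares refit can be sketched as follows. This is a minimal illustration, not the authors' implementation. Assumptions: the basis in `design_matrix` is the ten monomials of a full bivariate cubic (the exact monomial set of Eq. (3) can be substituted), the defaults P = 0.99 and ε = 0.3 are illustrative, and all function names are hypothetical.

```python
import math
import numpy as np

def design_matrix(x, y):
    # Assumed basis: the ten monomials of a full bivariate cubic.
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y,
                            x**3, x * x * y, x * y * y, y**3])

def num_subsamples(P=0.99, eps=0.3, p=10):
    # Eq. (6), rounded up to a whole number of trials.
    return math.ceil(math.log(1 - P) / math.log(1 - (1 - eps) ** p))

def lmeds_surface(x, y, z, P=0.99, eps=0.3, seed=0):
    """Fit z ~ surface(x, y) robustly; returns coefficients and inlier mask."""
    rng = np.random.default_rng(seed)
    A = design_matrix(x, y)
    n, p = A.shape
    best_M, best_coef = np.inf, None
    for _ in range(num_subsamples(P, eps, p)):
        idx = rng.choice(n, size=p, replace=False)      # random sub-sample
        coef, *_ = np.linalg.lstsq(A[idx], z[idx], rcond=None)
        M = np.median((A @ coef - z) ** 2)              # Eq. (5)
        if M < best_M:
            best_M, best_coef = M, coef
    sigma = 1.4826 * np.sqrt(best_M)                    # Eq. (7)
    inliers = (A @ best_coef - z) ** 2 <= (2.5 * sigma) ** 2   # Eq. (9)
    # Eq. (8): with 0/1 weights this is ordinary least squares on the inliers.
    coef, *_ = np.linalg.lstsq(A[inliers], z[inliers], rcond=None)
    return coef, inliers
```

With the illustrative defaults, Eq. (6) gives N = 161 trials for p = 10. The fitted surface is evaluated as `design_matrix(x, y) @ coef`.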
Figure 2. Whiteboard surface fitting results. The result shows that the simple least-squares minimization method fails to correctly fit the luminance of the outlier regions, whereas the proposed method accurately fits the whiteboard luminance surface (mesh surface), shown in Fig. 2(a), even in the presence of highlights. (a) Captured whiteboard image with two strong specular reflection areas. (b) Three attempts to model the luminance using least squares, along the dashed lines of (a). (c) Proposed LMedS results of the whiteboard surface.
3.1.2 Whiteboard Background Color Estimation
Whiteboards are mostly smooth white surfaces with some shaded regions. If the color values of whiteboard image pixels are visualized in the sRGB color space, they should ideally form a cloud of points surrounding the neutral gray axis [28]. In reality, however, the points form a compact ellipsoidal distribution whose semi-major axis is oriented along the illumination axis el [24], which is defined by the color of the dominant scene illuminant (see Figure 3(b)).
In the case of a single illuminant, the illumination axis can be estimated as the intersection of two planes which, in our case, are red versus blue (rb) and green versus blue (gb). Each plane is defined using the slope-intercept formula. Therefore, the slope and intercept values of the two planes (mrb, mgb, crb, cgb) that best approximate the Ri, Gi and Bi color values of the whiteboard pixels can be computed by a least-squares optimization of the two systems given in Eq. (10).
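$$\begin{bmatrix} R_1 & 1 \\ R_2 & 1 \\ \vdots & \vdots \\ R_n & 1 \end{bmatrix} \begin{bmatrix} m_{rb} \\ c_{rb} \end{bmatrix} = \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_n \end{bmatrix}, \qquad \begin{bmatrix} G_1 & 1 \\ G_2 & 1 \\ \vdots & \vdots \\ G_n & 1 \end{bmatrix} \begin{bmatrix} m_{gb} \\ c_{gb} \end{bmatrix} = \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_n \end{bmatrix}. \tag{10}$$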
The intersection of the resulting planes, given the B values, is computed based on Eq. (11) and example results are presented in Fig. 3(c). The computed intersection represents the orientation of the whiteboard background color.
$$R = \frac{B - c_{rb}}{m_{rb}} \quad \text{and} \quad G = \frac{B - c_{gb}}{m_{gb}}. \tag{11}$$
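The two least-squares fits of Eq. (10) and the line of Eq. (11) can be sketched with `numpy.linalg.lstsq`. The function names are hypothetical, and the input is assumed to be an (n, 3) array of background pixel colors scaled to [0, 1].

```python
import numpy as np

def fit_illumination_planes(rgb):
    """Fit B = m_rb * R + c_rb and B = m_gb * G + c_gb, as in Eq. (10)."""
    R, G, B = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    ones = np.ones_like(R)
    (m_rb, c_rb), *_ = np.linalg.lstsq(np.column_stack([R, ones]), B, rcond=None)
    (m_gb, c_gb), *_ = np.linalg.lstsq(np.column_stack([G, ones]), B, rcond=None)
    return m_rb, c_rb, m_gb, c_gb

def axis_point(B, m_rb, c_rb, m_gb, c_gb):
    """Point on the illumination axis for a given blue value, Eq. (11)."""
    return np.array([(B - c_rb) / m_rb, (B - c_gb) / m_gb, B])
```

Sweeping B over the range of observed blue values traces the illumination axis through the color cloud.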
Figure 3. Whiteboard background color estimation. The figures show that from the whiteboard background pixels (a) a color cloud can be estimated (b). However, it is not aligned with the neutral gray axis (blue line in (b)). On the other hand, the proposed illumination axis (the black line in (c)) is able to align with the color cloud.
The illumination axis orientation vector e can be further computed based on the color of the illuminant and the darkest color of the illumination axis, as given in Eq. (12). The color of the illuminant We and the darkest color Ble are calculated by intersecting the illumination axis with the surfaces of RGB color cube and computing the color value with single channel saturation and lowest intensity values, respectively. The equations used for the computation are given in Eqs. (13) and (14).
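$$e = \frac{W_e - Bl_e}{\lVert W_e - Bl_e \rVert} \tag{12}$$
$$W_e = \begin{cases} W_r = \left[\,1,\ \dfrac{m_{rb} + c_{rb} - c_{gb}}{m_{gb}},\ m_{rb} + c_{rb}\right] & \text{if } \max(W_r) \le 1 \\[2ex] W_g = \left[\,\dfrac{m_{gb} + c_{gb} - c_{rb}}{m_{rb}},\ 1,\ m_{gb} + c_{gb}\right] & \text{if } \max(W_g) \le 1 \\[2ex] W_b = \left[\,\dfrac{1 - c_{rb}}{m_{rb}},\ \dfrac{1 - c_{gb}}{m_{gb}},\ 1\right] & \text{if } \max(W_b) \le 1 \end{cases} \tag{13}$$
$$Bl_e = \begin{cases} Bl_r = \left[\,0,\ \dfrac{c_{rb} - c_{gb}}{m_{gb}},\ c_{rb}\right] & \text{if } \max(Bl_r) \ge 0 \\[2ex] Bl_g = \left[\,\dfrac{c_{gb} - c_{rb}}{m_{rb}},\ 0,\ c_{gb}\right] & \text{if } \max(Bl_g) \ge 0 \\[2ex] Bl_b = \left[\,-\dfrac{c_{rb}}{m_{rb}},\ -\dfrac{c_{gb}}{m_{gb}},\ 0\right] & \text{if } \max(Bl_b) \ge 0. \end{cases} \tag{14}$$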
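The intersection of the illumination axis with the RGB cube can also be computed without enumerating the three cases. The sketch below is an equivalent formulation under stated assumptions, not the authors' code: it parametrizes the axis by the blue value B as in Eq. (11), assumes nonzero slopes and an axis that actually crosses the unit cube, and uses the hypothetical name `axis_endpoints`.

```python
import numpy as np

def axis_endpoints(m_rb, c_rb, m_gb, c_gb):
    """Return (W_e, Bl_e, e): brightest and darkest in-cube axis colors
    and the unit orientation vector of Eq. (12)."""
    # Each channel is affine in B: channel = slope * B + offset (Eq. (11)).
    slope = np.array([1.0 / m_rb, 1.0 / m_gb, 1.0])
    offset = np.array([-c_rb / m_rb, -c_gb / m_gb, 0.0])
    # B values at which each channel reaches 0 and 1.
    b_at_0 = -offset / slope
    b_at_1 = (1.0 - offset) / slope
    b_lo = np.minimum(b_at_0, b_at_1).max()   # axis enters the cube
    b_hi = np.maximum(b_at_0, b_at_1).min()   # axis leaves the cube
    black = slope * b_lo + offset
    white = slope * b_hi + offset
    e = (white - black) / np.linalg.norm(white - black)
    return white, black, e
```

For a red-limited axis this reproduces the W_r and Bl_r branches: the white point has R = 1 and the black point has R = 0.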
3.1.3 Pen-stroke Color Estimation
Similar to the whiteboard color, the dry-erase marker color also depends on the intensity and color of the illuminant. Hence, the pen-stroke colors can also be modeled around the illumination axis. Figure 4 illustrates this assumption. As shown in the figure, the RGB values of the pen-stroke colors are distributed and concentrated around the illumination axis. Therefore, the color and lightness components of any pen-stroke color on a given whiteboard can be mathematically modeled according to the illumination axis discussed in the previous sections.
Figure 4. Pen-stroke color segmentation and their representation in RGB color space. The black line in the RGB cube plot represents the illumination axis.
The lightness value Le of a given pen-stroke point Fe = [Re, Ge, Be] in the RGB color space can be computed as the orthogonal projection of its vector Fe onto the illumination axis ei, as given in Eq. (15). The chroma of the pen-stroke color, on the other hand, is represented as the shortest distance to the illumination axis. This is calculated as the Euclidean norm of the vector rejection ce of the color vector Fe, defined by Eq. (16). The computed vector projection and rejection components can be combined later to recover the original vector, Fe = ce + Le (see Figure 5(a)).
$$L_e = (F_e \cdot e_i)\, e_i \tag{15}$$
$$c_e = F_e - L_e. \tag{16}$$
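The projection and rejection of Eqs. (15) and (16) are a one-line computation each. The sketch below assumes an (n, 3) array of colors and a unit-length axis vector; the function name `lightness_chroma` is hypothetical.

```python
import numpy as np

def lightness_chroma(F, e):
    """Split colors F (n, 3) into lightness and chroma parts w.r.t. unit axis e."""
    L = (F @ e)[:, None] * e            # Eq. (15): projection onto the axis
    c = F - L                           # Eq. (16): rejection from the axis
    chroma = np.linalg.norm(c, axis=1)  # distance to the axis
    return L, c, chroma
```

By construction `L + c` reproduces `F`, and `c` is orthogonal to `e`, so a color lying on the axis has zero chroma.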
The hue component of a pen-stroke color can also be represented in terms of the illumination axis. Unlike the lightness and chroma components the hue angle is computed by shifting and transforming the pen-stroke and illumination colors RGB values Fe=[Re,Ge,Be] and We to rg chromaticity space, so that the illumination axis starts at the origin (see Fig. 5(b)). The used transformation formulas are given in Eqs. (17) and (18).
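$$r_f = \frac{R'_e}{R'_e + G'_e + B'_e}, \qquad g_f = \frac{G'_e}{R'_e + G'_e + B'_e} \tag{17}$$
where $F'_e = [R'_e, G'_e, B'_e] = F_e - Bl_e$, and
$$r_w = \frac{R'_w}{R'_w + G'_w + B'_w}, \qquad g_w = \frac{G'_w}{R'_w + G'_w + B'_w} \tag{18}$$
where $W'_e = [R'_w, G'_w, B'_w] = W_e - Bl_e$.
The hue value He in the resulting rg chromaticity space is then given by Eq. (19), which is the angle of the chromaticity values with respect to the shifted illumination axis white value $W'_e$ as the rotation center.
$$H_e = \arctan\left(\frac{g_f - g_w}{r_f - r_w}\right). \tag{19}$$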
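The hue computation of Eqs. (17) to (19) can be sketched as below. One deliberate deviation is labeled here as an assumption: `numpy.arctan2` is used instead of a plain arctangent so that the angle covers the full circle and the case r_f = r_w does not divide by zero. The function name `hue_angle` is hypothetical, and pixels equal to the black point (zero channel sum after shifting) are not handled.

```python
import numpy as np

def hue_angle(F, white, black):
    """Hue of colors F (n, 3) around the shifted illuminant chromaticity."""
    Fp = F - black                     # shift so the axis starts at the origin
    Wp = white - black
    s = Fp.sum(axis=1)
    rf, gf = Fp[:, 0] / s, Fp[:, 1] / s        # Eq. (17)
    rw, gw = Wp[0] / Wp.sum(), Wp[1] / Wp.sum()  # Eq. (18)
    # Eq. (19), with arctan2 as a full-circle variant (assumption).
    return np.arctan2(gf - gw, rf - rw)
```

With a neutral illuminant (white = [1, 1, 1], black = [0, 0, 0]) pure red, green, and blue map to three clearly separated angles.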
Figure 5. Computation of pen-stroke color components with respect to the illumination axis. (a) Chroma and lightness computations and (b) hue angle computation in the rg chromaticity space.
3.2 Quality Enhancement
The system calibration module of the proposed method, discussed in the previous sections, models the pen-stroke color, illumination, and background color of the detected whiteboard. The resulting models can be applied to enhance the appearance of the overall whiteboard image as well as to increase the readability and visibility of the pen-stroke texts and diagrams drawn on the whiteboard. The image enhancement module of the proposed framework (Fig. 1) aims to accomplish these tasks. Given the image cells, the background model, and the masked pen-strokes obtained from the previous module, the image enhancement module lightens the whiteboard by rotating the color clusters toward the neutral gray axis. It also contains a contrast enhancement method to increase the contrast of the whiteboard image, together with a color warping method intended to enhance the visibility of pen-stroke color content. The processing flow of the module and its main components are depicted in Figure 6 and described in detail in the following sections.
Figure 6. Flow chart of the image enhancement module of the proposed method.
3.2.1 Contrast Enhancement
Contrast is one of the determining factors of image appearance and quality. To have a more appealing and visible whiteboard image, enhancing the contrast and lightening the background is necessary. The contrast enhancement is achieved by rotating the artifact-free whiteboard image contents and background color clusters toward the neutral gray axis. The rotation is based on the color cluster rotation technique [28], which enables us to create a rotation matrix based on the illumination axis ei and the neutral gray axis wi, given by $(\sqrt{3}/3)[1,1,1]$ in the RGB color space.
The Rodrigues rotation formula [29], given in Eq. (21), is used to generate the rotation matrix R3 for a rotation of angle ϕ around the unit vector K. The angle of rotation ϕ and the rotation axis K are in turn computed from the dot product and the normalized cross product of wi and ei, as shown in Eq. (20).
$$K = \frac{w_i \times e_i}{\lVert w_i \times e_i \rVert}, \qquad \cos(\phi) = w_i \cdot e_i \tag{20}$$
$$R_3(\phi, K) = I_3 - \sin(\phi)\,\mathbf{K} + (1 - \cos(\phi))\,\mathbf{K}^2 \tag{21}$$
where I3 is the 3 × 3 identity matrix and $\mathbf{K}$, the cross-product matrix of the unit vector K = [kx, ky, kz], is given by Eq. (22).
$$\mathbf{K} = \begin{bmatrix} 0 & -k_z & k_y \\ k_z & 0 & -k_x \\ -k_y & k_x & 0 \end{bmatrix}. \tag{22}$$
Finally, the resulting matrix is applied to rotate the whiteboard image colors Pc and the estimated background color clusters about the origin of the color space, after first translating the colors to the origin using the black value Ble of the illumination axis, given in Eq. (14). To further enhance the contrast and whiten the whiteboard background, each cell of the whiteboard image is scaled according to the estimated background F. The rotated pixel values Pcnew and the scaled values Pcw are therefore expressed as follows.
$$P_c^{new} = R_3(\phi, K)\,(P_c - Bl_e) \tag{23}$$
$$P_c^{w} = \min\left(1,\ \frac{P_c^{new}}{R_3(\phi, K)\,(F - Bl_e)}\right). \tag{24}$$
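The rotation and scaling steps of Eqs. (20) to (24) can be sketched as follows. The sign convention is chosen so that the illumination axis is mapped onto the neutral gray axis; the function names `rotation_to_gray` and `whiten` are hypothetical, and the division in Eq. (24) is interpreted element-wise per channel, which is an assumption.

```python
import numpy as np

def rotation_to_gray(e):
    """Rotation matrix taking the unit illumination axis e onto the gray axis."""
    w = np.ones(3) / np.sqrt(3)
    k = np.cross(w, e)
    sin_phi = np.linalg.norm(k)
    if sin_phi < 1e-12:                 # axis already neutral: nothing to rotate
        return np.eye(3)
    k = k / sin_phi                     # Eq. (20): unit rotation axis
    cos_phi = w @ e
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # Eq. (22)
    # Rodrigues formula for a rotation by -phi about k, so that R @ e == w.
    return np.eye(3) - sin_phi * K + (1.0 - cos_phi) * (K @ K)

def whiten(P, background, black, e):
    """Rotate colors P (n, 3) and scale by the rotated background color."""
    R = rotation_to_gray(e)
    P_new = (P - black) @ R.T           # Eq. (23)
    bg_new = (background - black) @ R.T
    return np.minimum(1.0, P_new / bg_new)   # Eq. (24), per channel
```

A pixel whose color equals the estimated background maps to pure white, and a pixel halfway between the black point and the background maps to mid-gray.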
3.2.2 Pen-stroke Color Identification
The described conversion of the whiteboard background to a lighter color and the contrast stretching methods lead to more enhanced image results. However, the pen-stroke colors sometimes appear to be washed out and less saturated. We find that further enhancement and correction of the pen-stroke color lightness and saturation attributes are essential for getting more visually appealing and readable whiteboard contents. To be able to do such enhancements, the pen-stroke colors present in the considered whiteboard image have to be first detected and identified.
We first extract the pen-stroke pixels from the whiteboard image using the foreground mask, which is created by further refinements of the pen-stroke mask (described in the previous System Calibration section). The mask is refined based on further segmentation of the pen-stroke mask of each image cell according to Otsu’s thresholding method [30] and ignoring connected components that are smaller than one cell. We assumed that a normal pen-stroke would normally cover at least two consecutive cells.
Once the pen-stroke pixels are detected and extracted, we automatically identify the colors of the pen-stroke pixels present in the whiteboard image. The color identification is performed in the chroma and hue space discussed in the System Calibration section. Among the pen-stroke pixels, those with a chroma value greater than the chroma threshold Tch, which is the mean value of the 10% highest chroma values, are selected. The hue angles of the selected pixels are then computed using Eq. (19), and their distribution is analyzed in an unfolded angular histogram of hue. The most frequently occurring hue values appear as peaks in the hue histogram. The major hues in the extracted pen-stroke pixels are selected accordingly, and the pen-stroke pixels whose hue values are close to these hues are identified.
This type of hue histogram and chroma-only thresholding usually fails to detect black pen-stroke pixels because they have no hue and their chroma values are normally very low. Therefore, they rarely form peaks in the hue histogram. However, black pen-stroke pixels also have very low lightness values, so combining chroma and lightness thresholding leads to an accurate identification of black pen-stroke pixels. Therefore, from the initially detected pen-stroke pixels, we select as black those whose chroma and lightness values are both below the respective thresholds: the mean of the 10% lowest chroma values, and the mean lightness of the pixels with the 10% lowest chroma values.
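The thresholding and histogram analysis described above can be sketched as follows. The thresholds follow the description literally; everything else is an assumption made for illustration: the function name `identify_stroke_colors`, the number of histogram bins, the simple local-maximum peak picking, and the `min_share` support parameter are all hypothetical.

```python
import numpy as np

def identify_stroke_colors(chroma, hue, lightness, bins=36, min_share=0.15):
    """Return the dominant hue angles and a mask of black pen-stroke pixels.

    chroma, hue, lightness: 1D arrays over the extracted pen-stroke pixels.
    """
    k = max(1, len(chroma) // 10)
    order = np.argsort(chroma)
    low, high = order[:k], order[-k:]        # 10% lowest / highest chroma

    # Colored pixels: chroma above the mean of the 10% highest chroma values.
    t_ch = chroma[high].mean()
    colored = chroma > t_ch

    # Hue histogram over the full circle; peaks are circular local maxima
    # that hold at least min_share of the selected pixels (assumption).
    hist, edges = np.histogram(hue[colored], bins=bins, range=(-np.pi, np.pi))
    left, right = np.roll(hist, 1), np.roll(hist, -1)
    support = min_share * max(int(hist.sum()), 1)
    peaks = (hist >= left) & (hist > right) & (hist >= support)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Black pixels: both chroma and lightness below the low-chroma means.
    black = (chroma < chroma[low].mean()) & (lightness < lightness[low].mean())
    return centers[peaks], black
```

Because the chroma threshold is the mean of the top decile, only a small share of pixels votes in the histogram; the remaining pen-stroke pixels would then be assigned to the nearest detected hue.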
3.2.3 Color Warping
As discussed in the Introduction, uncontrolled video acquisition with low dynamic range devices usually leads to color-clipped and desaturated whiteboard pen-stroke content. According to related perceptual studies, less saturated contents are also found to be less appealing than well-saturated ones [31]. The final stage of the proposed method is therefore intended to enhance the saturation and lightness values of the identified pen-stroke pixels as well as the background pixels of the rectified whiteboard image. To obtain a more perceptually appealing result, the RGB values of the pixels corresponding to these lightness and saturation values are transformed to new and more appropriate values in the sRGB color space. We apply a method known as color warping [32] to accomplish these RGB value transformations.
Color warping is an image processing technique which transforms a set of source color values to another set of destination color values such that:
Colors close to a given source color are mapped close to the corresponding destination color.
Colors that have the same distance to two source colors are influenced equally by the two source/destination pairs.
Colors are influenced more by closer source colors.
In our case, the pen-stroke colors together with the background pixel colors computed in the previous sections are considered as source colors (PcS). The destination colors, on the other hand, are created by projecting the source colors into those of more appropriate saturation and lightness values, using Eq. (25).
$$P_{cD} = \beta_1 L_e(i)\, e_i + \beta_2 C_e(i)\, c_i \tag{25}$$
where, Le, Ce, ei and ci are the lightness, chroma, the normalized projected vector of the neutral gray axis and normalized rejected vector, respectively. Additionally, the β1 and β2 values are set to be 0.75 and 3.0 for colored pen-stroke pixels and 0.75 and 0.0 for black pen-stroke colors, respectively. The values are chosen in such a way that the colored pixels are paired to more saturated and contrasted colors, whereas the black pixels are paired with stronger blacks. The background pixels, on the other hand, are mapped to destination neutral white ([1,1,1]).
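A destination color per Eq. (25) can be sketched as below. Since the chroma magnitude times the normalized rejection vector is simply the rejection vector itself, the sketch scales the rejection directly, which also avoids dividing by a zero chroma. The function name `destination_colors` is hypothetical, and no clipping to [0, 1] is applied because the article does not state one.

```python
import numpy as np

def destination_colors(F, e, beta1=0.75, beta2=3.0):
    """Eq. (25): rescale lightness by beta1 and chroma by beta2 around axis e."""
    L = (F @ e)[:, None] * e      # lightness component along the axis
    c = F - L                     # chroma component (C_e times unit rejection)
    return beta1 * L + beta2 * c
```

With the stated values, a colored pen-stroke pixel keeps its hue direction while its chroma is tripled, and setting `beta2=0.0` (the black pen-stroke case) collapses the color onto the axis at 75% of its lightness.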
Finally, the input pixel colors Pcw are warped toward the computed destination colors, giving Pcwarped according to Eq. (26), where i ranges from 1 to the number of pixels in the source and destination sets (N). The two weighting values w1 and w2 represent, respectively, the normalized inverse distance between the input color and the ith source color, and an exponential function of the distance which corresponds to a normal distribution with a standard deviation α. They are computed as in Eqs. (27) and (28), where di is the Euclidean distance between the input pixel color and the ith source color and δ(i, imin) is the Kronecker delta function. The Kronecker delta is a function of two variables that returns 1 if the variables are equal, and 0 otherwise. In our method, imin is the index of the source color closest to the input color.
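$$P_c^{warped} = P_c^{w} + \sum_{i=1}^{N} w_1(i)\, w_2(i)\, \left(P_{cD}^{\,i} - P_{cS}^{\,i}\right) \tag{26}$$
$$w_1(i) = \begin{cases} \dfrac{1/d_i}{\sum_{i=1}^{N} 1/d_i}, & \text{if } \min(d_i) > 0 \\[2ex] \delta(i, i_{min}), & \text{if } \min(d_i) = 0 \end{cases} \tag{27}$$
$$w_2(i) = e^{-\frac{d_i^2}{2\alpha^2}}. \tag{28}$$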
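Eqs. (26) to (28) can be sketched in vectorized form as follows. The function name `warp_colors` and the default α = 0.2 are assumptions for illustration; the article does not give a value for α in this section.

```python
import numpy as np

def warp_colors(P, src, dst, alpha=0.2):
    """Warp colors P (n, 3) using N source/destination pairs (N, 3) each."""
    # d[j, i]: Euclidean distance from input color j to source color i.
    d = np.linalg.norm(P[:, None, :] - src[None, :, :], axis=2)

    # Eq. (27), first case: normalized inverse distances.
    with np.errstate(divide="ignore", invalid="ignore"):
        inv = 1.0 / d
        w1 = inv / inv.sum(axis=1, keepdims=True)
    # Eq. (27), second case: the input coincides with a source color,
    # so all weight goes to that source (Kronecker delta).
    rows = np.flatnonzero(d.min(axis=1) == 0)
    w1[rows] = 0.0
    w1[rows, d[rows].argmin(axis=1)] = 1.0

    w2 = np.exp(-d**2 / (2.0 * alpha**2))        # Eq. (28)
    return P + (w1 * w2) @ (dst - src)           # Eq. (26)
```

An input that equals a source color lands exactly on its destination color, while colors far from every source color are left almost unchanged because w2 decays quickly with distance.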
4. Results and Discussion
As discussed in the Introduction, a number of quality degradations occur during the reproduction of whiteboard contents. Over the years, many quality enhancement methods have been proposed to address the problem. As explained in our literature review, we find the methods proposed by Gormish et al. [13] and Zhang et al. [25, 33] to be the most closely related to the proposed method.
Therefore, to show the significance of the improvements of our approach compared to the aforementioned methods, we have conducted a series of evaluations. First we have done a visual comparison of the enhanced results of the different methods for some example whiteboard images. Later, we have performed an extensive subjective evaluation of the methods by means of a psycho-visual experiment. In both evaluations, the image appearance of the enhanced images was considered. The image appearance comparison is done by evaluating the visual appeal, overall contrast, and color reproduction accuracy.
The implementations of the methods proposed by Gormish et al. [13] and Zhang et al. [25, 33] were done as per the descriptions provided by their respective papers. We used the default parameters given in their papers to process the experimental images. The process and the results of our visual and subjective evaluations are discussed in the following section.
4.1 Visual Evaluations
As a preliminary evaluation, we go through the results of the evaluated methods for some example images and visually compare their quality.
Let us take an image containing some written text. In most cases, all the methods generate discernible contents, as shown in Figure 7. However, the method of Gormish et al. creates discontinuities in the pen-stroke lines when their Canny edge detector is not able to properly segment the pen-strokes. Furthermore, in some cases it introduces a color change for saturated pen-stroke content (see Fig. 7(c)). Another point to consider is that green marker pen-strokes are normally not as saturated as the other colors; thus, in these cases, the method of Zhang et al. leads to loss of details and reduced legibility (see Fig. 7(d)).
Figure 7.
Example enhancement results. Our proposed method results in a better contrast while preserving more details.
Now let us consider an image with both writing and drawing contents. As can be seen in Figure 8, our proposed method results in a sharper, higher-contrast, and more saturated image than the other methods. Our result has a bright and more realistic background, and the contrast between the background and the pen-stroke colors is much higher. The pen-stroke colors are reproduced more accurately and with more saturation, as can be seen in Fig. 8(b). The method by Gormish et al. also produces an image with good contrast and saturated pen-stroke colors. However, it tends to introduce color shifts and some visually disturbing artifacts. For example, in Fig. 8(c), the method enhances the borders and certain areas of filled image regions, creating visually discomforting holes. This is mainly due to the limitations of the Canny edge detector included in their algorithm. There are also visible random color variations between the pen-stroke pixels, and white halos appear around the letters; both are undesirable effects of their enhancement curve and segmentation methods. The method by Zhang et al., on the other hand, gives lower contrast between the pen-stroke color and the background. The pen-stroke colors are more desaturated, and the colors are completely changed in some parts (see the colors of the rectangular boxes in Fig. 8(d)).
Figure 8.
Example enhancement results. The result of the proposed method looks more real and visually appealing.
4.2
Subjective Evaluation: Psycho-visual Experiment
According to our initial impressions from the examples presented in the previous section, the proposed method seems to show a significant improvement over the evaluated state of the art methods. To validate whether the results of our method are perceptually different from those of the previous methods, we performed two psycho-visual experiments under two different circumstances:
Letters experiment: the whiteboards contain only thin pen-stroke (writing). (See Figure 9(a)–9(d).)
Figures experiment: the whiteboards contain both thin pen-stroke and painted figures. (See Fig. 9(e)–9(h).)
The methodology used for both experiments is called pair comparison, in which observers judge quality by comparing image pairs and are asked which image in the pair is the best according to a given criterion [34]. In our experiments, the observers' judgments were based on their overall preferences regarding the color appearance as well as the readability of the contents. The method was chosen for its simplicity. Because we needed only a small number of stimuli and compared only a few enhancement methods, observers were able to complete both experiments in a short period of time without stress or fatigue. In the experiments, the observer is forced to make a decision and cannot judge the two images as equal (forced choice). Participants were shown whiteboard images of the same scene processed in three different ways (using our proposed method, Gormish et al. [13], and Zhang et al. [25, 33]), as well as the input image itself. During both experiments, the observers were asked to choose the image of each pair that they thought was more visually appealing.
Figure 9.
The input experimental stimuli used for the subjective evaluation of the proposed method. Stimuli (a)–(d) and (e)–(h) are half of the original input stimuli used for the letters and figures experiments, respectively. All the input stimuli are generated by cropping the whiteboard section of the captured images.
4.2.1
Experimental Procedure
For our experiments, we created two whiteboard image databases: one contains whiteboard images captured in rooms without windows and with controlled lighting (the room lighting depends on a single illuminant), and the other contains images captured in different classrooms, study rooms, and offices with windows at Gjøvik University College (the lighting conditions are affected both by the lamps installed in the room and by daylight). We captured our images using a USB3 Vision Flea3 camera, from the company Point Grey, at a spatial resolution of 4096 × 2160 pixels. The Flea3 is a high resolution camera with a maximum frame rate of 21 fps, and it uses a CMOS Sony IMX121 sensor with a pixel size of 1.55 micrometers.
From all the images of the two databases, we selected eight sets of test images which contain only handwritten text for the first experiment and another eight sets of test images containing both handwritten text and drawings made with different colored markers for the second experiment. Some of the original input experimental stimuli are presented in Fig. 9.
A total of 18 non-expert observers volunteered for each experiment. Both experiments were conducted in the same experimental room under the same lighting condition. All observers took the test on the same desktop computer, a Dell Inspiron 546 with a 20-inch, 16:9 widescreen, high definition Dell monitor with an sRGB color gamut. To run the pair comparison methodology, we used QuickEval, the web-based psychometric evaluation tool provided by the Norwegian Color and Visual Computing Laboratory.
The selections of each observer are stored in an n × n frequency matrix (where n stands for the number of evaluated methods), in which the value 1 is stored in row i and column j when the reproduction of method i is selected over the reproduction of method j. Finally, the percentage frequency matrix is computed by averaging the frequency matrices of all participating observers and is stored for further analysis. If there is a reference condition, such as a non-distorted image, it should be put in the matrix as the first condition, in the first row and column.
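The bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the QuickEval implementation: the function names `frequency_matrix` and `percentage_matrix`, the condition ordering, and the example choices are hypothetical, and counts are accumulated with `+=` on the assumption that an observer may see the same pair of methods on more than one scene.

```python
def frequency_matrix(choices, n):
    """Build an n x n matrix for one observer.

    `choices` is an iterable of (winner, loser) index pairs; entry [i][j]
    counts how often method i was selected over method j.
    """
    matrix = [[0] * n for _ in range(n)]
    for winner, loser in choices:
        matrix[winner][loser] += 1
    return matrix


def percentage_matrix(matrices):
    """Average the per-observer frequency matrices element by element."""
    n = len(matrices[0])
    count = len(matrices)
    return [
        [sum(m[i][j] for m in matrices) / count for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Assumed ordering: the reference (original image) comes first.
    conditions = ["Original", "Proposed", "Gormish et al.", "Zhang et al."]
    # Hypothetical answers of two observers for one scene.
    observer_a = frequency_matrix([(1, 0), (1, 2), (1, 3), (0, 3)], len(conditions))
    observer_b = frequency_matrix([(0, 1), (1, 2), (1, 3), (3, 0)], len(conditions))
    for name, row in zip(conditions, percentage_matrix([observer_a, observer_b])):
        print(f"{name:15s}", row)
```

Placing the reference condition at index 0 mirrors the convention stated above, so the first row and column always refer to the original image.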
4.2.2
Experimental Results
The results of the subjective experiments were analyzed by estimating what portion of the population would select one method over another. To do this, the pairwise comparison data were scaled in Just-Objectionable-Difference (JOD) units (Tables I and II) under Thurstone Case V assumptions [34], where a difference of 1 JOD unit corresponds to 75% of observers selecting one algorithm over another. To scale the pairwise comparison data in JOD units, we applied the Maximum Likelihood Estimation (MLE) method of Perez-Ortiz and Mantiuk [35], which is based on the classical method of Silverstein and Farrell [36]. The error bars in Figure 10 denote 95% confidence intervals computed by bootstrapping. One thing to keep in mind is that absolute JOD values are arbitrary and only the relative differences are relevant. (That is why we assign the first condition (original images) a fixed quality value of 0 in Tables I and II, whereas in Fig. 10 we assign positive values instead.)
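To make the JOD scale more tangible, the following Python sketch converts a difference in JOD scores into the expected share of observers preferring one condition. It assumes the Thurstone Case V model with a normal cumulative distribution, scaled so that a gap of 1 JOD maps to 75%; the function name `preference_probability` is hypothetical and the sketch is not part of the scaling software of [35].

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()
# Scale factor chosen so that a 1-JOD difference gives a 75% preference.
_JOD_TO_Z = _STD_NORMAL.inv_cdf(0.75)


def preference_probability(jod_a, jod_b):
    """Expected share of observers choosing condition A over condition B."""
    return _STD_NORMAL.cdf((jod_a - jod_b) * _JOD_TO_Z)


if __name__ == "__main__":
    print(preference_probability(1.0, 0.0))     # one JOD apart
    print(preference_probability(2.0089, 0.0))  # a gap of about two JOD
```

Under this model, a gap of about 2 JOD, such as the one between the proposed method and the original images in the letters experiment, corresponds to roughly 91% of observers preferring the higher-scored condition.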
Table I.
JOD scores of the letters experiment.
Condition          Lower JOD limit   JOD score   Upper JOD limit
Original           0                 0           0
Proposed method    1.7256            2.0089      2.3578
Gormish et al.     −1.0253           −0.4818     0.0581
Zhang et al.       −2.5442           −2.0070     −1.4348
Table II.
JOD scores of the figures experiment.
Condition          Lower JOD limit   JOD score   Upper JOD limit
Original           0                 0           0
Proposed method    1.121             1.4588      1.7522
Gormish et al.     −0.3421           0.2390      0.8003
Zhang et al.       −1.8072           −1.2496     −0.7077
The values show that the results of the proposed method were preferred over those of all the other methods in both experiments. The results of the method proposed by Gormish et al. were preferred over the original images in the figures experiment but not in the letters experiment. The results of Zhang et al.’s method, on the other hand, were the least preferred in both experiments.
Figure 10.
Visualization of the scaling results and confidence intervals for the chosen dataset. Note that there is no confidence interval for the first condition, as this is always set up to a fixed value (since scores are relative).
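As a rough illustration of how bootstrapped confidence intervals of this kind are obtained, the Python sketch below applies the percentile bootstrap to a simple statistic. It is a stand-in under stated assumptions: the function name `bootstrap_ci`, the number of resamples, the fixed seed, and the use of a preference proportion (rather than re-estimated JOD scores, as in the actual analysis with the software of [35]) are hypothetical choices made for this example.

```python
import random
from statistics import mean


def bootstrap_ci(samples, statistic, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for `statistic` over `samples`.

    Observers are resampled with replacement, the statistic is recomputed
    for each resample, and the central `level` share of values is returned.
    """
    rng = random.Random(seed)
    n = len(samples)
    estimates = sorted(
        statistic([samples[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower_index = int((1.0 - level) / 2.0 * n_boot)
    upper_index = int((1.0 + level) / 2.0 * n_boot) - 1
    return estimates[lower_index], estimates[upper_index]


if __name__ == "__main__":
    # Hypothetical data: 14 of 18 observers preferred method A over method B.
    votes = [1] * 14 + [0] * 4
    low, high = bootstrap_ci(votes, mean)
    print(f"preference share {mean(votes):.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

The reference condition has no interval in the figure because its score is pinned to a constant in every resample.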
We have to be careful not to use the calculated confidence intervals to infer the statistical significance of a quality difference. Since all conditions are “linked” to each other by pairwise comparisons, changing the value of one condition will “push” the values of all directly or indirectly linked conditions. This correlation between conditions can be captured in a covariance matrix Σ. If we want to reject the null hypothesis H0 that the difference in JOD scores between two conditions is 0, we need to compute the variance of that difference [35]. As Figure 11 shows, the difference was statistically significant for every pair of conditions except the Original-Gormish pair.
Figure 11.
Graphical representation of the scaling. Red points represent conditions, and they are only connected to their neighbors, as these are usually the comparisons in which we are most interested. Blue solid lines represent statistically significant differences, as opposed to red dashed lines.
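The role of the covariance term in such a test can be sketched as follows. This minimal Python example assumes a normal approximation and a two-sided z-test on the score difference; the function name `difference_test` and all numerical inputs are hypothetical and are not taken from our data or from the software of [35].

```python
from math import sqrt
from statistics import NormalDist


def difference_test(q_a, q_b, var_a, var_b, cov_ab, alpha=0.05):
    """Two-sided z-test of H0: q_a - q_b = 0 for correlated estimates.

    The variance of the difference is var_a + var_b - 2 * cov_ab, so a
    positive covariance between the two scores makes the test more sensitive.
    """
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    z = (q_a - q_b) / sqrt(var_diff)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


if __name__ == "__main__":
    # Same scores and variances, with and without the covariance term.
    print(difference_test(1.0, 0.5, 0.1, 0.1, 0.08))
    print(difference_test(1.0, 0.5, 0.1, 0.1, 0.0))
```

With these made-up numbers, the difference is significant when the covariance is taken into account and not significant when it is ignored, which is why the individual confidence intervals alone cannot be used to judge significance.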
We also tested the methods’ performance under different lighting conditions. For that purpose, the datasets were divided into the images captured in rooms with no windows (controlled illumination) and the images captured in rooms with windows (where the illumination is affected by daylight). As we can see in Figures 12 and 13, the results of our method and of Zhang et al.’s method remained consistent with the previous experiment under both types of illumination. On the other hand, the results of the method proposed by Gormish et al. seem to be affected by the change in illumination, becoming even less preferred in rooms with uncontrolled illumination.
Figure 12.
Visualization of the scaling results and confidence intervals for the chosen dataset.
Figure 13.
Visualization of the scaling results and confidence intervals for the chosen dataset.
In summary, the method of Gormish et al. showed great variability in its results. Since the method usually generates results with higher contrast between the pen-strokes and the background of the whiteboard, there were some instances where the reproductions of Gormish et al. led to preferred results over the other methods. However, its reliance on segmentation for the enhancement process resulted in images with diminished legibility. Their S-shaped enhancement curve also resulted in images with over-saturated (and sometimes altered) colors that looked unnatural and unpleasant to the observers. Generally, from the experimental results, we were able to observe that the performance of their algorithm depends on the type of contents and the scene contrast. The method generates better results for lines than for filled shapes. It also usually performs better when the lines on the whiteboard have strong contrast relative to the background.
In contrast, the variability of observer responses for the results of Zhang et al.’s method was found to be low. This means that the observers consistently preferred the results of the other methods over the results of Zhang et al.’s method. Their method has a tendency to produce images with low contrast and contents that are almost indiscernible. This is usually the result of their enhancement method, which increases the lightness and the uniformity of the background while reducing the saturation of the pen-stroke content.
5.
Conclusion
The problem of whiteboard content appearance and quality degradation, which mainly occurs during the acquisition process of videoconferencing systems, is considered. A review of several approaches applied to overcome such problems and a discussion of their respective advantages and limitations are presented. Furthermore, we have proposed a novel post-processing approach designed to enhance the perceptual appearance of a whiteboard image as well as its contents. The proposed method first extracts and rectifies the whiteboard image from a given video frame and models the perceptual appearance of the scene illumination, the whiteboard background, and the pen-stroke colors of the whiteboard contents. The resulting models are then used to enhance the contrast and color saturation of the whiteboard image. The method enhances the contrast and lightens the whiteboard background by rotating the corresponding whiteboard image color clusters toward the neutral gray axis. The visibility of the pen-stroke color content is further enhanced through a method called color warping. Finally, the significance of the proposed method’s appearance and legibility enhancements relative to the input image, as well as to the results of other state of the art methods, was visually and psycho-visually evaluated and confirmed.
6.
Future Work
As we have shown in the results section, the proposed method is able to significantly improve on the performance of other state of the art whiteboard content enhancement methods. The method leads to more perceptually appealing and readable whiteboard image results. However, the method is only intended to handle one illumination condition, and the whiteboard detection process is performed manually.
Therefore, concerning the post-processing and enhancement of whiteboard images, there is more to be done in the near future. For example, we think that the color saturation and contrast of the pen-stroke colors can be increased even further, while keeping the color fidelity of the system, by using different hue-preserving color image enhancement techniques [37, 38]. Additionally, the use of a more perceptually accurate, uniform, and device-independent color space should be considered for most of the enhancement operations. Such a color space usually leads to more generalized solutions and more perceptually accurate results than the device-dependent sRGB color space.
The whiteboard detection method can also be fully automated, and the entire method can be extended to scenes with multiple illuminations by using methods that estimate multiple illuminants and their spatial distribution [39, 40]. Additional features, such as the correction of color-clipped and over-exposed whiteboard regions caused by specular reflections, can also be added for even more enhanced and appealing results.
References
1. E. Reinhard, G. Ward, S. Pattanaik, and P. Debevec, High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting, The Morgan Kaufmann Series in Computer Graphics (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005).
2. F. Banterle, A. Artusi, K. Debattista, and A. Chalmers, Advanced High Dynamic Range Imaging: Theory and Practice, 1st ed. (A. K. Peters, Ltd., Natick, MA, USA, 2011).
3. M. A. Abebe, A. Booth, J. Kervec, T. Pouli, and M.-C. Larabi, “Towards an automatic correction of over-exposure in photographs: Application to tone-mapping,” Computer Vision and Image Understanding 168, 3–20 (2018), Special Issue on Vision and Computational Photography and Graphics. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1077314217300954
4. M. Fairchild, Color Appearance Models, The Wiley-IS&T Series in Imaging Science and Technology (Wiley, Chichester, 2005). [Online]. Available: https://books.google.no/books?id=8_TxzK2B-5MC
5. G. Sharma, Digital Color Imaging Handbook (CRC Press, Inc., Boca Raton, FL, USA, 2002).
6. E. A. Fedorovskaya, H. de Ridder, and F. J. J. Blommaert, “Chroma variations and perceived quality of color images of natural scenes,” Color Res. Appl. 22, 96–110 (1998).
7. H. de Ridder, E. A. Fedorovskaya, and F. J. Blommaert, “Naturalness and image quality: chroma and hue variation in color images of natural scenes,” Proc. SPIE 2411, 2411-1–2411-11 (1995).
8. H. Lu and M. Kowalkiewicz, “Text segmentation in unconstrained hand-drawings in whiteboard photos,” Int’l. Conf. on Digital Image Computing Techniques and Applications (DICTA), Fremantle, WA, Australia (IEEE, Piscataway, NJ, 2012), pp. 1–6.
9. T. Plotz, C. Thurau, and G. A. Fink, “Camera-based whiteboard reading: New approaches to a challenging task,” Int’l. Conf. on Frontiers in Handwriting Recognition, Montreal, Canada (2008), pp. 385–390.
10. M. Wienecke, G. A. Fink, and G. Sagerer, “Towards automatic video-based whiteboard reading,” Proc. Seventh Int’l. Conf. on Document Analysis and Recognition, Edinburgh, UK (IEEE, Piscataway, NJ, 2003), Vol. 1, pp. 87–91.
11. S. Vajda, L. Rothacker, and G. A. Fink, A Method for Camera-Based Interactive Whiteboard Reading (Springer, Berlin, Heidelberg, 2012), pp. 112–125.
12. Z. Zhang and L.-W. He, “Whiteboard scanning and image enhancement,” Digit. Signal Process. 17, 414–432 (2007). doi:10.1016/j.dsp.2006.05.006
13. M. Gormish, B. Erol, D. G. Van Olst, T. Li, and A. Mariotti, “Whiteboard sharing: capture, process, and print or email,” Proc. SPIE 7879, 78790D (2011).
14. P. E. Dickson, W. R. Adrion, and A. R. Hanson, “Whiteboard content extraction and analysis for the classroom environment,” 2008 Tenth IEEE Int’l. Symposium on Multimedia (IEEE, Piscataway, NJ, 2008), pp. 702–707.
15. P. E. Dickson, C. Kondrat, W. R. Adrion, T. D. Richards, and R. B. Szeto, “Improved whiteboard processing for lecture capture,” IEEE Int’l. Symposium on Multimedia (ISM) (IEEE, Piscataway, NJ, 2016), pp. 649–654.
16. L. W. He, Z. Liu, and Z. Zhang, “Why Take Notes? Use the Whiteboard Capture System,” Technical Report MSR-TR-2002-89, Microsoft Research, September 2002, [Online; accessed 12-April-2017].
17. L. W. He and Z. Zhang, “Real-time whiteboard capture and processing using a video camera for remote collaboration,” IEEE Trans. Multimedia 9, 198–206 (2007). doi:10.1109/TMM.2006.886385
18. E. Peli, L. Arend, and A. T. Labianca, “Contrast perception across changes in luminance and spatial frequency,” J. Opt. Soc. Am. A 13, 1953–1959 (1996). doi:10.1364/JOSAA.13.001953 [Online]. Available: http://josaa.osa.org/abstract.cfm?URI=josaa-13-10-1953
19. F. S. Frome, S. L. Buck, and R. M. Boynton, “Visibility of borders: separate and combined effects of color differences, luminance contrast, and luminance level,” J. Opt. Soc. Am. 71, 145–150 (1981). doi:10.1364/JOSA.71.000145 [Online]. Available: http://www.osapublishing.org/abstract.cfm?URI=josa-71-2-145
21. A. Criminisi, I. D. Reid, and A. Zisserman, “A plane measuring device,” Image Vis. Comput. 17, 625–634 (1999). doi:10.1016/S0262-8856(98)00183-8
22. P. Getreuer, “Linear methods for image interpolation,” Image Processing On Line (2011).
23. D. Tomazevic, B. Likar, and F. Pernus, “A comparison of retrospective shading correction techniques,” Proc. 15th Int’l. Conf. on Pattern Recognition. ICPR-2000, Barcelona, Spain (IEEE, Piscataway, NJ, 2000), Vol. 3, pp. 564–567.
24. J. B. Park, “Efficient color representation for image segmentation under nonwhite illumination,” Proc. SPIE 5267, 163–174 (2003).
25. Z. Zhang and L.-w. He, “Whiteboard scanning and image enhancement,” Digital Signal Processing 17, 414–432 (2007). doi:10.1016/j.dsp.2006.05.006
26. P. J. Rousseeuw, “Least median of squares regression,” Journal of the American Statistical Association 79, 871–880 (1984). doi:10.1080/01621459.1984.10477105 [Online]. Available: http://www.tandfonline.com/doi/abs/10.1080/01621459.1984.10477105
27. P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection, Vol. 589 (John Wiley & Sons, New York, 2005).
28. D. Paulus, L. Csink, and H. Niemann, “Color cluster rotation,” Proc. Int’l. Conf. on Image Processing (ICIP) (IEEE, Piscataway, NJ, 1998). [Online]. Available: http://www5.informatik.uni-erlangen.de/Forschung/Publikationen/1998/Paulus98-CCR.pdf
29. O. Faugeras, Three-Dimensional Computer Vision: a Geometric Viewpoint, Artificial Intelligence (MIT Press, Cambridge, MA, 1993).
30. N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst. Man Cybern. 9, 62–66 (1979). doi:10.1109/TSMC.1979.4310076 [Online]. Available: http://dx.doi.org/10.1109/TSMC.1979.4310076
31. M. A. Abebe, T. Pouli, M.-C. Larabi, and E. Reinhard, “Perceptual lightness modeling for high-dynamic-range imaging,” ACM Trans. Applied Perception (TAP) 15, 1 (2017). doi:10.1145/3086577
32. J. Y. Hardeberg, I. Farup, Ø. Kolås, and G. Stjernvang, “Color management for digital video: Color correction in the editing phase,” Advances in Graphic Arts and Media Technology, Proc. 29th IARIGAI Research Conf. (2002), pp. 166–179. [Online]. Available: http://english.hig.no/content/download/28949/331048/file/Hardeberg2002c.pdf
33. Z. Zhang and L.-w. He, “Note-taking with a camera: whiteboard scanning and image enhancement,” 2004 IEEE Int’l. Conf. on Acoustics, Speech, and Signal Processing (IEEE, Piscataway, NJ, 2004), Vol. 3, pp. iii–533–6.
34. L. L. Thurstone, “A law of comparative judgement,” Psychological Review 34, 273–286 (1927). doi:10.1037/h0070288
35. M. Perez-Ortiz and R. K. Mantiuk, “A practical guide and software for analysing pairwise comparison experiments,” arXiv:1712.03686 [cs, stat], Dec. 2017. [Online]. Available: http://arxiv.org/abs/1712.03686
36. D. A. Silverstein and J. E. Farrell, “Efficient method for paired comparison,” J. Electron. Imaging 10, 394–399 (2001). doi:10.1117/1.1344187 [Online]. Available: https://www.spiedigitallibrary.org/journals/Journal-of-Electronic-Imaging/volume-10/issue-2/0000/Efficient-method-for-paired-comparison/10.1117/1.1344187.short
37. S. K. Naik and C. A. Murthy, “Hue-preserving color image enhancement without gamut problem,” IEEE Trans. Image Process. 12, 1591–1598 (2003). doi:10.1109/TIP.2003.819231
38. A. Gorai and A. Ghosh, “Hue-preserving color image enhancement using particle swarm optimization,” IEEE Recent Advances in Intelligent Computational Systems (IEEE, Piscataway, NJ, 2011), pp. 563–568.
39. S. Beigpour, C. Riess, J. van de Weijer, and E. Angelopoulou, “Multi-illuminant estimation with conditional random fields,” IEEE Trans. Image Process. 23, 83–96 (2014). doi:10.1109/TIP.2013.2286327
40. I. M. Ziko, S. Beigpour, and J. Y. Hardeberg, “Design and creation of a multi-illuminant scene image dataset,” Image and Signal Processing, Lecture Notes in Computer Science 8509, 531–538 (2014).