
1 Introduction

Fig. 1. Illustration of recognition and noise. (Color figure online)

Industrial manufacturing is expected to change considerably in the near future – a paradigm shift often called Industry 4.0 [1]. Smart factories are part of this vision: context-aware facilities that can take into account information such as object positions or machine status [2]. They provide manufacturing services that can be combined efficiently in (almost) arbitrary ways. This challenge is modeled by the RoboCup Logistics League (RCLL) [3].

While some factories will be designed according to this vision with networked machinery, many more existing facilities will be upgraded incrementally for economic reasons, requiring robots to adapt to existing machines and to work safely alongside humans [4, 5]. The light signals used in the RCLL are industry-standard parts that are often used to indicate a machine’s status, e.g. when it is about to run out of material, or whether it is currently safe for a human to perform certain operations. Being able to visually recognize these signals is important even when a network is available to communicate the same information, for example to prevent misunderstandings between humans and robots in case of a signal or network failure.

In this paper, we describe a novel method that uses a coarse (yet expressive and very efficient) color model to search for relevant regions of interest (ROI) of the light colors red, yellow, and green. These regions are then filtered by a number of spatial constraints to eliminate typical false positives like colored reflections on metal parts of the machine. A machine-specific laser-based detection of the signal tower can be used to reduce the image search space considerably, providing an order-of-magnitude speed-up while increasing reliability. Finally, the detected ROIs for the three colors are analyzed for their activation state (cf. Fig. 1) and for temporal relations to detect blinking lights.

In the following Sect. 2 we briefly describe the RCLL and the problem of light signal tower detection. In Sect. 3 we highlight some related work before describing the method in detail in Sect. 4. We provide evaluation results in Sect. 5 before we conclude in Sect. 6.

2 RoboCup Logistics League and Signal Light Towers

RoboCup [6] is an international initiative to foster research in the field of robotics and artificial intelligence. Besides robotic soccer, RoboCup also features application-oriented leagues which serve as common testbeds to compare research results. Among these, the industry-oriented RoboCup Logistics League (RCLL) tackles the problem of production logistics in a smart factory. Groups of three robots have to plan, execute, and optimize the material flow and deliver products according to dynamic orders in a simplified factory. The challenge consists of creating and adjusting a production plan and coordinating the group [3].

Fig. 2. Robot approaching a ring station. (Color figure online)

A game is split into two major phases. In the exploration phase, the robots must determine the positions of machines assigned to their team and recognize and report a combination of marker and light signal state. During the production phase, the robots must transport workpieces to create final products according to dynamic order schedules which are announced to the robots only at run-time, while the machines indicate their status with light signals.

Machines in the RCLL are represented by Festo’s Modular Production System (MPS) stations, each equipped with a red/yellow/green signal light tower. For example, in Fig. 2 a robot approaches a ring station, where the signal tower is on the front left corner of the station.

The distinctive feature of this vision problem is the presence of active light sources with an extreme variation in brightness which far exceeds the sensitivity range of our consumer-grade cameras.

Fig. 3. Actual light signals vs. environment clutter. (Color figure online)

To be able to detect blinking states, we have to recognize both lit and unlit signals, but depending on ambient light, unlit signals may be captured as almost completely black while lit signals are captured as mostly white (cf. Fig. 3). Another problem is that the individual red/yellow/green segments are not optically separated internally; for example, a lit red segment will always make parts of an unlit yellow segment appear red. In combination with extensive and unpredictable background clutter (cf. Fig. 3) coming from colorful reflections on shiny machine parts, colorfully dressed spectators, and other objects, false positives become a major problem. Since the individual segments are made of a transparent material with a fluted surface, active light-emitting sensors such as the Kinect are infeasible. Stereo cameras are also difficult to use: when a light is on, its region consists mostly of a bright, nearly textureless spot, and when the exposure is tuned down, the remainder of the image is too dark to provide texture.

3 Related Work

Automatic detection of roadside traffic lights is a related field, in particular for autonomous driving. Ziegler et al. describe the challenges posed by a long real-world overland journey under urban and rural daytime conditions [7]. While it is in principle possible to work around the whole issue by broadcasting traffic signal states over radio, this would require major infrastructure investments [8].

A common practice is to build a database containing features of known intersections to assist in locating a traffic signal within a camera image [7, 8]. The required data are gathered on a dedicated mapping run of the routes. Fairfield and Urmson generate a detailed prior map that contains a global 3D pose estimate of every traffic signal [8]. Ziegler et al. create a manually labeled 2D visual feature database [7]. During autonomous driving, these hints are then used to limit the search space for the classifier that detects the red, yellow, and green lights.

Such approaches do not cover some of the typical problems outlined in Sect. 2 and do not use a second sensor modality to reduce the problem space.

Another approach in the RCLL has been to reduce camera exposure and contrast until only lit signals produce a saturated output [9]. A drawback of this approach is that it makes the camera unusable for other tasks.

Color detection has been a long-standing issue in RoboCup. In other leagues such as the Standard Platform League, lookup tables were sufficient as long as constant lighting was provided [10]. These methods generally cannot capture the dynamic range introduced by active light sources. Edge and color segmentation have been used to detect vertically stacked color-coded landmarks [11]. While somewhat similar in shape, those landmarks did not change during the game and had no temporal dependencies.

4 Multi-modal Light Signal Detection

Image processing is performed as a sequence of operations forming a processing pipeline, depicted in Fig. 4. A classifier takes an input image and determines regions of interest (ROI) by scanning a grid of pixels and matching them against similarity-based color models. An assembly stage combines ROIs of different colors according to spatial constraints. Additionally, based on the detection of the flat side panel of the MPS (cf. Fig. 2) by means of a 2D laser scanner, the ROIs can be further constrained by an estimate of the expected position within the image. This combination of different sensors makes this a multi-modal approach which significantly reduces the search space and the chance of false positives. Distance-based tracking associates detections across consecutive frames as long as movements are small. A brightness classifier detects lit/unlit signal segments in the determined ROIs, and temporal aggregation is performed to detect blinking signals.

Fig. 4. A model of the processing pipeline. (Color figure online)

In the following, we detail the major components of the pipeline, which has been implemented using the computer vision framework in Fawkes [12].

4.1 Color Model

Fig. 5. Sector of the UV plane recognized by the color model. (Color figure online)

The color model is responsible for deciding whether an input color matches a certain reference color. The color model used here has been ported from the VLC video player. It works directly with the YUV colorspace that is produced natively by most webcams, thus eliminating colorspace conversion. In the YUV colorspace, the luminance (roughly corresponding to brightness) is encoded entirely in the Y dimension, while the color value (chrominance) is a 2D vector in the UV plane. The saturation of a color then corresponds to the length of its UV vector.

Normalizing the two color vectors by their saturation and computing the length of the difference vector then gives a reasonable similarity measure: \( \delta _{UV} = |~|\mathbf {r}|\cdot \mathbf {c} - |\mathbf {c}|\cdot \mathbf {r}~| \), where \(\mathbf {r} = (u_{r},v_{r})^T\) is the reference color, \(\mathbf {c} = (u_{c}, v_{c})^T\) is the input color, and \(\delta _{UV}\) is the scalar color difference. Specifying a threshold on \(\delta _{UV}\) then allows us to decide whether a pixel from the camera image matches a given color within a certain tolerance. Along with a threshold on the saturation \(|\mathbf {c}|\) and on the luminance difference \(\delta _Y\), such a color model describes a subset of the UV space (similar to Fig. 5) that extends through a portion of the Y dimension. Multiple such color models can be combined into a multi-color model that contains all shades we expect to see, e.g., in the red light of a signal tower.
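For illustration, the following is a minimal C++ sketch of such a similarity test. The function name and all threshold values are assumptions chosen for the example, not the actual Fawkes configuration.

```cpp
#include <cmath>

struct YUV { float y, u, v; };  // a pixel in the YUV colorspace

// Sketch of the UV-plane similarity test described above. Returns true if the
// input color c matches the reference color r within the given tolerances.
// All default thresholds are illustrative assumptions.
bool matches_reference(const YUV &c, const YUV &r,
                       float delta_uv_max = 40.f,  // threshold on delta_UV
                       float sat_min      = 16.f,  // minimum saturation |c|
                       float delta_y_max  = 96.f)  // threshold on delta_Y
{
  const float sat_c = std::sqrt(c.u * c.u + c.v * c.v);  // saturation |c|
  const float sat_r = std::sqrt(r.u * r.u + r.v * r.v);  // saturation |r|

  if (sat_c < sat_min)
    return false;  // too close to gray to carry reliable color information

  // delta_UV = | |r| * c - |c| * r | in the UV plane
  const float du = sat_r * c.u - sat_c * r.u;
  const float dv = sat_r * c.v - sat_c * r.v;
  const float delta_uv = std::sqrt(du * du + dv * dv);

  return delta_uv <= delta_uv_max && std::fabs(c.y - r.y) <= delta_y_max;
}
```

A multi-color model is then simply the disjunction of such tests over all reference shades of, e.g., red.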

4.2 Classifier

A classifier takes an input image and outputs regions of interest. The color classifier used in this work takes a scanline grid and a color model that assigns a principal color class to a pixel color. The classifier then analyzes each crossing of the grid. If the pixel is found to belong to a known color class, it considers the direct \(5 \times 5\) neighborhood. Only if a sufficient number of neighboring pixels are assigned to the same color class is the pixel considered a positive match. Areas with a sufficient number of similarly colored points result in a ROI. A post-processing step merges overlapping or adjacent ROIs of the same color.
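A compact C++ sketch of this neighborhood vote follows; the classify helper, the color class names, and the vote threshold are assumptions made for the example.

```cpp
enum ColorClass { C_OTHER, C_RED, C_YELLOW, C_GREEN, C_BLACK };

// Sketch of the 5x5 neighborhood check performed at a scanline grid crossing
// (x, y). classify(x, y) is a hypothetical helper that returns the principal
// color class of a pixel; min_votes is an illustrative threshold.
bool positive_match(int x, int y, int width, int height,
                    ColorClass (*classify)(int, int), int min_votes = 13)
{
  const ColorClass c = classify(x, y);
  if (c == C_OTHER)
    return false;

  int votes = 0;
  for (int dy = -2; dy <= 2; ++dy) {    // direct 5x5 neighborhood
    for (int dx = -2; dx <= 2; ++dx) {
      const int nx = x + dx, ny = y + dy;
      if (nx < 0 || ny < 0 || nx >= width || ny >= height)
        continue;                       // skip pixels outside the image
      if (classify(nx, ny) == c)
        ++votes;
    }
  }
  return votes >= min_votes;            // enough neighbors agree on the color
}
```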

Algorithm 1. Signal assembly.

4.3 Signal Assembly

In the signal assembly, we compose a signal from the ROIs denoting enabled or disabled green (\(G_1\) and \(G_0\)) and red (\(R_1\) and \(R_0\)) signal lights which have been determined by the classifier described above. Algorithm 1 depicts the overall approach: first, the algorithm tries to find ROIs that fit into a laser-based ROI (ll. 1–4). If this succeeds, only the best matching ROI combination is kept (ll. 5–6), otherwise a full search on the image is performed (ll. 7–11). If no previous detections exist, the algorithm returns the detected signals (l. 12). For the remaining candidates, distance-based tracking is performed (ll. 13–20). States of previous detections are updated if a new detection is spatially close (ll. 14–17); otherwise the new detection is added (l. 18).

Red/Green Matching. A crucial part is the matching of red and green ROIs that are spatially related such that they can represent a light signal. The input ROIs can come from the full image, or be constrained to a laser-based ROI (see next section). We limit the search for the signal to red and green ROIs since the yellow light may appear to change color if the lights above or below are lit. Depending on the environment (which might contain arbitrary colorful objects that match the reference colors), the color classifier can return any number of rectangular ROIs, some of which may be part of the signal we are looking for. Algorithm 2 shows the procedure. First, Geom_OK checks the width and vertical position of the green ROI, and the horizontal alignment of both ROIs:

Algorithm 2. Red/green matching, with the Geom_OK constraints.

Any (r, g) pair that does not satisfy this constraint cannot possibly be part of one signal tower, so it is skipped (ll. 2 and 18). A pair that passes is then checked for a special case that can occur due to the extreme brightness of the red and green lights (ll. 3 and 4). The webcams used have an acrylic lens cover that easily gathers a slight haze from dust and wiped-off fingerprints, often causing lit signals to create a colored bloom around the actual light source. The result is a ROI that does contain the signal light, but which is overly large. Whether a ROI \(\rho _1\) is affected by bloom is determined in relation to another ROI \(\rho _2\):

Bloom detection criterion.

If bloom is detected, the geometry of the ROI that is likely unaffected (or less affected) by bloom is used to correct the other. After this, another constraint tests whether the vertical space between r and g is sufficient to fit a similarly-sized yellow ROI in between (Vspace_OK). If this constraint is violated, the (r, g) pair is skipped. Otherwise, the two ROIs are aligned well enough horizontally and a similarly-sized gap for a yellow ROI exists in between. If they are still too dissimilar in width (l. 6), the width of both is set to the mean width while preserving the center position (ll. 7–9). If a pair of red and green ROIs has passed this process, we assume both must be part of the same signal tower, and generate a yellow ROI y that fits in between (ll. 11–14).
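To make the geometric reasoning concrete, the following C++ sketch shows plausible versions of the alignment and vertical-space checks and of the width equalization, assuming the red segment appears above the green one in the image; the actual constraints and tolerances of Geom_OK and Vspace_OK are not reproduced here, so all numeric values are assumptions.

```cpp
#include <cstdlib>

struct Roi { int x, y, w, h; };  // top-left corner, width, height in pixels

// Horizontal alignment of a red/green candidate pair (tolerance is assumed).
bool horizontally_aligned(const Roi &r, const Roi &g, int tol = 8) {
  return std::abs((r.x + r.w / 2) - (g.x + g.w / 2)) <= tol;
}

// Vspace_OK-style check: the gap between red (top) and green (bottom) must
// roughly fit a similarly sized yellow segment (slack is assumed).
bool vspace_ok(const Roi &r, const Roi &g, double slack = 0.5) {
  const int gap = g.y - (r.y + r.h);
  const int ref = (r.h + g.h) / 2;
  return gap >= ref * (1.0 - slack) && gap <= ref * (1.0 + slack);
}

// Set both widths to the mean width while preserving the center positions.
void equalize_widths(Roi &r, Roi &g) {
  const int mean_w = (r.w + g.w) / 2;
  r.x += (r.w - mean_w) / 2;  r.w = mean_w;
  g.x += (g.w - mean_w) / 2;  g.w = mean_w;
}

// Generate the yellow ROI that fills the gap between the matched pair.
Roi make_yellow(const Roi &r, const Roi &g) {
  return Roi{r.x, r.y + r.h, r.w, g.y - (r.y + r.h)};
}
```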

Laser-Assisted ROI Pre-processing

Algorithm 3. Laser-assisted ROI pre-processing.

If the position of the MPS table could be detected with the 2D laser scanner, a bounding box can be estimated in which the colored ROIs are to be expected (cf. pink box in Fig. 6). We call this rectangular region the laser ROI, or l for short. Within l, we can expect to find (almost) no clutter, which allows us to make additional assumptions, as described in Algorithm 3. For example, we can now handle overexposure (Fig. 6) by simply merging the broken-up red or green ROIs into one (ll. 2 and 3). If the red or green light is switched off, large parts of it may appear in a very dark shade that does not have enough saturation to discriminate it from other, unwanted objects. In this case, the merged ROI may still not cover the full area of the signal light, but we also do not suffer from bloom. Since we do not expect black clutter (T-shirts, black machine parts, etc.), we can look for the black socket (l. 12) or the black cap on top (l. 4). If the “black” classifier is successful, \(r_m\) or \(g_m\) may be improved using the respective black ROIs (ll. 5–10 and 13–16). In the case of green, we only extend \(g_m\) (i.e. \(\delta _y\) must be positive), since an unlit green signal part often turns out so dark as to appear black.

Fig. 6. Laser ROI (pink rectangle), overexposed lights. (Color figure online)

After this pre-processing, the red/green matching algorithm is tried once with \(r_m\) and \(g_m\) (l. 18). If this succeeds, we have obtained a tuple \((r_m, y, g_m)\) that covers the full signal tower and can be passed on for tracking, brightness classification, and blinking detection. If the red/green matching fails while both \(r_m\) and \(g_m\) are defined, one of the two ROIs might be blown up because of bloom, and can be improved if the other one does not suffer from bloom. Since the width of both \(r_m\) and \(g_m\) is limited to the width of the laser ROI l, we can estimate how badly bloom affects a ROI by its aspect ratio (ll. 22–23). The height of a bloom-affected ROI can then be improved in relation to the ROI that is less affected (ll. 24–28).

After this, the red/green matching is tried once more with the improved \(r_m\) or \(g_m\). If this fails again, we give up on the current combination of ROI sets.

Apart from the case where we were able to obtain both \(r_m\) and \(g_m\), we also handle cases where one of the two is missing (ll. 33–35). If, e.g., there is only a red ROI \(r_m\), matching green and yellow ROIs can be generated. In this case, a black ROI b that might have been found can be used to estimate the overall height of the color ROIs. In the end, three similarly sized ROIs should result.
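As an illustration of the aspect-ratio heuristic, the following C++ sketch shows one plausible way to detect and reduce bloom inside the laser ROI; the correction rule and the choice of keeping the center row fixed are assumptions, not the exact steps of Algorithm 3.

```cpp
struct Roi { int x, y, w, h; };  // top-left corner, width, height in pixels

// With the width already clamped to the laser ROI, a ROI that is much taller
// than it is wide is likely inflated by bloom.
double bloom_factor(const Roi &roi) {
  return static_cast<double>(roi.h) / static_cast<double>(roi.w);
}

// Shrink the more bloom-affected ROI to the height of the less affected one,
// keeping its center row fixed (an assumed, illustrative correction rule).
void reduce_bloom(Roi &affected, const Roi &reference) {
  if (bloom_factor(affected) <= bloom_factor(reference))
    return;  // 'affected' is not worse than the reference
  const int center_y = affected.y + affected.h / 2;
  affected.h = reference.h;
  affected.y = center_y - affected.h / 2;
}
```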

4.4 Tracking, State Detection, and Filtering

After ROIs have been determined, distance-based tracking is performed. A resulting ROI tuple denoting a signal tower is matched against previous detections based on their distance and a maximum threshold (Algorithm 1, ll. 14–16).

To determine the activation states, the brightness of the respective ROIs is evaluated. ROIs of high brightness are considered to be active lights. This information is stored in a circular buffer. The buffer length is determined by the number of frames that can be processed per second and the maximum blinking frequency in the RCLL, which is 2 Hz. The light state is considered unknown as long as the buffer is not completely filled. Once it is filled, the number of on/off transitions is counted. If more than one transition occurred, the respective light is classified as blinking.
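A minimal C++ sketch of this blinking detection is given below; the class and state names as well as the exact buffer-sizing rule are assumptions, while the transition count and the 2 Hz limit follow the description above.

```cpp
#include <cstddef>
#include <deque>

enum class SignalState { UNKNOWN, OFF, ON, BLINKING };

// Sketch of the per-light blinking detection using a fixed-length buffer of
// recent lit/unlit observations.
class BlinkDetector {
public:
  // Buffer length derived from the frame rate and the maximum blinking
  // frequency (2 Hz in the RCLL); the exact sizing rule is an assumption.
  explicit BlinkDetector(std::size_t fps, std::size_t max_blink_hz = 2)
  : buffer_size_(fps / max_blink_hz) {}

  void add_frame(bool lit) {
    states_.push_back(lit);
    if (states_.size() > buffer_size_)
      states_.pop_front();  // keep only the most recent observations
  }

  SignalState state() const {
    if (states_.size() < buffer_size_)
      return SignalState::UNKNOWN;      // buffer not yet filled
    std::size_t transitions = 0;
    for (std::size_t i = 1; i < states_.size(); ++i)
      if (states_[i] != states_[i - 1])
        ++transitions;
    if (transitions > 1)
      return SignalState::BLINKING;     // more than one on/off transition
    return states_.back() ? SignalState::ON : SignalState::OFF;
  }

private:
  std::size_t buffer_size_;
  std::deque<bool> states_;
};
```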

Additionally, a confidence value is produced based on the visibility of the signal tower. A positive value of this visibility history denotes the number of consecutive positive sightings; a negative value denotes for how many images the signal tower could not be detected. The value turns negative immediately on a failed detection rather than being decremented step-wise.

A filtering stage can be used to perform outlier removal, i.e., if the light signal is not visible for a short time, the old state is assumed to still be valid. Additionally, the visibility history is used to explicitly report a signal as unknown if the value is below a given threshold.
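The visibility-history bookkeeping can be sketched as follows; the structure name and the threshold value are illustrative assumptions.

```cpp
// Sketch of the visibility history: positive values count consecutive
// sightings, negative values count consecutive missed detections.
struct VisibilityHistory {
  int value = 0;

  void update(bool detected) {
    if (detected)
      value = (value >= 0) ? value + 1 : 1;
    else
      value = (value <= 0) ? value - 1 : -1;  // turns negative immediately
  }

  // Report the signal as unknown below an (assumed) visibility threshold.
  bool is_unknown(int min_visibility = 3) const {
    return value < min_visibility;
  }
};
```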

5 Evaluation

The approach has been evaluated in terms of run-time and detection rates. The experiments were conducted on the actual robot that features an additional laptop (cf. Fig. 2) with a Core i7-3520M CPU and 8 GB of RAM.

Fig. 7. Run-time data during live detection with and without a laser ROI. The Y axis denotes the time since system start, the X axis shows the run-time of the algorithm in 1 s averages, stacked by sub-components.

Figure 7 shows the run-time per frame as 1-second averages (30 images), without (a) and with (b) laser-based ROI pre-processing. During each run, the situation was modified twice, after 20 and after 40 s, each time introducing more background clutter. Overall, the classifier requires the largest amount of processing time. After introducing more clutter, this part requires more processing time (to be expected with more pixels classified as red or green), as does the ROI assembly stage, since more ROIs are produced and more combinations have to be tried to assemble a signal tower. Enabling the laser-assisted ROI pre-processing considerably reduces the overall processing time due to the search space reduction for the classifier. The ROI assembly stage takes longer, since it now requires additional classifier runs for the black cap and socket. The occasional outliers in (b) are due to the laser-line detection not converging and falling back to full-image classification.

Table 1 shows the detection rate from running the image processing pipeline on an actual robot detecting signals on an MPS in three situations posing typical problems. For each situation, the robot moved to four nearby locations facing the MPS and took 30 images; this was done for all valid light signal combinations (no blinking). Figure 8 shows example images for each dataset. Three configurations were compared: the pipeline without the laser-based ROI pre-processing, with it, and additionally with filtering enabled. Blind search incurs a high run-time and mediocre detection results (first macro column). Using the laser-based ROI vastly reduces the search space, increasing the detection rate considerably (second macro column). This is improved even further by the filtering and outlier removal (last macro column). With conservative settings requiring a high confidence, this results in virtually no false detections in actual games.

Table 1. Results of applying the approach in three situations (cf. Fig. 8), each with seven signal combinations and from four different positions in front of the MPS; we give true (T) and false (F) positives (P) and negatives (N) (true negatives omitted in this test), and the detection rate.
Fig. 8. Example images from the datasets used in the detection rate evaluation. (Color figure online)

6 Conclusion

Integrating robots into human working areas will require recognizing cues that were designed for human consumption, such as the light signal towers mounted on many machines in factories. In this paper, we have presented a novel approach to detect such towers and recognize the respective signal states. The algorithm encodes detailed human knowledge (collected over several RCLL competitions) to deal with typical problems that arise, for instance, due to reflections of the lights on metal machine parts, or because a lit segment shines into adjacent segments. To improve efficiency and robustness, a multi-modal approach has been chosen that combines detection from a 2D laser scanner with a camera image. To use the algorithm in a new situation, the main modification required is providing a new mapping from the 2D laser scanner data to a region of interest in the image. The evaluation results show that the algorithm is fast enough for real-time light tower detection and achieves a very good detection rate with only a negligible number of false readings.

An implementation of the algorithm is available as part of the Fawkes software stack release for the RCLL [12]. The datasets and evaluation scripts are available on the project website.