# A 5-Decade Dynamic Range Ambient-Light-Independent Calibrated Signed-Spatial-Contrast AER Retina with 0.1ms Latency and Optional Time-to-First-Spike Mode

J. A. Leñero-Bardallo, T. Serrano-Gotarredona, and B. Linares-Barranco, *IEEE Fellow*

Instituto de Microelectrónica de Sevilla (IMSE-CNM-CSIC). Ed. C/ Américo Vespucio s/n, 41092 Sevilla, Spain. Email: bernabe@imse-cnm.csic.es

Abstract- Address Event Representation (AER) is an emergent technology for assembling modular multi-block bio-inspired sensory and processing systems. Visual sensors (retinae) are among the first AER modules to be reported since the introduction of the technology. Spatial contrast AER retinae are of special interest since they provide highly compressed data flow without reducing the relevant information required for performing recognition. Reported AER contrast retinae perform a contrast computation based on the ratio between a pixel's local light intensity and a spatially weighted average of its neighbourhood. This resulted in compact circuits, but with the penalty of all pixels generating output signals even if they sensed no contrast. In this paper we present a spatial contrast retina with signed output: contrast is computed as the relative difference (not the ratio) between a pixel's local light and its surrounding spatial average and normalized with respect to ambient light. As a result, contrast is ambient-light-independent, includes a sign, and the output will be zero if there is no contrast. Furthermore, an adjustable thresholding mechanism has been included, such that pixels remain silent until they sense an absolute contrast above the adjustable threshold. The pixel contrast computation circuit is based on Boahen's Biharmonic operator contrast circuit, which has been improved to include mismatch calibration and adaptive current based biasing.
As a result, the contrast computation circuit shows much less mismatch, is almost insensitive to ambient light illumination, and biasing is much less critical than in the original voltage biasing scheme. The retina includes an optional global reset mechanism for operation in ambient-light-independent Time-to-First-Spike Contrast Computation Mode. A 32x32 pixel test prototype has been fabricated in 0.35µm CMOS. Experimental results are provided.

## I. INTRODUCTION

AER is a spike based signal representation hardware technique for communicating spikes between layers of neurons in different chips. AER was first proposed in 1991 in one of the Caltech research labs [1]-[2], and has been used since then by a wide community of neuromorphic hardware engineers. A variety of AER visual sensors can be found in the literature, such as simple luminance to frequency transformation sensors [3], Time-to-First-Spike (TFS) coding sensors [4]-[7], foveated sensors [8]-[9], more elaborate transient detectors [10]-[11], motion sensing and computation systems [12]-[16], and spatial and temporal filtering sensors that adapt to illumination and spatio-temporal contrast [17]-[18].

Spike based visual sensors can code their output signals using rate coding or TFS coding. When using rate coding, each pixel is autonomous and continuously generates spikes at a frequency proportional to the signal to transmit (such as luminance or contrast). Under such circumstances, there are no video frames, so that sensing and processing is continuous and frame-free. When using TFS coding, a global system-wide reset is provided and each pixel encodes its signal by the time between this reset and the time of the only spike it generates. Sensing and processing is then frame-constrained. However, TFS is a highly compressed coding scheme (each pixel generates at most one spike per frame) and frame time can be dynamically adjusted to an optimum minimum by subsequent processing stages.
TFS coding and related concepts were originally proposed by Thorpe based on neurophysiological and psychophysical experiments [19], and have evolved to very high speed image processing software tools [20].

Spatial contrast AER retina sensors are of special interest. Computing contrast on the focal plane significantly reduces data flow, while relevant information for shape and object recognition is preserved. In a conventional luminance sensor (a commercial camera) all pixels are sampled with a fixed period and their light intensity (integrated over this period) is communicated out of the sensor to the next stage. In an AER sensor pixels are not sampled. On the contrary, the pixels are the ones that initiate an asynchronous communication cycle, called an "event", when a given condition is satisfied. For example, a spatial contrast retina pixel would send an event whenever the computed local contrast exceeds a given threshold.

Copyright (c) 2008 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

Table 1

| | Cul03 [3] | Chen07 [6] | Licht07 [11] | Zagh04 [17]-[18] | Ruedi03 [4] | Costas07 [22] | This work |
|---|---|---|---|---|---|---|---|
| Functionality | Luminance to Frequency | Luminance to TFS | Temporal Contrast to Number of Events | Spatial and Temporal Contrast to Frequency | Spatial Contrast Magnitude and Direction to TFS | Spatial Contrast to Frequency | Spatial Contrast to Frequency or TFS |
| Light to Time Restriction | YES | YES | NO | NO | YES | NO | NO |
| Latency | 120μs - 125s | 10μs - 1s | 15μs - 400μs (strong biases); 0.9ms - 4ms (nominal biases) | not reported | 2ms - 150ms | not reported | 0.1ms - 10ms |
| Dynamic Range | 120dB | >100dB | 120dB | 50dB | 110dB | 100dB | 100dB |
| Spatial Contrast computation | N/A | N/A | N/A | diffusive grid neighbourhood | 4 nearest pixels (up, right, left, bottom) | diffusive grid neighbourhood (adjustable up to 10 pixels) | diffusive grid neighbourhood (adjustable up to 10 pixels) |
| FPN | 4% | 4.6% | 2.5% | 1-2dec | 1.7% | 6.6% | 0.60% |
| Power | 3-71mW | N/A | 24mW | 63mW | 300mW | 33μW - 10mW | 0.66 - 6.6mW |

Previously reported plain spatial contrast retinae [21]-[22] compute a contrast current per pixel $I_{cont}(x,y)$ as the ratio between a pixel's locally sensed light intensity $I_{ph}(x,y)$ and a spatially weighted average of its surrounding neighborhood $I_{avg}(x,y)$ computed with some kind of diffusive network

$$I_{cont}(x,y) = I_{ref} \frac{I_{ph}(x,y)}{I_{avg}(x,y)} \tag{1}$$

where $I_{ref}$ is a global scaling current. Since this is always positive, let us call it "unipolar" contrast computation, with contrast being computed as the ratio between two photo currents. This yielded circuits where no subtraction operation was required. This was crucial to maintain mismatch (and precision) at reasonable levels. Note that for computing $I_{avg}$ and $I_{cont}$ circuits have to handle photo currents directly, which can be as low as pico-amperes or less. Performing a simple mirroring operation introduces mismatches with errors in the order of 100% [23]. This can be overcome by increasing transistor area, but then leakage currents may become comparable to the available photo currents. Consequently, while handling photo currents, it is desirable to keep complexity at a minimum.
Therefore, from a circuit point of view, the way of computing contrast as in eq. (1) was very convenient. However, this presents an important drawback: when there is no contrast $(I_{avg} = I_{ph})$ then $I_{cont} \neq 0$. In an AER circuit this means that a pixel sensing no contrast will be sending out information (events) and consuming communication bandwidth on the AER channels. This is contrary to the advantages of AER (where it is expected that only information relevant events will be transmitted) and contrary to the advantages of computing contrast at the focal plane (so that only contrast relevant pixels need to send information). In prior work [22], although spatial contrast was computed by eq. (1) in the retina, a post-processing with AER (convolution) modules was added to effectively compute the Weber Contrast<sup>1</sup> as the signed quantity

$$I_{cont}(x,y) = I_{ref}\left(\frac{I_{ph}(x,y)}{I_{avg}(x,y)} - 1\right) \tag{2}$$

This reduced significantly the data flow (from about 400keps<sup>2</sup> to about 10keps), but also at the expense of reducing pixel speed response and contrast sensitivity by a factor of about 10.

<sup>1.</sup> Weber Contrast is defined as $WC = (I-I_{avg})/I_{avg}$ for a pixel photo current with respect to its neighborhood average photo current, or as $WC = (I_1-I_2)/(I_1+I_2)$ between two adjacent pixels or regions. Both expressions are equivalent by making $I = I_1$ and $I_{avg} = (I_1+I_2)/2$.

<sup>2.</sup> keps stands for "kilo events per second".

In the present paper we present a new spatial contrast retina design [25], where the contrast computation follows eq. (2). The design is based on the original contrast computation circuit by Boahen [21], which has been improved to overcome its inherent limitations on mismatch, ambient light dependence, and critical controllability. Section II discusses related work and summarizes a prior AER mismatch-calibrated contrast retina pixel [22] that followed eq. (1), Section III briefly summarizes Boahen's spatial contrast computation circuit, Section IV summarizes a calibration circuit that is more compact than the one used in [22] and has been used in the present design, and Section V introduces the new pixel design. Finally, Section VI provides experimental characterization and test results.

## II. PREVIOUS DESIGNS

A variety of AER retina sensors have been reported, from which we have selected a few for comparison purposes. Table 1 summarizes and compares their functionalities and performance figures. Three types of functionalities are considered: sensing pixel luminance, sensing pixel temporal contrast, and sensing pixel spatial contrast with respect to a given neighborhood. For (spike) signal coding, three methods are used: signal to frequency (rate) coding, signal to number of events (NE) coding, and signal to Time-to-First-Spike (TFS) coding. When using rate coding (as in [3],[17],[18],[22]), a current that carries the information of interest (luminance, contrast) is fed to an integrate-and-fire circuit whose spike frequency is controlled by the current. For NE coding (as in [11]), every time the light sensed by a pixel changes by a relative amount, a new spike is generated. In TFS coding (as in [4],[6]) the information signal is also fed to an integrate-and-fire circuit, but the integrators are periodically and globally reset and only fire one spike between consecutive resets. This way, the information is coded as the time between the global reset and the pixel spike time.

For a luminance retina [3],[6] the photo current is the one to be integrated. Since light (photo current) can change over many decades, this results in timings directly dependent on ambient light. Consequently, the dynamic range of light sensing capability is directly transformed into the latency variation of the output. This is a severe practical restriction, labelled in Table 1 as the "Light to Time Restriction".
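The difference between the unipolar contrast of eq. (1) and the signed contrast of eq. (2) can be checked numerically. The following is a minimal sketch, not circuit-accurate: the function names are hypothetical, $I_{ref}$ is normalized to 1, and the photo current values are arbitrary illustration values.

```python
def unipolar_contrast(i_ph, i_avg, i_ref=1.0):
    """Eq. (1): ratio-based contrast, always positive."""
    return i_ref * i_ph / i_avg


def signed_contrast(i_ph, i_avg, i_ref=1.0):
    """Eq. (2): signed (Weber-type) contrast, zero when i_ph == i_avg."""
    return i_ref * (i_ph / i_avg - 1.0)


def weber_two_regions(i_1, i_2):
    """Footnote 1, second form: Weber Contrast between two adjacent regions."""
    return (i_1 - i_2) / (i_1 + i_2)


if __name__ == "__main__":
    # A pixel that sees no contrast, at two very different ambient levels.
    for i_amb in (1e-12, 1e-9):
        print("ambient", i_amb,
              "unipolar", unipolar_contrast(i_amb, i_amb),
              "signed", signed_contrast(i_amb, i_amb))

    # Two adjacent regions: both Weber Contrast forms should agree
    # when the average is taken as (i_1 + i_2) / 2.
    i_1, i_2 = 9.0, 1.0
    print(signed_contrast(i_1, (i_1 + i_2) / 2.0), weber_two_regions(i_1, i_2))
```

With eq. (1), a pixel with $I_{ph} = I_{avg}$ still outputs $I_{ref}$ and therefore keeps firing, whereas eq. (2) yields zero. Both quantities depend only on the ratio $I_{ph}/I_{avg}$, which is why they are unaffected by a global scaling of the light level.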
For contrast computations (either spatial or temporal), light difference is normalized to average light, so that contrast is (by definition) independent of ambient light. Consequently, these retinae should not suffer from the "Light to Time Restriction". This is the case for all contrast retinae in Table 1, except for [4]. The reason is that in [4] for each frame there are two separate steps in time. The first one uses a Light to Time integration (which lasts between 0.5µs - 150ms depending on ambient light) to obtain a voltage representation of pixel contrast. The second step transforms these voltages into a TFS representation, requiring an ambient-light-independent time of about 2ms. In the present paper we present a spatial contrast retina whose ambient-light-independent pixel spatial contrast can be coded either as frequency or as TFS.

In a previous spatial contrast AER retina design [22], each pixel computes local spatial contrast as a ratio

$$I_{cont}(x, y) = I_{ref}(x, y) \frac{I_{avg}(x, y)}{I_{ph}(x, y)} \tag{3}$$

where $I_{ph}(x, y)$ is the pixel photo current, and $I_{avg}(x, y)$ is a neighborhood pixel photo current average computed by a diffusive grid [26]. The resulting current $I_{cont}(x, y)$ is thus proportional to a unipolar contrast (as in eq. (1)) and is fed to an integrate-and-fire neuron generating spikes with a frequency proportional to $I_{cont}(x, y)$. Scaling current $I_{ref}(x, y)$ is made locally trimmable for each pixel in order to compensate for mismatch. As a result, inter-pixel mismatch in the contrast computation could be reduced from about $\sigma \approx 60\%$ to $\sigma \approx 6\%$ using 5-bit pixel registers to control $I_{ref}(x, y)$. Pixel complexity was kept relatively simple (104 transistors + 1 capacitor) thanks to the unipolar nature of the contrast computation, and the whole pixel could be fit into an area of $58\mu m \times 56\mu m$ in a 0.35 $\mu m$ CMOS process.
The main drawback is that pixels with no contrast would generate output events at a constant rate proportional to $I_{ref}$. To overcome this, a 4-AER-module system was assembled [22] to subtract this offset and effectively compute a signed contrast as in eq. (2). However, this reduced contrast sensitivity, and with it the speed response, by a factor of about 10.

## III. BOAHEN SPATIAL CONTRAST PIXEL

In the design presented in this paper, the speed and contrast sensitivity reduction problem is solved by performing all the signed-spatial-contrast computation at the sensor chip using an improved version of Boahen's original biharmonic contrast computation circuit [21]. The continuous approximation of Boahen's pixel circuit, shown in Fig. 1, approximately solves the following equations [26]

$$I_h(x, y) = I_{ph}(x, y) + a\nabla^2 I_c(x, y) \tag{4}$$

$$I_c(x, y) = I_u - b\nabla^2 I_h(x, y) \tag{5}$$

Fig. 1: Boahen original contrast computation circuit

Fig. 2: Interpretation of spatial contrast computations

Solving for $I_h$ results in the biharmonic equation used in computer vision to find an optimally smooth interpolating function of the stimulus $I_{ph}$ [27]. Consequently, the output $I_c(x,y)$ is the second order spatial derivative of the interpolation $I_h$ according to eq. (5). Since the interpolation is a spatially integrated version of the stimulus, $I_c$ can be interpreted as a version of a first order derivative of the stimulus, and therefore as spatial contrast. This can also be understood with the help of Fig. 2. The top trace shows a step stimulus $I_{ph}$ and its spatial average ($I_{avg}$ or $I_h$). The center trace shows the contrast computation as $I_{avg}/I_{ph}$ (as was done in [22]), and the bottom trace shows the contrast computation as the second order spatial derivative of $I_h$. Both are equivalent, although not identical. According to eq. (5), $I_c$ includes a DC term $I_u$.
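The behavior described by eqs. (4)-(5) can be illustrated with a small numerical sketch on a 1-D row of pixels. It is only an idealized discrete model under stated assumptions, not a simulation of the circuit: the Laplacian is replaced by a second difference with reflecting ends, $I_u$ is uniform, the coefficients $a = b = 1$ and the step stimulus are arbitrary, and all function names are hypothetical. Substituting eq. (5) into eq. (4), and using the fact that the Laplacian of a constant is zero, gives $(1 + ab\nabla^4) I_h = I_{ph}$, which is solved here as a linear system.

```python
def neumann_laplacian(n):
    """1-D second-difference matrix with reflecting ends (rows sum to zero)."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                lap[i][j] = 1.0
                lap[i][i] -= 1.0
    return lap


def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def solve(a, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def boahen_1d(i_ph, i_u, a=1.0, b=1.0):
    """Return (I_h, I_c) for a row of pixels under discrete eqs. (4)-(5)."""
    n = len(i_ph)
    lap = neumann_laplacian(n)
    lap2 = matmul(lap, lap)
    # (Id + a*b*Lap^2) I_h = I_ph, from substituting eq. (5) into eq. (4).
    system = [[(1.0 if i == j else 0.0) + a * b * lap2[i][j]
               for j in range(n)] for i in range(n)]
    i_h = solve(system, i_ph)
    lap_ih = [sum(lap[i][j] * i_h[j] for j in range(n)) for i in range(n)]
    i_c = [i_u - b * v for v in lap_ih]  # eq. (5)
    return i_h, i_c


if __name__ == "__main__":
    i_u = 0.15
    step = [1.0] * 16 + [9.0] * 16  # dark half, bright half
    _, i_c = boahen_1d(step, i_u)
    for k in range(12, 20):
        print(k, round(i_c[k] - i_u, 4))  # signed contrast around the edge
```

In this model a uniform stimulus leaves $I_c = I_u$ everywhere, while a step produces $I_c - I_u$ of opposite signs on the two sides of the edge. This is the signed contrast that remains once the DC term $I_u$ is removed.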
The original circuit implementation of this model suffered from a series of drawbacks. First, mismatch was comparable to the output signal. Second, the output signal would degrade for the same contrast stimulus when changing lighting conditions. Third, contrast gain had to be adjusted through critically sensitive bias voltages with a very narrow tuning range. All three drawbacks have been improved with the present implementation.

Fig. 3: Digitally controlled length MOS used for calibration

Fig. 4: Translinear tuning circuit

## IV. COMPACT CALIBRATION CIRCUIT

We reduce mismatch by introducing calibration. One dominant source of mismatch is the DC component $I_u$ in eq. (5). Since this current is set constant, independent of lighting conditions, we can directly subtract it with a trimmable current source. The output current will thus directly be the signed contrast current we were looking for. To implement the trimmable current source, we follow the recently reported very compact circuit based on series transistor association [29]. Fig. 3 shows the basic principle behind this circuit. Each switched MOS operates as a segment of an effective longer MOS whose length is controlled digitally by switching individual segments from ohmic to saturation, and vice versa. The key consists of making each segment contribute approximately as a power of 2 to the total length. The digital control word $w_{cal} = \{b_{N-1}...b_1b_0\}$ sets the state of the switches. As a result, the effective length is digitally controlled as in a digital-to-analog conversion. On the right of Fig. 3 we show the symbol of a digi-MOS (digitally-controlled MOS) which we use to represent the circuit on the left.

Fig. 4 shows the circuitry used to subtract the DC component $I_u$ of the contrast current. Transistors to the left of the dashed line are shared by all pixels and are located at the chip periphery, while those to the right are replicated for each pixel.
Current $I_u'$ sets the subtracting DC level (while also introducing mismatch), while $\{I_1, I_2, I_3\}$ are adjusted so that $I_{cal}$ has a tuning range covering the inter-pixel mismatch. Transistors $M_{1-4}$ form a translinear loop [26], thus $I_{cal} = I_1 I_2 / I_{3n}$, where $I_{3n}$ is a mirrored version of $I_3$ by transistors $M_p$ and $M_q$. Transistor $M_q$ is the digi-MOS of Fig. 3. Consequently, $I_{cal}$ is proportional to the pixel calibration word $w_{cal}(x,y)$, which is stored on in-pixel static RAM latches loaded at start-up. Note that current $I_{cal}$ could have been generated directly by current mirror $M_p$-$M_q$. However, in this case, if one wants to scale $\{I_u, I_{cal}, I_u'\}$ globally (to adjust the retina output frequency range) the circuit would change the current through the calibration branch containing $M_q$. On the contrary, with the circuit in Fig. 4 one can scale $\{I_u, I_{cal}, I_u'\}$ while keeping the calibration branch current $I_{3n}$ (and $I_3$) constant, and scale through peripheral currents $I_1$ and/or $I_2$. This way, calibration degrades less when tuning the output frequency range. In the Section on experimental results we explain how we proceed to perform calibration.

## V. IMPROVED SIGNED-SPATIAL-CONTRAST PIXEL

Fig. 5 shows the schematics of all pixel circuitry. Fig. 5(a) provides an overall block diagram, indicating the signals interchanged between blocks. The pixel contains three main parts: (1) the photo sensing and contrast computation part, including calibration, which provides the ambient light independent contrast current $I_{cont}$; (2) the integrate-and-fire part, which includes refractory circuitry, thresholding, and TFS mode; and (3) the pixel AER communication circuitry that sends out events to the periphery. Let us now describe each one.

Fig. 5: Pixel schematics diagram. (a) Compact block diagram. (b) Detail of photo sensing and contrast computation circuit. (c) Detail of signed integrate-and-fire circuit. (d) Detail of reset and refractory circuit. (e) Detail of thresholding circuit. (f) Detail of comparators. (g) Detail of event block circuit.

Fig. 6: Effect of contrast thresholding on the relationship between pixel output frequency and contrast current.

### A. Photo Sensing and Contrast Computation

Fig. 5(b) shows how Boahen's contrast computation circuit has been modified to include a current biasing scheme for controlling the original voltages $V_{cc}$ and $V_{hh}$ in Fig. 1. This way, gate voltages $V_{cc}$ and $V_{hh}$ tend to follow voltage excursions at nodes 'C' and 'H'. The first advantage of this is that biasing will adapt to ambient light conditions. For example, if all photodiode currents are scaled up/down by the same factor, the voltage at all nodes 'H' will follow it logarithmically. Since $I_u$ is constant, the voltage at node 'C' will thus also follow the same shift. Since bias currents $I_{hh}$ and $I_{cc}$ are kept constant, the gate voltages of transistors $M_h$ and $M_c$ will also follow this same global voltage shift, adapting themselves to the global light change.

The second advantage of this current biasing scheme is that it attenuates mismatch. A careful mismatch analysis of this circuit identifies transistor $M_a$ and current $I_u$ as the dominant sources of mismatch. This can be understood as follows. Mismatch in $I_u$ goes directly into the DC offset of $I_c$, which will be calibrated by $I_{cal}$. Mismatch of $M_b$ is less critical because its inter-pixel gate voltage (node 'C') variability affects the bottom diffusive grid and the computation of the average current $I_h$. Thus its variability impact is attenuated by the average computation. However, $M_a$ mismatch ($V_{gs}$ variation of $M_a$) directly changes the source voltage of $M_b$, directly affecting the gain of the contrast output (coefficient 'b' in eq. (5)), whose effect is not directly calibrated by $I_{cal}$. Consequently, $M_a$ needs to be sized to minimize mismatch. The effect of $I_u$ will be compensated by calibration, and the effect of $M_a$ will be attenuated by the current biasing scheme. Note that mismatch in all $M_a$ transistors will introduce random voltage variations at nodes 'H' and 'C'. These variations will be transformed into random lateral currents through transistors $M_h$ and $M_c$. The random currents through $M_h$ will be collected by output current $I_c$ and can be compensated by calibration. However, random currents through $M_c$ transistors operate as if they were generated by the photodiodes. Thanks to the current biasing scheme, an increase in 'C' will increase the gate voltage of the new bottom NMOS transistor, increasing its source voltage, thus increasing the gate voltage of $M_c$, which will reduce the lateral random current. A similar effect will be happening for transistors $M_h$.

Finally, the third advantage is a more robust means for biasing the lateral transistors. In the original scheme, voltages $V_{cc}$ and $V_{hh}$ suffered from a very narrow and critical tuning range (about 100mV or less). Now, bias currents $I_{cc}$ and $I_{hh}$ can be tuned over several decades, while still perceiving their effect.

### B. Integrate-and-Fire

Fig. 5(c) shows the integrate-and-fire block. Input contrast current $I_{cont}$ is integrated on capacitor $C_{int}$. Two comparators detect whether the capacitor voltage $V_{cap}$ reaches an upper $(V_{high})$ or lower $(V_{low})$ threshold, triggering the generation of a positive (pulse+) or negative (pulse-) event, respectively. To accelerate the comparisons, both comparators activate a positive feedback loop (from $V_{cap}$ to $V_{dd04}$ for a positive event, or from $V_{cap}$ to $V_{gn04}$ for a negative event).
After event generation, capacitor $C_{int}$ is reset to the central voltage $V_{ref}$. This is done by the reset circuit shown in Fig. 5(d). This reset mechanism includes a refractory timing circuit that inhibits the pixel from generating subsequent events before refractory capacitor $C_{rfr}$ has been discharged by the DC current source MOS controlled by $V_{rfr}$. The reset circuit also includes the global TFS (Time-to-First-Spike) mode reset signal, which resets all pixel capacitors $C_{int}$ simultaneously. Note that this signal inhibits the positive feedback loops in Fig. 5(c). This allows resetting quickly those pixels generating an event when TFS becomes active.

Table 2

| Parameter | Value |
|---|---|
| technology | CMOS 0.35μm 4M 2P |
| power supply | 3.3V |
| chip size | 2.5 x 2.6 mm<sup>2</sup> |
| array size | 32 x 32 |
| pixel size | 80 x 80 μm<sup>2</sup> |
| fill factor | 2.0% |
| photodiode quantum efficiency | 0.34 @ 450nm |
| pixel complexity | 131 transistors + 2 caps |
| current consumption | 65μA @ 10keps |
| dynamic range | 1-100k lux |
| post-calibration FPN | 0.90% over 5 decades of ambient light |
| contrast sensitivity | 4400 Hz/WC |
| temporal latency | 0.1ms @ 50k-lux |
| maximum out event rate | 66 Meps |

Fig. 5(e) shows the minimum contrast thresholding circuit. A comparator detects whether capacitor voltage is above or below $V_{ref}$ and turns on either a positive $(I_{low})$ or negative $(I_{high})$ threshold current, which $I_{cont}$ needs to exceed for producing an event. Fig. 6 shows the resulting relationship between integrate-and-fire circuit output frequency $f_{out}$ and the input signed contrast current $I_{cont}$ while bias voltages $V_{th}^{+}$ and $V_{th}^{-}$ are set to generate threshold currents $I_{high}$ and $I_{low}$, respectively.
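The thresholded frequency-versus-current relationship can be sketched with an idealized integrate-and-fire model. This is a first-order illustration under assumptions that are not taken from the paper: the threshold current is simply subtracted from $I_{cont}$, the refractory period, comparator delays and leakage are ignored, and the function and parameter names are hypothetical. The default capacitance is the layout-extracted $C_{int} \approx 118 fF$ quoted in the experimental results.

```python
C_INT = 118e-15  # integrating capacitance in farads (value quoted in the paper)


def if_frequency(i_cont, v_high, v_low, v_ref,
                 i_high=0.0, i_low=0.0, c_int=C_INT):
    """Signed output frequency (Hz) of an idealized thresholded pixel.

    Positive values stand for positive events, negative values for
    negative events. i_high and i_low are threshold current magnitudes
    (assumed to be subtracted from the input, an idealization).
    """
    if i_cont > i_high:
        # Charge C_int from v_ref up to v_high with the net current.
        return (i_cont - i_high) / (c_int * (v_high - v_ref))
    if i_cont < -i_low:
        # Discharge C_int from v_ref down to v_low with the net current.
        return (i_cont + i_low) / (c_int * (v_ref - v_low))
    return 0.0  # inside the dead zone: the pixel stays silent


if __name__ == "__main__":
    biases = dict(v_high=2.8, v_low=0.45, v_ref=1.65,
                  i_high=10e-12, i_low=10e-12)
    for i_pa in (-150, -50, -5, 0, 5, 50, 150):
        print(i_pa, "pA ->", if_frequency(i_pa * 1e-12, **biases), "Hz")
```

In this model the spacing between $V_{ref}$ and each voltage threshold sets the gain of the corresponding branch, so unequal spacings give different sensitivities for positive and negative events, while the threshold currents open a dead zone around zero contrast.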
Naturally, threshold transistors would also introduce mismatch. Consequently, they were laid out with a large area of $2/20\,\mu m$. Fig. 5(f) shows the two-stage comparators used in Fig. 5(c). In standby they are biased at low current through $V_{b1}$ and $V_{b2}$. However, during event generation their bias current is increased. This increase starts when signal *pulse* starts to depart from its resting voltage and stops after the pixel event reset signal $ev\_rst$ returns to its resting level. The comparator within the thresholding circuit in Fig. 5(e) does not have this feature, since this comparator only needs to detect whether the contrast accumulated so far for the pixel is positive or negative, which is a slow process compared to the event generation timings.

### C. AER Communication

Finally, the AER pixel communication part in Fig. 5(a) contains two identical "event block" circuits, which are shown in Fig. 5(g). These are standard AER pixel communication circuits taken from Boahen's row parallel event read-out technique [30]. When generating signed events, each pixel needs to provide two column event signals *col*+ and *col*-. This concept was already implemented and tested in prior designs [31] that required signed events.

## VI. EXPERIMENTAL RESULTS

A 32 x 32 pixel test prototype AER signed spatial contrast retina chip has been designed and fabricated in a double poly 4-metal 0.35µm CMOS process with a power supply of $V_{DD}=3.3\,V$. Table 2 summarizes the chip specifications. Fig. 7 shows a microphotograph of the die, of size 2.5 x 2.6 $mm^2$. The whole chip, except the pad ring, is covered with the top metal layer, leaving openings for the photodiode sensors. Fig. 7 also shows the layout of a single pixel highlighting its components. Each pixel layout is a mirror image of its neighboring pixels. This way noisy digital lines are shared among neighbors, as well as power supplies, and noise sensitive bias lines.
At the same time, noise sensitive lines are separated from noisy ones. Pixel area is $80 \times 80\,\mu m^2$, including routing.

Fig. 7: Microphotograph of 2.5mm x 2.6mm die, and zoomed view of 80μm x 80μm pixel (layout) indicating the location of its components.

### A. Pixel Frequency Range

One of the corner pixels had its integrating capacitor node connected to a low-input-capacitance analog buffer for monitoring purposes. Pixel integrating capacitors have a capacitance of about $C_{int} \approx 118 fF$ (obtained from the layout extractor), while the corner pixel with monitoring buffer has a total capacitance of about $C_{mntr} \approx 196 fF$ (estimated from layout extraction and simulation). Fig. 8 shows recorded waveforms (for positive and negative currents) for this capacitor when turning off horizontal interactions among neighboring pixels (by turning off transistors $M_h$ and $M_c$ in Fig. 5(b)), and for a typical value of $I_u \approx 100 pA$. By changing $I_u$ (with $I_u' = I_{cal} = 0$) or $I_u'$ (while $I_u = I_{cal} = 0$), pixel oscillation frequency could be tuned between 1.2Hz and 5kHz. For the maximum frequency the arbitrating periphery inserts varying delays. This is because all pixels are also firing with maximum frequency (even higher than the pixel we are observing, which has slightly higher integrating capacitance) and are saturating the arbiter. Consequently, in a practical situation where only a small percentage of the pixels would fire with maximum frequency, they would be able to fire with a maximum frequency higher than 5kHz.

Fig. 8: Recorded waveforms at the integrating capacitor, under typical operating biases. Oscillation frequency is 466Hz.

Fig. 9: Histograms of retina pixels frequencies distribution (a) before calibration and (b) after calibration

### B. Calibration

In order to use the retina properly, the first requirement is to calibrate it.
For this, the retina was exposed to a uniform stimulus, while biased for the following operation conditions: $I_u = 150pA$, $V_{ref} = 1.65V$, $V_{high} = 2.8V$, $V_{low} = 0.45V$, $I_{hh} = 10pA$, $I_{cc} = 5pA$. Also, before calibration, we set $I_{cal} = I_u' = 0$. Under these conditions, retina output events are recorded, from which one can obtain the firing frequency of each pixel. Next, we set current $I_u' = 80pA$ so that the pixel with minimum frequency has a frequency close to zero (or slightly negative). Under these conditions the resulting histogram of pixel frequencies distribution is shown in Fig. 9(a). After this, the calibration circuit biases ($\{I_1, I_2, I_3\}$ in Fig. 4) were set for optimum coverage of this distribution, and for each pixel the optimum calibration word $w_{cal}(x, y)$ was found. This is computed off-line by optimally combining biases $\{I_1, I_2, I_3\}$ and calibration words $w_{cal}(x, y)$. We allowed for a few outliers in order to minimize the residual standard deviation. One could also target to minimize the spread among the most extreme pixels at the expense of a higher standard deviation. After this process, the histogram of resulting calibrated pixel frequencies is shown in Fig. 9(b). The residual inter-pixel standard deviation is 26Hz. As we will see later (in Subsection D), maximum contrast frequency for these biases is ±4400Hz. Consequently, post-calibration residual mismatch is $\sigma = 0.6\%$.

Fig. 10: Effect of ambient illumination on post-calibration residual mismatch standard deviation. Five curves are shown, each for calibrating at the given illumination level.

Fig. 10 shows how the standard deviation of the post-calibration residual mismatch changes with illumination level. The figure shows five superimposed graphs. Each one corresponds to performing calibration at a different illumination level (50, 15, 5, 1, and 0.25 k-lux).
The worst case situation corresponds to calibrating at about 1 k-lux and using the retina at very high light conditions, resulting in a standard deviation of almost 140 Hz ($\sigma = 1.5\%$). On the other hand, the optimum situation corresponds to calibrating at 15 k-lux, which results in a standard deviation of less than 80 Hz ($\sigma = 0.9\%$) over the entire 5 decade range. The calibration process is all done off-line. However, it is conceivable to implement it fully on-chip (through, for example, a VHDL-described state machine), since it only requires exposing the chip to uniform illumination (one can simply remove the optics), comparing the pixel frequencies (for which not even a precise clock reference is required), and computing an optimum set of calibration weights.

## C. Contrast Step Response

Fig. 11 illustrates the retina response to a luminance step of different contrast levels, while thresholding is turned off. The input stimulus is printed paper, providing a static image with a dark half and a gray half. The gray half intensity is adjusted between 100% (white) and 30% (darkest gray). Table 3 indicates the relationship of the luminance steps with the ratio of photo currents between the gray and black parts, and the resulting Weber Contrast (defined as $(I_{light}-I_{dark})/(I_{light}+I_{dark})$).

Table 3

| luminance step | 100% to 0% | 70% to 0% | 50% to 0% | 30% to 0% | 10% to 0% | 0% to 0% |
|----------------------|------|------|------|------|------|-----|
| $I_{light}/I_{dark}$ | 9 | 6 | 3.6 | 2.4 | 1.5 | 1.0 |
| Weber Contrast (WC) | 0.80 | 0.72 | 0.56 | 0.41 | 0.20 | 0 |

Fig. 11: Retina response to a luminance step of changing Weber Contrast. Left column is input stimulus. Center column is output response before calibration, and right column is output response after calibration.

The left column in Fig. 11 shows this input stimulus image. The center column shows the retina output response before calibration, while the right column shows the retina response after calibration. The central gray level corresponds to zero pixel frequency. Brighter pixels are firing positively signed events, while darker pixels are firing negatively signed events. Absolute maximum pixel frequency was 250 Hz. Biasing conditions in Fig. 11 were $I_u = 150\,\text{pA}$, $I_u' = 150\,\text{pA}$, $V_{high} = 2.9\,\text{V}$, $V_{low} = 0.4\,\text{V}$, and $V_{ref} = 1.65\,\text{V}$.

## D. Contrast Sensitivity

An important characterization for a spatial contrast retina is its contrast sensitivity, that is, the output event rate for a given input contrast stimulus. We have characterized spatial contrast sensitivity for the positive event branch and the negative event branch (see Fig. 5(a)) separately, since they have separate circuitry. Under normal operation, the retina will usually be biased to have the same sensitivity for positive and negative events. However, there might be situations where one would prefer to set different contrast sensitivities for positive and negative events, and this retina offers this possibility.

To characterize pixel contrast sensitivity, a gray level step stimulus (as shown in Fig. 11) of different contrast values was used. Pixel frequencies of the two columns with the highest activity (the ones just to the left and right of the stimulus center) were recorded. This process was repeated for different bias values of $V_{high}$ and $V_{low}$, with $V_{ref} = 1.65\,\text{V}$. The results are shown in Fig. 12(a). The measured maximum contrast sensitivity was 4400 Hz/WC (Hz per Weber Contrast) for $V_{high} - V_{ref} = V_{ref} - V_{low} = 0.15\,\text{V}$. Error bars indicate inter-pixel variability.

Fig. 12: Contrast sensitivity measurements. A stimulus step (as in Fig. 11) was applied and max and min frequencies were recorded. (a) Top panel shows max and min frequencies for different stimulus step contrasts and different threshold values. (b) Bottom panel shows how the maximum and minimum frequencies depend on illumination (WC=0.8).

To show the dependence of sensitivity on illumination, the maximum output frequency for a Weber Contrast of WC = 0.8 was measured (for both signs of contrast) at different illumination levels. As shown in Fig. 12(b), sensitivity degrades slightly when illumination decreases. Sensitivity remains almost constant over the first two decades, and approximately doubles over the second two decades.

## E. Contrast Thresholding

Fig. 13 shows the typical pixel output when the visual field is swept with a gray level bar stimulus of WC = 0.8. The x-axis indicates bar position in row number units. The pixel output spike frequency reaches its maximum value when the stimulus is at the pixel's row. This value depends on the width of the sweeping bar. Several outputs using different bar widths have been plotted for the same pixel. The bar width is expressed in projected pixel units. The maximum frequency is proportional to the stimulus width. In both cases, the following voltages were used: $V_{high}=2.9\,\text{V}$, $V_{low}=1.4\,\text{V}$ and $V_{ref}=1.65\,\text{V}$. With these settings, $V_{high}-V_{ref}>V_{ref}-V_{low}$, so negative events were enhanced.

Fig. 13: Typical pixel's output when the retina is stimulated with a 100% contrast bar of different widths.

It is also possible to fully inhibit positive or negative events by setting either $I_{high}$ or $I_{low}$ (see Fig. 5(e)) to sufficiently large values. Asymmetrical thresholds ($I_{low} \neq I_{high}$) can also be used. Therefore, positive and negative events can be inhibited independently.

Fig. 14 shows the effect of thresholding. First, the visual field was swept with a 100% contrast bar for different thresholds. Fig. 14(a) shows the output frequency for pixel (17,11) when setting symmetric thresholds. Fig. 14(b) shows the results for the same pixel when only the threshold that inhibits positive events is varied. The negative output frequency remains constant.

Fig. 14: Effect of thresholding. (a) Bar is swept for different symmetric thresholds. (b) No threshold for negative events, and positive event thresholds are changed. (c) Events captured for calibrated retina when all positive events are inhibited by setting a high positive threshold. (d) Events captured for calibrated retina with symmetric threshold. (e) Events captured for uncalibrated retina.

The main advantage of thresholding is that it removes the residual mismatch left after calibration. Pixels usually spike with a low residual output frequency after calibration. Positive and negative thresholds can be set to remove these undesirable outputs.

Fig. 14(c-e) show some snapshots captured with the contrast retina. The central gray color indicates zero output (no contrast). Positive events range from this gray to black and negative events range from this gray to white. The three snapshots were taken for different values of the positive and negative thresholds. For the three cases, $I_u=150\,\text{pA}$. In Fig. 14(c) a positive threshold current of 1 nA was set to inhibit positive events completely after calibration, while $I_{low}$ was 150 pA. In Fig. 14(d) a symmetric threshold of 80 pA was set after calibration. Fig. 14(e) shows the retina output with neither calibration nor thresholding. Above each snapshot the sum of all pixels' frequencies $f_{total}$ is indicated. We can see, by comparing (d) and (e), that calibration reduces event flow (communication bandwidth) while enhancing contrast gain.

Fig. 15: Latency measurements under changing illumination conditions.

## F. Latency Characterization

To characterize the retina latency we proceeded as follows.
We drove an LED with a step signal to turn it ON, focused it over a central region of the sensor array, and recorded the time delay between the step signal and the first event Rqst coming out of the chip from that region. The measurements were repeated while inserting different neutral density filters to attenuate light intensity from about 50 k-lux down to 2 lux. The resulting latencies are shown in Fig. 15. The measurement was also repeated with the LED focused over different regions of the pixel array. The bars in Fig. 15 show the spread obtained when changing this region. As can be seen, latency changes from about 10 ms down to about 0.1 ms when illumination varies over almost 5 decades. This means that latency is dominated by the photo sensing circuits. However, latency does not scale proportionally to light, and consequently this retina does not suffer from the severe Light-to-Time restriction listed in Table 1.

## G. Natural Scenes

Although the retina resolution is rather low (32 x 32 pixels) for observing natural scenes, Fig. 16 shows some images captured when observing natural elements, which give a first order feeling of how an up-scaled retina version would respond under a natural scene.

Fig. 16: Natural elements. From left to right: screw, paper clip, eye, and child face.

Fig. 17: Paper clip snapshots in TFS mode for different number of captured events M.

Fig. 18: Time line of the Global Reset and the Request signal.

## H. TFS Output Mode

As mentioned in Section V.B, the integrate-and-fire circuit of the retina pixel can be configured to operate in TFS mode. In this mode, the refractory period of the retina has to be set to its largest possible value (by connecting voltage $V_{rfr}$ to $V_{dd}$) to guarantee that each pixel will fire at most one event. Then a periodic reset pulse has to be provided for global signal TFS. This can be done in several ways. One trivial option is to reset at a fixed preset frequency.
However, a more efficient option is to count the output events. Since output events come out in decreasing order of pixel contrast, high contrast pixels (either positive or negative) come out first. These are the pixels carrying the most relevant information, for example, for a recognition application. Consequently, one could add a simple counter at the Rqst line and have it generate a reset pulse for $\overline{TFS}$ after every M events. This way, a dynamic "frame time" $T_{frame}$ would be produced which self-adjusts to the contrast level of the scene, independent of ambient light. High contrast scenes would self-tune to faster frames, while low contrast scenes would self-tune to slower frames for the same amount of contrast information. Other more sophisticated options could use a post-processing event-based system to perform a given recognition and provide the reset pulse once a recognition has been achieved, or reset after a preset time if no recognition was possible. In what follows we count a fixed number of events M. Fig. 17 illustrates the effect of changing M when observing the paper clip of Fig. 16. Note that setting M to low values also removes background noise.

The TFS output mode is also insensitive to illumination (to first order), since it operates directly on $I_{cont}$ within the integrate-and-fire circuit (see Fig. 5(c-d)). To show this, several snapshots of the paper clip of Fig. 17 were taken under different illumination conditions. As shown in Fig. 18, $T_{frame}$ is the sum of $T_{first}$ (the time the retina needs to generate the first spike after the reset) and $T_M$ (the time between the first and M-th spike). Fig. 19 shows the value of $T_{frame}$ for different values of M and illumination levels. $T_{frame}$ is almost independent of illumination and is approximately constant for a given M.

Fig. 19: Effect of illumination on $T_{frame}$ and $T_{first}$.

Fig. 19 also shows the value of $T_{first}$ versus illumination. In principle, $T_{first}$ should not depend on ambient light because this reset is performed within the integrate-and-fire circuit (see Fig. 5(c)) and not the photo sensing circuit (Fig. 5(b)). However, Fig. 19 reveals a slow-down when decreasing ambient light (between 5 k-lux and 200 lux, approximately). This is probably due to switching crosstalk between the integrate-and-fire and photo sensing circuits, which introduces a switching transient in the latter that cannot be prevented when the photo currents are too small. Such a problem can be attenuated in future designs by improving decoupling between the two stages, for example, through cascoding techniques.

## I. Power Consumption

Chip power consumption has been characterized. Supply voltage is 3.3 V. In principle, power consumption would depend on both static bias conditions and output event rate. However, in practice, it is dominated by the latter, because of the high consumption of the digital pads communicating output events. Static power dissipation is negligible, since pixel current biases are set to relatively low values. Typical bias settings are $I_u = 150\,\text{pA}$, $I_{low} = 50\,\text{pA}$ and $I_{high} = 50\,\text{pA}$. This results in a pixel static current consumption of 15 nA. At very low output event rate (1 keps) we measured a chip current consumption of $40\,\mu\text{A}$ ($130\,\mu\text{W}$). Fig. 20 shows the measured current consumption of the chip as a function of output event rate. As can be seen, for normal operation regimes (between 100 keps and 1 Meps) current consumption varies between $200\,\mu\text{A}$ and 2 mA ($660\,\mu\text{W}$ to 6.6 mW). Pixel output frequency (or TFS timing) range is directly controlled by bias current $I_u$ (see Fig. 5). Therefore, $I_u$ also controls the overall power consumption and the speed-power trade-off.

Fig. 20: Chip total current consumption as function of total output event rate.

## VII. CONCLUSIONS

A new AER signed spatial contrast retina has been presented.
It uses an improved and calibrated version of Boahen's contrast circuit. The design avoids the problem of AER communication bandwidth consumption present in prior designs. Furthermore, it also includes a thresholding mechanism, so that only pixels sensing spatial contrast above a given threshold generate events. A calibration scheme is included to partially compensate for pixel mismatch. An optional TFS coding scheme is also available. Extensive experimental results from a test prototype of 32 x 32 pixels, fabricated in a 0.35 μm CMOS technology, are provided.

An interesting advantage of this contrast retina is its fast time response as well as low communication throughput, compared to commercial video cameras rendering full frames every 30-40 ms. Information throughput is reduced because only relevant contrast information is provided. Regarding speed response, for example when operating in rate coded mode, since active pixels fire at frequencies in the range of 1-5 kHz, they would all update their state within fractions of a millisecond, independent of ambient light. In TFS mode, the first front of relevant events (M = 250 in Fig. 19) is available in less than 1 ms. If the stimulus changes, the retina latency depends on lighting conditions, ranging from about $100\,\mu\text{s}$ in sunlight (50 k-lux) to 10 ms in moonlight (2 lux), with 1 ms for indoor ambient light (1 k-lux). Consequently, the complexity of developing spike based AER spatial contrast retinae, as opposed to conventional frame-scanned video cameras, is justified by their faster response over a very wide range of illumination conditions, while keeping the information throughput low and ambient light independent. Although information throughput is low, relevant (contrast) information is preserved, which results in significant processing performance improvement for subsequent stages.

## VIII. ACKNOWLEDGEMENTS

This work was supported by EU grant 216777 (NABAB), Spanish grant TEC2006-11730-C03-01 (SAMANTA2) and Andalusian grant P06TIC01417 (Brain System). JALB was supported by a JAE scholarship. The authors thank Tobi Delbrück for fruitful discussions and insights.

## IX. REFERENCES

- [1] M. Sivilotti, Wiring Considerations in Analog VLSI Systems with Application to Field-Programmable Networks, Ph.D. Thesis, California Institute of Technology, Pasadena CA, 1991.
- [2] M. Mahowald, VLSI Analogs of Neural Visual Processing: A Synthesis of Form and Function, Ph.D. Thesis, California Institute of Technology, Pasadena CA, 1992.
- [3] E. Culurciello, R. Etienne-Cummings, and K. A. Boahen, "A biomorphic digital image sensor," *IEEE J. Solid-State Circuits*, vol. 38, pp. 281-294, 2003.
- [4] P. F. Ruedi, et al., "A 128x128 pixel 120-dB dynamic-range vision-sensor chip for image contrast and orientation extraction," *IEEE J. Solid-State Circuits*, vol. 38, pp. 2325-2333, 2003.
- [5] M. Barbaro, P. Y. Burgi, A. Mortara, P. Nussbaum, and F. Heitger, "A 100 x 100 pixel silicon retina for gradient extraction with steering filter capabilities and temporal output coding," *IEEE J. Solid-State Circuits*, vol. 37, pp. 160-172, 2002.
- [6] S. Chen and A. Bermak, "Arbitrated Time-To-First Spike CMOS Image Sensor with On-Chip Histogram Equalization," *IEEE Trans. VLSI Systems*, vol. 15, No. 3, pp. 346-357, March 2007.
- [7] X. Qi, X. Guo, and J. Harris, "A Time-to-first-spike CMOS imager," in Proc. of the 2004 IEEE International Symposium on Circuits and Systems (ISCAS2004), Vancouver, Canada, 2004, pp. 824-827.
- [8] M. Azadmehr, J. Abrahamsen, and P. Häfliger, "A foveated AER Imager Chip," in *Proc. of the IEEE Int. Symp. on Circ. and Syst.* (ISCAS2005), pp. 2751-2754, Kobe, Japan, 2005.
- [9] R.J. Vogelstein, U. Mallik, E. Culurciello, R. Etienne-Cummings, and G. Cauwenberghs, "Spatial acuity modulation of an address-event imager," Proc. of the 2004 11th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2004), pp. 207-210, Dec. 2004.
- [10] J. Kramer, "An Integrated Optical Transient Sensor," *IEEE Trans. on Circuits and Systems, Part II: Analog and Digital Signal Proc.*, vol. 49, No. 9, pp. 612-628, Sep. 2002.
- [11] P. Lichtsteiner, C. Posch, and T. Delbrück, "A 128×128 120 dB 15μs Latency Asynchronous Temporal Contrast Vision Sensor," *IEEE J. Solid-State Circ.*, vol. 43, No. 2, pp. 566-576, Feb. 2008.
- [12] M. Arias-Estrada, D. Poussart, and M. Tremblay, "Motion Vision Sensor Architecture with Asynchronous Self-Signaling Pixels," Proc. of the 7th Int. Workshop on Computer Architecture for Machine Perception (CAMP97), pp. 75-83, 1997.
- [13] C.M. Higgins and S.A. Shams, "A Biologically Inspired Modular VLSI System for Visual Measurement of Self-Motion," *IEEE Sensors Journal*, vol. 2, No. 6, pp. 508-528, Dec. 2002.
- [14] E. Özalevli and C.M. Higgins, "Reconfigurable Biologically Inspired Visual Motion System Using Modular Neuromorphic VLSI Chips," *IEEE Trans. Circ. Syst. I*, vol. 52, No. 1, pp. 79-92, Jan. 2005.
- [15] G. Indiveri, A.M. Whatley, and J. Kramer, "A Reconfigurable Neuromorphic VLSI Multi-Chip System Applied to Visual Motion Computation," Proc. Int. Conf. Microelectronics for Neural, Fuzzy and Bio-Inspired Systems (Microneuro99), pp. 37-44, Granada, Spain, 1999.
- [16] K. Boahen, "Retinomorphic Chips that see Quadruple Images," Proc. Int. Conf. Microelectronics for Neural, Fuzzy and Bio-Inspired Systems (Microneuro99), pp. 12-20, Granada, Spain, 1999.
- [17] K. A. Zaghloul and K. Boahen, "Optic nerve signals in a neuromorphic chip: Part 1," *IEEE Trans. Biomed Eng.*, vol. 51, pp. 657-666, 2004.
- [18] K. A. Zaghloul and K. Boahen, "Optic nerve signals in a neuromorphic chip: Part 2," *IEEE Trans. Biomed Eng.*, vol. 51, pp. 667-675, 2004.
- [19] S. Thorpe, D. Fize, C. Marlot, "Speed of processing in the human visual system," *Nature* 381: 520-2, 1996.
- [20] S. Thorpe, et al., "SpikeNet: Real-time visual processing with one spike per neuron," *Neurocomputing* 58-60: 857-64, 2004.
- [21] K. Boahen and A. Andreou, "A contrast-sensitive retina with reciprocal synapses," in J. E. Moody (Ed.), Advances in neural information processing, vol. 4, pp. 764-772, San Mateo CA, 1992. Morgan Kaufmann.
- [22] J. Costas-Santos, T. Serrano-Gotarredona, R. Serrano-Gotarredona and B. Linares-Barranco, "A Spatial Contrast Retina with On-chip Calibration for Neuromorphic Spike-Based AER Vision Systems," *IEEE Trans. Circuits and Systems, Part-I: Regular Papers*, vol. 54, No. 7, pp. 1444-1458, July 2007.
- [23] T. Serrano-Gotarredona and B. Linares-Barranco, "CMOS Mismatch Model valid from Weak to Strong Inversion," Proc. of the 2003 European Solid State Circuits Conference (ESSCIRC'03), pp. 627-630, September 2003.
- [24] B. Linares-Barranco and T. Serrano-Gotarredona, "On the Design and Characterization of Femtoampere Current-Mode Circuits," *IEEE Journal of Solid-State Circuits*, vol. 38, No. 8, pp. 1353-1363, August 2003.
- [25] J. A. Leñero-Bardallo, T. Serrano-Gotarredona, and B. Linares-Barranco, "A mismatch calibrated bipolar spatial contrast AER retina with adjustable contrast threshold," *Proc. of the IEEE Int. Symp. Circ. and Syst.*, pp. 1493-1496 (ISCAS 2009), May 2009.
- [26] A. G. Andreou and K. Boahen, "Translinear Circuits in Subthreshold CMOS," *Analog Integrated Circuits and Signal Processing*, Kluwer, no. 9, pp. 141-166, Apr. 1996.
- [27] T. Poggio, V. Torre, and C. Koch, "Computational vision and regularization theory," *Nature*, 317, pp. 314-319, 1985.
- [28] F. Gómez-Rodríguez, R. Paz-Vicente, A. Linares-Barranco, M. Rivas, L. Miro, S. Vicente, G. Jiménez, and A. Civit, "AER Tools for Communications and Debugging," Proc. of the IEEE Int. Symp. Circ. and Syst., pp. 3253-3256 (ISCAS 2006), Kos (Greece), May 2006.
- [29] J. A. Leñero-Bardallo, T. Serrano-Gotarredona, and B. Linares-Barranco, "A Calibration Technique for Very Low Current and Compact Tunable Neuromorphic Cells. Application to 5-bit 20nA DACs," *IEEE Trans. Circuits and Systems, Part-II: Brief Papers*, vol. 55, No. 6, pp. 522-526, June 2008.
- [30] K. Boahen, "Point-to-Point Connectivity Between Neuromorphic Chips Using Address Events," *IEEE Trans. on Circuits and Systems Part-II*, vol. 47, No. 5, pp. 416-434, May 2000.
- [31] R. Serrano-Gotarredona, T. Serrano-Gotarredona, A. Acosta-Jiménez, and B. Linares-Barranco, "A Neuromorphic Cortical Layer Microchip for Spike Based Event Processing Vision Systems," *IEEE Trans. on Circuits and Systems, Part-I*, vol. 53, No. 12, pp. 2548-2566, Dec. 2006.