Smart low-power CMOS cameras for 3G mobile communicators


Invited paper, 1st IEEE Int’l. Conf. on Circuits and Systems for Communications (ICCSC 2002), June 26-28, 2002, St-Petersburg, Russia, pp. 216-225.

Smart Low-Power CMOS Cameras for 3G Mobile Communicators

M. Ansorge¹, S. Tanner¹, X. Shi¹, J. Bracamonte¹, J.-L. Nagel¹, P. Stadelmann¹, F. Pellandini¹, P. Seitz², N. Blanc²

¹ Institute of Microtechnology, University of Neuchâtel, Rue A.-L. Breguet 2, CH – 2000 Neuchâtel, Switzerland, Email: [email protected]

² CSEM S.A., Badenerstrasse 569, CH – 8048 Zürich, Switzerland

Abstract – The paper presents the concept of a smart multifunctional low-power camera for 3G mobile communicators and Personal Digital Assistants. The device is composed of an ultra low-power image sensor featuring a large intra-scene dynamic range and a high sensitivity, so as to ensure proper image acquisition under the possibly severe illumination conditions encountered in mobile use. Further embedded functionalities encompass low-power still image compression, and low-power face authentication for secured access control.

Index terms – All-digital CMOS image sensors, Active Pixel Sensor (APS) cameras, still image compression, biometrics, face authentication, very low-power VLSI implementation.

I. Introduction

The recently introduced 2.5G cellular wireless networks (GPRS, HSCSD), and the emerging 2.5G (EDGE) and 3G (UMTS / IMT-2000) networks, cf. Table 1, are enabling numerous new services and applications thanks to the enlarged transmission bandwidth they offer. This is in particular true regarding image and video for mobile multimedia. In this respect, it is worth mentioning that mobile still image communication is raising remarkable interest among professionals [1], especially in connection with the recently initiated Mobile Multimedia Messaging Service (MMS) standard [2]. Considering on the other hand the progress achieved in the fields of low-power and low-cost CMOS image sensing, of low-power image processing, and of advanced VLSI technologies for Systems-on-a-Chip, one observes that the technological convergence is at hand for designing new multifunctional mobile communicators satisfying a large palette of applications.

This paper aims at contributing to the field by presenting the concept of a smart multifunctional low-power CMOS camera for 3G mobile communicators and Personal Digital Assistants (PDAs), embedding an ultra low-power image sensor, still image compression, and face authentication for secured access to sensitive data and teleservices. The described image sensor features a high performance in terms of achievable intra-scene dynamic range and sensitivity, ensuring a high image quality over a large range of illumination conditions. This property is particularly useful for mobile image acquisition, because the illumination conditions can change very severely, most often without any control possibility for the user, especially outdoors (e.g. back-illuminated scenes, artificial light environments, public transportation means). Additionally, the discussed image sensor operates at low voltage, i.e. 1.2 Volt, to drastically reduce the power consumption.

Table 1. Cellular wireless network generations, with indication of the commercial introduction year [3–6].

Nature    Generation   Type                                                 Year
Analog    1G           –                                                    ca ‘80
Digital   2G           GSM: Global System for Mobile communications        1991–1992
Digital   2.5G         GPRS: General Packet Radio System                    2001
                       HSCSD: High Speed Circuit Switched Data
                       EDGE: Enhanced Data rates for GSM Evolution
Digital   3G           UMTS: Universal Mobile Telecommunication Services    2001–2002
                       (IMT-2000: Int’l. Mobile Telecommunications-2000)
Digital   4G           –                                                    ‘07–‘10

The paper is organized as follows. Section 2 presents the design of an ultra low-power, low-voltage, all-digital CMOS image sensor for still image and video. Section 3 then provides an overview of optoelectronic characterization methods for image sensors. Next, the principle of low-cost color image acquisition is recalled in Section 4. Section 5 is devoted to low-power still image compression based on a multiplierless implementation of the JPEG standard, whereas Section 6 presents the implementation of a low-power face authentication algorithm using two dedicated VLSI coprocessors. Finally, conclusions are drawn in Section 7.

II. Very Low-Power Image Acquisition

Compared to Charge-Coupled Device (CCD) sensors, CMOS image sensors can be designed using standard CMOS processes, rendering possible the monolithic integration of photosensitive electronics, sensor control, and image processing components [7]. CMOS image sensors present various advantages, including: i) low-voltage, low-power operation, with power reductions reaching factors beyond 100; ii) embedded analog, digital, and mixed analog-digital signal processing, e.g. to incorporate Analog-to-Digital Converters (ADC), but also enabling smart camera functionalities; iii) integration of in-pixel functions, such as offset cancellation, and pixel nonlinearities for high dynamic ranges in excess of 150 dB; iv) random access to Regions-of-Interest (ROI); v) miniaturization of complete all-digital camera systems; and vi) potential for low cost. An overview of the field, and of the related terminology and concepts, can be found in [8].

The purpose of this section is to provide first a review of the techniques available today for improving the intra-scene dynamic range and the sensitivity of image sensors, and for letting them operate at low voltage to cut the power consumption. Second, a strategy for the design of low-power, high-performance image sensors is furnished. This design strategy has been successfully applied to the realization of a test chip, which is presented at the end of the section, along with its principal parameters.

A. High Dynamic Range

Intra-scene dynamic range is defined as the illumination ratio between the brightest and the darkest observable zones of a scene. Scenes captured under practical illumination conditions, in particular as encountered in mobile applications, frequently feature zones that are orders of magnitude brighter than others, the best-known situation corresponding to the acquisition of back-illuminated subjects, which appear dark compared to the rest of the scene. This problem is not caused by insufficient light, but by the insufficient dynamic range of the acquisition device.

Usually, the transfer function relating the output signal of an image sensor to the incoming light is linear. This is the case for all CCD sensors, and for the majority of CMOS-based image sensors. For linear sensors, the dynamic range is equivalent to the signal-to-noise ratio (SNR). Its value is about 55 to 60 dB for standard sensors, and it can reach up to 80 dB for high-end CCD sensors. These values are however insufficient for use under adverse illumination conditions.

Several techniques for improving the dynamic range have been implemented and reported in the literature during the last years. They are discussed below.

1) Logarithmic Operation of the Photodiode: The first technique is based on a nonlinear operation of the pixel photodiode. Unlike CCD sensors, where the photodiode can only be operated in charge integration mode, CMOS sensors allow the photodiode to be operated in current mode, where the photodiode is used as a light-dependent current source. When this current flows through a MOS transistor operating in weak inversion, the relation between drain voltage and drain current becomes logarithmic, resulting in a true logarithmic transfer function between the pixel output voltage and the input light intensity. With such an operation mode, sensors featuring dynamic ranges up to 150 dB have been reported. However, this solution suffers from three fundamental limitations. Firstly, the output swing of such a pixel, limited to about 26 mV per decade of light, is low. This has a negative effect on the achievable SNR, rendering the capture of high quality images difficult. Secondly, its speed at low light levels is insufficient, due to the very small current of the MOS operated in weak inversion. Thirdly, it is difficult to correct the fixed pattern noise (FPN) of the pixel, which has also become logarithmic.

2) Modified Pixel with Logarithmic Operation: The former limitations can be partly or totally suppressed by applying special pixel design techniques. A pixel made of four transistors was demonstrated in [9], incorporating on-pixel gain and a local feedback loop for improved frequency bandwidth. This results in a pixel with a dynamic range of ca 140 dB, the sensor featuring a high frame rate even when operating at low light levels. However, the FPN of such a pixel is high, and the fill factor is reduced.

Another approach, described in [10], consists in using the pixel in both linear and logarithmic modes. The pixel works in linear mode for low light intensities, whereas it enters the logarithmic mode above a certain light level. This approach offers the advantages of a high pixel fill factor, a high frequency bandwidth, a good output swing, and a programmable switching point between the linear and logarithmic responses. Moreover, the FPN of the pixels can be fully corrected. The limitation of this solution lies in that the logarithmic response is valid for high light intensities only.

3) Well Capacity Adjustment: In a further technique, discussed in [11], the dynamic range is enhanced by increasing the well capacity of the photodiode once or several times during integration, using a lateral overflow gate. This results in a compression (by saturation) of the sensor's illumination-to-charge transfer function. This solution has the disadvantage of creating discontinuities in the pixel transfer function.

4) Multiple-Sampling Methods: Finally, the principle of acquiring multiple images of the same scene, each of them exploiting a segment of the scene dynamic, represents an interesting approach. This is realized by assigning different integration times to the constitutive images. An external data fusion algorithm then merges the images into a single one embedding the whole dynamic of the scene [12]. Several realizations of this technique have been demonstrated. The first one, described in [13], consists in a sensor equipped with dual readout circuitry, allowing for the simultaneous integration and readout of two images. The dynamic range obtained after off-chip data fusion reaches 100 dB. This technique requires however pixels with double readout circuitry, reducing the fill factor. Another implementation, published in [14], relies on a pixel-level floating-point ADC, where multiple analog-to-digital (AD) conversions of the pixel value are performed at time intervals increasing exponentially, which is equivalent to multiple exposures. This technique offers a high SNR, high sensitivity, and high flexibility, and it does not need any external frame storage. The main drawback is a complex pixel with reduced fill factor.

Comparing the methods, multiple-sampling techniques feature a better image quality (or SNR), because the photodiode is driven in the linear mode, where the FPN can be easily corrected, and because the output swing is maximal. But these methods suffer from important limitations, such as the loss of spatial correlation between the multiple frames, and the need for an increased readout bandwidth, leading in turn to a higher power consumption. Also, the selection of a set of exposure times adapted to the scene dynamic – often requiring a priori knowledge of the scene dynamic, as well as complex adaptive algorithms for the exposure time calculation [15] – represents another difficulty.

[Figure 1: simulated ADC output SNR [dB] versus number of photo-generated electrons (1e3 to 1e5), for pre-ADC amplification gains k = 1, 2, 4, 8, 16; photon shot noise limit: 10 dB/decade; ADC quantization noise limit: 20 dB/decade.]

Figure 1. Overall SNR of a digital image sensor with and without signal amplification before ADC.

B. High Sensitivity

High sensitivity is usually obtained by combining three different techniques. Firstly, the pixel photo-detection efficiency is optimized by increasing the fill factor, by choosing a technology with good quantum efficiency, and optionally by laying down micro-lenses on each pixel. Secondly, the electronic noise sources appearing in the readout path, i.e. from pixel to ADC, are reduced to improve the SNR. And thirdly, an amplification of the pixel signal is performed prior to AD conversion, so as to improve the SNR, the corresponding effect being shown in Fig. 1. This figure represents the simulated overall SNR of a digital image sensor observed after AD conversion, expressed as a function of the illumination. For high light intensities, the SNR is limited by the photon shot noise of the incoming light (loss of 10 dB/decade of light). For low light intensities, the SNR is restricted by the quantization noise of the ADC (loss of 20 dB/decade). If the signal is amplified before the ADC, the obtained SNR is no longer limited by the ADC quantization noise, but by the shot noise, resulting in a gain in the overall SNR. Since the amplification also acts on the noise, it is important to minimize the noise sources in the pixel signal path. Beyond a certain gain (16 in our example), the amplification becomes inefficient due to electronic noise.
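To make the trend of Fig. 1 concrete, the following minimal numerical sketch models the overall SNR after amplification and AD conversion; the noise figures (ADC quantization noise of 40 e- rms referred to unity gain, electronic read noise of 5 e- rms) are illustrative assumptions, not measured values of the presented sensor:

    import math

    def snr_db(n_e, gain, adc_noise_e=40.0, read_noise_e=5.0):
        """Overall SNR after amplification by `gain` and AD conversion, for a
        signal of n_e photo-generated electrons (illustrative noise figures)."""
        shot = math.sqrt(n_e)            # photon shot noise (rms electrons)
        quant = adc_noise_e / gain       # ADC quantization noise referred to the pixel
        total = math.sqrt(shot**2 + quant**2 + read_noise_e**2)
        return 20.0 * math.log10(n_e / total)

    for k in (1, 2, 4, 8, 16):
        print(f"gain {k:2d}: {snr_db(1e3, k):.1f} dB")

Raising the gain suppresses the quantization-noise term until the shot-noise and read-noise floors dominate, which reproduces the saturation of the SNR improvement beyond a certain gain, as visible in Fig. 1.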

C. Low Power and Low Voltage Imagers

Recent publications have demonstrated the feasibility of CMOS image sensors in deep sub-micron technologies [16]. These technologies require the use of very low supply voltages (< 1.5 V), which renders the design of high-performance image sensors very challenging and calls for new design techniques. Very low-voltage sensors have already been demonstrated, cf. [17], which presents a sensor operating at 1.2 V.

D. Design Strategy, and Realization of an Ultra Low-Power High Performance Imager

Based on the concepts described in Subsections A to C above, the following strategy is proposed for the design of ultra low-power, high performance image sensors:
• Use of a 0.25 µm CMOS technology (0.18 µm is avoided because of poor quantum efficiency).
• Maximal intra-scene dynamic range relying on a so-called dual exposure scheme specified below.
• Maximum sensitivity by applying the techniques explained in Subsection B.
• Systematic use of a single voltage supply of 1.2 V for the analog components (image sensor and ADCs).

The remainder of this subsection is devoted to describing the dual exposure scheme, before discussing an image sensor architecture relying on the design strategy, and presenting the realization of a test circuit.

1) Dual Exposure Scheme: Referring to the multiple-sampling techniques, we propose a double integration scheme for high dynamic range image acquisition. Indeed, using the minimum of two images for multiple sampling has the advantage of minimizing the frame buffering, the data fusion complexity, and the loss of spatial correlation between the images. Compared to [13], this solution requires a standard 3-transistor pixel instead of a dual readout circuitry for the sensor, hence maximizing the fill factor (and thus the sensitivity) of the sensor. The corresponding sensor timing operation is depicted in Fig. 2. A dual exposure cycle begins with the reset of the first image (Reset Scan 1), followed by the long exposure time (Exposure Time 1). The first image is read out (Read Scan 1), and is immediately followed by the reset of the second image (Reset Scan 2). The second integration takes place, now with the short exposure time, typically an order of magnitude shorter than the first one. At the end of the second exposure, the readout of the second image takes place (Read Scan 2). The sensor is then ready for starting a new dual exposure cycle.
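For illustration, the off-sensor data fusion of the two exposures can be sketched as follows; the linear-response assumption, the saturation threshold, and the exposure ratio are illustrative choices, not the fusion algorithm actually implemented:

    import numpy as np

    def fuse_dual_exposure(img_long, img_short, ratio=10.0, sat_level=0.9 * 1023):
        """Merge a long and a short exposure of the same scene into one
        high-dynamic-range image (illustrative sketch, assuming a linear
        pixel response). ratio = t_long / t_short; sat_level marks pixels
        of the long exposure that are close to saturation (10-bit ADC)."""
        long_f = img_long.astype(np.float64)
        short_f = img_short.astype(np.float64)
        # Keep the low-noise long exposure where it is valid; elsewhere fall
        # back to the short exposure, rescaled to the long-exposure scale.
        return np.where(long_f < sat_level, long_f, short_f * ratio)

With an exposure ratio of one decade, such a fusion extends the representable intra-scene range by roughly 20 dB, consistent with the ca 15 to 20 dB improvement expected for the proposed scheme.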

[Figure 2: timing diagram showing, versus time and row index (1 to n), the interleaved scan waves of a dual readout period: Reset Scan 1 (image i), Exposure Time 1, Read Scan 1 (image i) / Reset Scan 2 (image i), Exposure Time 2, Read Scan 2 (image i) / Reset Scan 1 (image i+1).]

Figure 2. Timing operation of the proposed dual exposure scheme.

For both reset and read operations, the two frames are addressed in a time-interleaved manner, so that no special hardware is required for row addressing or column readout. Another advantage of this approach is the relatively small image buffering needed. Only a portion of the first image (corresponding to the duration of the short exposure time) has to be temporarily stored for the data fusion of the two images. Consequently, the buffer as well as the data fusion hardware can easily be implemented on-chip. The proposed scheme is expected to provide an improvement of the dynamic range amounting to ca 15 to 20 dB, at the expense of the power consumption, which increases by a factor of two.

2) Sensor Architecture: The proposed sensor structure is depicted in Fig. 3. The readout circuitry is made of j analog chains operating in parallel, each of them processing w pixel columns (in our case, w = 16). The analog chains are made of double sampling stages, programmable gain amplifiers, and 10-bit successive approximation register (SAR) ADCs. This architecture represents a good trade-off between the fully parallel case [18], where the small pitch renders the layout design difficult, and the fully sequential case, which consumes more power and is difficult to implement because of speed limitations at such low voltages.

All analog components, ranging from the pixels to the ADCs, are supplied with 1.2 Volt, except for some non current-consuming circuits (i.e. switches, reset circuit) that are supplied with 2.5 Volt. The low-voltage pixel structure [19] has an output swing of 0.8 Volt, and features a soft reset operation [18]. The double sampling stage and the programmable gain amplifier – the gain ranging from 1 to 8 – are switched-capacitor based circuits. The ADC is a weighted-capacitor, charge-redistribution successive approximation ADC, with an overall capacitance of 8 pF. The ADC incorporates a digital offset compensation circuit correcting the offset of the whole analog chain, and thus also the inter-channel offset errors.

[Figure 3: block diagram of the sensor: pixel array organized in blocks of w x n low-voltage active pixels with row addressing and reset circuit, feeding through a column multiplexer into j analog chains operating in parallel, each composed of a double sampling stage, a programmable gain amplifier, and a 10-bit SAR ADC.]

Figure 3. Proposed sensor architecture.

To decrease the power consumption, the current sources (especially for the columns) are switched off when unused. All components are designed so as to achieve an overall SNR of 55 dB, even with the maximum amplification gain factor of 8. With such a gain, the expected conversion gain is 88 µV/e, corresponding to ca 0.2 lux of ambient light. Inter-channel gain error is expected to be smaller than 1 LSB. The intra-scene dynamic range expected after image data fusion using the dual exposure amounts to ca 80 dB.

[Figure 4: chip photomicrograph with floor-plan indication: pixel array (356 x 292), programmable gain amplifiers, 22 SAR ADCs, control logic.]

Figure 4. Photomicrography and floor-plan of the test chip.

3) Realization Example: CIF Test-Chip: The proposed design strategy was applied to realize a digital image sensor of CIF resolution (352 x 288 pixels), which implies j = 22 in Fig. 3, and it was fabricated in a 0.25 µm standard CMOS technology. It includes a digital controller, on-chip ADCs, and bias voltage references. The supply voltages are 1.2 Volt for the analog components of the circuit (sensor and ADC), 2.5 Volt for the digital part, and 3.3 Volt for the I/O pads, respectively. The chip features a remarkable overall power consumption of only 2 mW, excluding I/O pads, at a frame rate of 50 fps, counted before image data fusion. A photomicrography is shown in Fig. 4 with indication of the floor-plan, the specified sensor parameters and expected performance being listed in Table 2. The characterization of the chip prototypes had just started when this paper was written.

Table 2. Specified sensor parameters and expected performance.

Features                                         Value
Technology                                       CMOS 0.25 µm, 1 Poly + 5 Metal layers
Dimensions                                       3.2 x 2.9 mm ≈ 9 mm²
Nb of pixels                                     H 356 x V 292
Pixel pitch                                      6 µm x 6 µm
Fill factor                                      45 %
Nominal ADC resolution                           10 bit
Maximum frame rate                               50 fps @ fck = 6.6 MHz
Voltage supply range                             1.2 – 1.8 Volt

Performance @ Vcc = 1.2 V, 30°C                  Value
Max. conversion gain incl. column amplification  88 µV/e
Overall SNR                                      > 55 dB
Dynamic range after data fusion                  > 80 dB
Overall power consumption                        2 mW (without I/O pads)

III. Optoelectronic Characterization of Image Sensors

An important issue concerns the quality assessment of image sensors. For an absolute comparison of the optoelectronic performance between different solid-state image sensors, the following four key measurements are usually employed, see also [20]:

(1) Modulation transfer function (MTF), describing the spatial resolution of the image sensor as a function of a sinusoidal radiation stimulus with varying spatial frequency. The most direct way of producing the desired sine pattern on the image sensor's surface is to generate it with an interferometric optical setup, using a laser as a coherent monochromatic light source, Fig. 5. This avoids any additional effects of imaging with a – usually not precisely known – optical imaging lens. As an alternative, the optical cross-talk function can be determined, i.e. the sensor's output as a function of the lateral position of a "point source" of light. The cross-talk function is related to the MTF via a Fourier transform. Since it is practically impossible to create an ideal point source of light, the direct determination of the MTF is much more convenient and accurate in practice.

(2) Photodetection uniformity, requiring the measurement of an offset (dark response) and a gain value (linear light-to-voltage conversion value) for each pixel. The resulting map of offset and pixel response non-uniformity (PRNU) indicates how uniform the image sensor's output is. For this measurement, a uniform radiation field is required, which is usually created with a so-called integrating sphere, whose radiation variations over the field of view can be kept below 0.5 %.

(3) Response linearity, describing the relationship between the electrical output and the radiation power incident on the image sensor (illuminance). Important parameters that can be read directly from this relationship include the dark noise, the photodetection limit, the saturation illumination, the saturation photocharge, and the sensitivity of the photoconversion process, usually described in units of µV per photoelectron. The ratio between the saturation signal and the dark noise yields the image sensor's dynamic range (D/R).

(4) Absolute quantum efficiency (QE), describing the wavelength-dependent effectiveness of an image sensor in converting photons into charge pairs. The QE is usually expressed as a percentage, indicating which fraction of the incident photons is converted into detectable photocharge. An alternative description of the photoconversion efficiency is the responsivity, relating the produced photocurrent to the incident radiation power. The responsivity is measured in units of A/W, and it can be calculated directly from the QE [21].

As an alternative to the use of absolute, radiometric units for the quantification of the incident light, one can also employ photometric quantities such as lux and lumen, which are related to human visual perception. The disadvantage of these units is that they depend very strongly on the spectral distribution of the employed light source, which must consequently be precisely defined. These units also cannot take into account the image sensor's detection properties in the near-infrared spectral range. For this reason, the optoelectronic characterization of solid-state image sensors in terms of the described absolute, radiometric units is of primary importance for an unbiased performance comparison between different image sensors and the camera modules constructed with them.
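For reference, the QE-to-responsivity conversion mentioned in (4) follows the standard radiometric relation (spelled out here for convenience):

$R(\lambda) = \mathrm{QE}(\lambda)\,\dfrac{q\,\lambda}{h\,c} \approx \mathrm{QE}(\lambda)\cdot\dfrac{\lambda\,[\mathrm{nm}]}{1240}\ \ [\mathrm{A/W}]$

so that, e.g., a sensor with QE = 50 % at 620 nm exhibits a responsivity of about 0.25 A/W.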

IV. Low-Cost Color Image Acquisition

It is worth shortly recalling low-cost techniques for color image acquisition. Color images are known to be achieved by the superposition of three chromatically complementary constitutive images, usually composed of the red (R), green (G), and blue (B) colors. High quality images for professional use require the capture of full color images, where every pixel is sensed with its RGB components, but the related hardware and calibration costs are important.

Figure 5. Interferometric Modulation Transfer Function (MTF) measurement of an image sensor: a) Optical MTF characterization setup based on a Michelson interferometer; the image sensor readout module under test is mounted on the far right. b) Example of actual MTF measurement results obtained at a wavelength of 633 nm with a CMOS image sensor of 10.5 µm pixel pitch. Since the pixel geometry is not identical in the horizontal and vertical directions, the corresponding MTF curves differ slightly. Comparison with the theoretical MTF prediction, based only on geometrical considerations, reveals the blurring effect of charge carrier generation and non-directed transport through diffusion in the semiconductor's field-free bulk.

Low-cost solutions for the consumer market are achieved by laying a color filter array (CFA) down onto the image sensor, allowing for a certain distribution of the RGB pixels over the sensor, e.g. the Bayer CFA pattern [22]. The constitutive RGB images are then reconstructed using different spatial (non-)linear interpolation algorithms [23, 22, 24], this operation being called demosaicing, resulting in a satisfying quality for many consumer applications.
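As a minimal illustration of demosaicing, the sketch below rebuilds the three color planes of an RGGB Bayer image by plain bilinear interpolation; it is a toy version for illustration only, far simpler than the algorithms of [23, 22, 24]:

    import numpy as np
    from scipy.signal import convolve2d

    def demosaic_bilinear(raw):
        """Bilinear demosaicing of an RGGB Bayer image (toy sketch).
        Each color plane is rebuilt by normalized convolution over the
        sites where that color was actually sampled."""
        h, w = raw.shape
        masks = np.zeros((h, w, 3), dtype=bool)
        masks[0::2, 0::2, 0] = True   # R sites
        masks[0::2, 1::2, 1] = True   # G sites on even rows
        masks[1::2, 0::2, 1] = True   # G sites on odd rows
        masks[1::2, 1::2, 2] = True   # B sites
        kernel = np.array([[1., 2., 1.],
                           [2., 4., 2.],
                           [1., 2., 1.]])
        rgb = np.empty((h, w, 3))
        for c in range(3):
            plane = np.where(masks[..., c], raw.astype(float), 0.0)
            weight = masks[..., c].astype(float)
            num = convolve2d(plane, kernel, mode='same', boundary='symm')
            den = convolve2d(weight, kernel, mode='same', boundary='symm')
            rgb[..., c] = num / den   # den > 0 for every 3x3 neighborhood
        return rgb

The normalized convolution preserves the pixels where a color was actually sampled and averages the nearest sampled neighbors elsewhere, which is exactly the classical bilinear scheme.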

V. Low-Power Still Image Compression

The still image coding algorithms considered for low-power implementation in mobile communication devices are JPEG [25, 26], and JPEG2000 [27, 26] in the longer term. JPEG is a variable coding bit-rate algorithm, producing compression ratios (CRs) in correspondence with the image activity. It fits well to low-complexity [28] and low-power implementation, and features a good image quality at low and medium CRs. JPEG2000 in turn produces a superior performance at low bit-rates, and offers post-compression rate allocation, as well as several remarkable features such as region-of-interest coding and scalability [27]. However, it is 3-5 times more complex [29] and power consuming than JPEG. At lower CRs, image quality is, from a subjective point of view, almost the same for JPEG and JPEG2000.

For mobile still image communication applications, the capability of controlling the compression ratio is very important, either to manage the memory space allocated to image storage, or to maintain the transmission time within limits. Recently, a specific bit-rate control method was elaborated for the JPEG algorithm [30] to fulfil these requirements. Details on the functioning and performance of this method, which is fully JPEG standard compliant, are available in [31]. Notice that this method is also useful to match the bit-rate to the channel bandwidth in the case of Motion-JPEG based video transmission.

A. Multiplierless JPEG Implementation

An efficient implementation scheme for a JPEG encoder is reported in [28], in which it is demonstrated that the baseline JPEG algorithm can be executed without involving any multiplication. All the arithmetic operations are reduced to simple additions/subtractions and very short shifts. This translates into a JPEG hardware implementation of reduced complexity, which makes the approach attractive for digital image applications in portable devices, where silicon area and power consumption are dominant design issues. Furthermore, the presented multiplierless implementation produces negligible losses in terms of compression efficiency, and in terms of objective/subjective quality of the reconstructed images, with respect to a 16-bit JPEG standard integer-multiplication implementation (J-SIMI).

1) JPEG Encoder Algorithm: Fig. 6 depicts the block diagram of the baseline JPEG encoder, whose main operations are recalled below [25, 32]. The first operation consists in converting the input RGB image into the luminance/chrominance YCbCr color space representation. This transformation favors compression efficiency by eliminating a significant part of the interband RGB correlation.

The luminance and chrominance bands are then subdivided into blocks of 8 x 8 pixels. A two-dimensional 8-point Discrete Cosine Transform (DCT) is applied on each of these blocks to decorrelate the original data. The 2-D DCT coefficients obtained after the linear transform are quantized with a 64-element normalization matrix. The visually important 2-D DCT coefficients located in the top-left low-frequency region of the array are quantized with short quantization steps, while the rest is coarsely quantized. Finally, the data undergo a lossless compression in the entropy coder, resulting in the outgoing compressed image data [25, 32], cf. Fig. 6.

[Figure 6: Original Image → RGB to YCbCr → DCT → Quantizer → Entropy Coder → Compressed data]

Figure 6. Baseline JPEG encoder [25, 32].
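For concreteness, the two multiplication-intensive middle stages of Fig. 6 can be sketched per 8 x 8 block as follows; this is a straightforward floating-point reference, before any multiplierless optimization, and the quantization matrix is assumed given (e.g. the standard luminance table of [32]):

    import numpy as np

    def dct_matrix(n=8):
        """Orthonormal DCT-II basis matrix."""
        k = np.arange(n)
        C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
        C[0, :] /= np.sqrt(2.0)
        return C

    def jpeg_block_forward(block, qmatrix):
        """Level shift, separable 2-D DCT, and uniform quantization of one 8x8 block."""
        C = dct_matrix(8)
        coeffs = C @ (block.astype(float) - 128.0) @ C.T
        return np.rint(coeffs / qmatrix).astype(int)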

2) Multiplierless Approach: The multiplierless approach starts by identifying all the multiplications, y = a·x, required by the JPEG algorithm, where a represents a fixed coefficient, and x a signal sample value. The set of a coefficients is then approximated and replaced by a corresponding set of â values. Three constraints were defined to produce the set of â values. The first one was that each â should approximate its corresponding a coefficient with a maximum absolute relative error, |(a – â)/a|, of 1 %. This accuracy requirement conditions the results of the multiplierless approach to be very close to those produced by a J-SIMI implementation, in terms of compression ratio as well as in terms of reconstruction quality. The second constraint consisted in producing a binary representation of each â containing the least number of 1's. This implies the execution of y = â·x with a minimum number of add/shift iterations or stages, which favors speed and power consumption. Thirdly, in order to minimize the silicon area of the add/shifter unit, the maximum single shift required to represent each â should be kept to a minimum value; in this implementation, it was constrained not to exceed 6.

3) JPEG Implementation Schemes and Results: The JPEG algorithm executes four different tasks, the first three (color conversion, DCT, and quantization) requiring intensive multiplications. The fourth task consists of an entropy coder, which does not involve any multiplication, but can still be optimized for low power consumption [33]. The results regarding the coefficient approximation for the different stages are given in Table 3. For both the (RGB-YCbCr) color space conversion and the DCT unit, the largest number of add/shift operations and the maximum required individual shift are 4. Using these values, the maximum relative error produced is 0.87 % for the color conversion module, and 0.56 % for the DCT. The maximum number of add/shift operations and the maximum individual shift are 6 for the approximation of the 128 coefficients of the quantization unit, for which the largest relative error produced was 0.98 % [28, 33].

Table 3. Maximum value of: a) Add/shift iterations; b) Individual shift; and c) Relative error.

Unit         Max Nb Add/Shift   Max Indiv. Shift   Max Rel. Error
RGB-YCbCr    4                  4                  0.87 %
DCT          4                  4                  0.56 %
Quantizer    6                  6                  0.98 %
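The add/shift mechanism can be illustrated with a small sketch; the decomposition below (for the luma weight a = 0.299 of the RGB-YCbCr conversion) is a hypothetical example chosen to satisfy the 1 % error bound, not a coefficient taken from the actual tables of [28]:

    def multiplierless_mul(x, terms):
        """Compute y = â·x, where â = sum(sign * 2**(-shift)) over the given
        (sign, shift) terms, using only additions and right shifts."""
        y = 0
        for sign, shift in terms:
            y += sign * (x >> shift)   # x >> shift realizes x * 2**(-shift), truncated
        return y

    # â = 2^-2 + 2^-5 + 2^-6 = 0.296875 approximates a = 0.299 with a relative
    # error of 0.71 % (< 1 %), using 3 add/shift iterations and a max shift of 6.
    LUMA_R_TERMS = [(+1, 2), (+1, 5), (+1, 6)]
    print(multiplierless_mul(1024, LUMA_R_TERMS))   # 304, vs. 0.299 * 1024 = 306.2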

The very low values of the maximum produced relative errors show the excellent approximation achieved by the multiplierless scheme. This is quantitatively and qualitatively reflected in the peak signal-to-noise ratio (PSNR) values and in the visual quality of the reconstructed images with respect to the results obtained with a J-SIMI implementation. Two multiplierless schemes were implemented. The first scheme uses the minimum approximation values that satisfy the constraints given in Subsection A.2), which results in a faster implementation, the corresponding features being listed in Table 3. For this scheme, the average PSNR drops by 0.225 dB, for a practically identical compression ratio per image, with respect to a J-SIMI implementation. Fig. 7c) shows a region of an image representative of such a global drop in the PSNR value.


Figure 7. Reconstructed images, compression ratio = 22 in all cases. a) Original; b) J-SIMI: 35.478 dB; c) Scheme 1: 35.249 dB; d) Scheme 2: 35.467 dB.

In order to simplify the system control unit, a second multiplierless scheme is proposed that uses a fixed value of 6 both for the number of add/shift iterations and for the maximum individual shift. For this scheme, the average PSNR drop over a large and varied test image set is a negligible 0.018 dB, for a virtually identical compression ratio per image, compared to a J-SIMI based system. Fig. 7d) shows the image quality achieved with this scheme.

VI. Low-Power Face Authentication

Biometric verification/authentication is a further functionality presenting a high level of interest for mobile communicators and PDAs, by providing a secured access to sensitive local or remote data, and to a variety of teleservices. Moreover, it makes it possible to automatically select the user settings in case the mobile communicator is shared by a few persons. Face authentication is particularly appealing among biometric verification methods, because it is intrinsically non-intrusive and does not require an extra sensor, assuming that the needed camera is shared with the regular image/video acquisition applications. But face authentication involves intensive computations and data transfers that have to be strongly optimized to fulfil the low-power and limited-resources requirements of portable devices. In this section, a low-power VLSI architecture for face authentication is presented, based on the results reported in [34, 35]. The selected algorithm is discussed first, before describing the VLSI system architecture.

A. Elastic Graph Matching Algorithm

Elastic Graph Matching (EGM) was selected as the base algorithm for face authentication due to its intrinsic compensation for face expression variations and small pose changes [36]. EGM is a method relying on a labeled graph, cf. Fig. 8, composed of local features centered on a set of nodes connected by edges. A correspondence is searched between the features of a reference graph (Fig. 8.a) and the features of a test graph (Fig. 8.b), which minimizes the Euclidean distance between these features. The nodes of the test graph can be individually displaced to further minimize the distance score, while every node displacement in turn penalizes the score, in order to take the topological structure into account.

Feature extraction at each node is performed using mathematical morphology [37], the face being successively dilated and eroded (cf. Fig. 8.c) by progressively larger circular binary structuring elements. The EGM algorithm presented in this paper was tested with a grid of 8 x 8 nodes, the number of features extracted at each node amounting to 19, composed of the local pixel value, 9 dilations, and 9 erosions [34, 35]. This set of features is then reduced to only 3 values through Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) [38]. Even after LDA, the total number of features over all nodes is too high to use e.g. an N-dimensional Bayesian classifier (N = 64 x 3). But assuming simplifications – i.e. normally distributed class likelihood in the feature space, and feature independence, so that the covariance matrix of the features becomes diagonal – the minimum distance classifier is known to be the Euclidean distance. The classification is then considered as a two-class separation problem that can be handled with a Bayesian classifier.
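As an illustration of the matching score being minimized, the following sketch combines the node-wise Euclidean feature distance with a displacement penalty; the penalty form and the weighting factor are illustrative assumptions, not the exact cost function of [34, 36]:

    import numpy as np

    def egm_score(ref_feats, test_feats, ref_grid, test_grid, lam=1.0):
        """EGM-style matching score for an 8x8 node grid (illustrative).
        ref_feats/test_feats: (64, 3) feature vectors after PCA/LDA reduction.
        ref_grid/test_grid:   (8, 8, 2) node coordinates in each image.
        lam weights graph rigidity against feature similarity."""
        feature_cost = np.sum(np.linalg.norm(ref_feats - test_feats, axis=1))
        # Topology penalty: displacements of neighboring nodes should stay coherent.
        disp = test_grid.astype(float) - ref_grid.astype(float)
        topo_cost = (np.sum(np.abs(np.diff(disp, axis=0))) +
                     np.sum(np.abs(np.diff(disp, axis=1))))
        return feature_cost + lam * topo_cost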


Figure 8. Graph matching based face recognition: a) Reference image with graph of 8x8 nodes laid on the face; b) Test image with similar graph; c) Set of images obtained by successive dilations and erosions.
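A compact sketch of the per-node feature extraction described above (1 pixel value + 9 dilations + 9 erosions = 19 features) is given below; the structuring-element radii and the scipy-based morphology are illustrative stand-ins for the dedicated hardware:

    import numpy as np
    from scipy.ndimage import grey_dilation, grey_erosion

    def disk(radius):
        """Circular binary structuring element of the given radius."""
        y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
        return x * x + y * y <= radius * radius

    def node_features(img, nodes, radii=range(1, 10)):
        """Extract 19 morphological features per node: the local pixel value,
        plus 9 dilations and 9 erosions with growing circular elements."""
        planes = [img.astype(float)]
        planes += [grey_dilation(img, footprint=disk(r)).astype(float) for r in radii]
        planes += [grey_erosion(img, footprint=disk(r)).astype(float) for r in radii]
        stack = np.stack(planes)                                  # (19, H, W)
        return np.array([stack[:, y, x] for (y, x) in nodes])     # (n_nodes, 19)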

B. VLSI System Architecture

In the designed face authentication system, morphological feature extraction and graph matching are executed separately by two dedicated coprocessors [34, 35]. The motivations for this are manifold. Firstly, the operations required by these two tasks are very different, morphology using mostly comparisons, whereas feature reduction and graph matching rely on Multiply-Accumulate (MAC) type operations. Secondly, it was demonstrated [35] that computing morphology over the entire image before proceeding to image matching does not involve any redundant operation, but saves computations while improving regularity. Thirdly, by processing the morphology all at once over the complete image, it is possible to reuse loaded pixels for neighboring positions. The factorized implementation is therefore nicely adapted to low-power applications.

The resulting face authentication system (Fig. 9) is composed of the CMOS image sensor, a master processor handling the high-level tasks (classification, database management, interface control), a shared RAM, and two coprocessors dedicated to multiscale morphology [35] and graph matching [34], respectively. The data flow proceeds as follows: 1) the face image captured by the sensor is stored in the RAM; 2) the morphology coprocessor reads the image data, extracts the features, and stores the results back in the RAM; 3) the matching coprocessor performs feature reduction and graph matching, before returning the obtained matching score to the master processor. Interestingly, it is observed that by organizing the data transfer from the morphology coprocessor to the matching coprocessor via the shared RAM, it is possible to decouple and thus simplify the sequencing of both coprocessors. Moreover, by processing an entire image with morphology before starting graph matching, the arbitration of the shared RAM also gets simpler. Indeed, each coprocessor operates as master of the shared bus when accessing the RAM, liberating the bus again at the end of its data transfers. The overall bus arbitration remains under the responsibility of the master processor.

[Figure 9: block diagram: image sensor, RAM, master processor, morphology coprocessor, and matching coprocessor (the latter three each with registers and program memory) connected through a shared bus with bus arbitration.]

Figure 9. Architecture of the face authentication system.

A detailed discussion of both coprocessor architectures, including the implemented functionalities, the structure and signal flow organization, the instruction set description, and the performance evaluation, can be found in [35, 34]. The complexity figures show that a complete face authentication requires less than 10 million cycles, cf. Table 4, excluding image acquisition and face detection. Assuming an operating frequency of 10 MHz, the face authentication is thus performed in less than 1 second. Timing constraints in the architecture can therefore be strongly relaxed, and the power consumption substantially reduced, thanks to the low system clock frequency.

Table 4. Complexity figures for a complete face authentication.

Static program size (# of instructions)
  Morphology coprocessor            217
  Graph matching coprocessor        557
Number of executed cycles
  Morphology coprocessor            2’240’846
  Graph matching coprocessor        6’746’186

Presently ongoing activities on the subject concern the generation of VHDL code describing the coprocessor architectures, and their synthesis in a standard low-power CMOS technology. Moreover, algorithmic improvements and problem-oriented application scenarios will be studied.

VII. Conclusions

This paper presented the concept of a smart multifunctional low-power CMOS camera for 3G mobile communicators and PDAs, embedding an ultra low-power high performance image sensor, still image compression, and face authentication for access control. For the same remarkably low power consumption of 2 mW and a voltage supply of 1.2 V, the image sensor can either be operated at 50 fps in single exposure mode with an intra-scene dynamic range of 55 dB, or at 25 fps in dual exposure mode with a dynamic range of 80 dB. A multiplierless JPEG implementation was then presented, which can be nicely combined with the JPEG standard compliant bit-rate control method described in [30, 31] in case the storage space or the transmission bandwidth need to be controlled. Finally, a low-power implementation of an EGM-based face authentication algorithm was discussed, the processing time for a complete face authentication amounting to less than one second.

It is further noticed that the bit-rate control method described in [30, 31] could also be applied to remote face verification/recognition over compressed images, so as to match the quality of the reconstructed images with the requirements of the remote verification/recognition system.

Acknowledgements

The paper is based on research activities partly financed by CSEM S.A. under Grant OIH3, and by the Swiss Federal Office for Education and Science under Grants C97.0050 (COST 254 project) and C00.0105 (COST 276 project). The cooperation in the field of face authentication with the Technical University “Gh. Asachi” (Prof. L. Goras) in Iasi, Romania, is gratefully acknowledged. This cooperation is carried out in the framework of the programme SCOPES (Scientific co-operation between Eastern Europe and Switzerland), supported by the Swiss National Science Foundation and financed by the Swiss Federal Department of Foreign Affairs under Grant SCOPES 7RUPJ062381.

References

[1] Web site on still image communication over mobile networks: http://www.mobilestillimages.com/, March 2002.
[2] Web site on Mobile Multimedia Messaging Service (MMS): http://www.mobilemms.com/, March 2002.
[3] GSM World’s Web site: http://www.gsmworld.com, March 2002.
[4] UMTS Forum Report No. 11, Enabling UMTS/Third Generation Services and Applications, London, UK, Oct. 2000, available at [5].
[5] UMTS Forum’s Web page: http://www.umts-forum.org/, March 2002.
[6] 3GPP: 3rd Generation Partnership Project’s Web site: http://www.3gpp.org, March 2002.
[7] E.R. Fossum, “CMOS Image Sensors: Electronic Camera-On-A-Chip”, IEEE Trans. on Electron Devices, vol. 44, pp. 1689-1698, 1997.
[8] S. Tanner, M. Ansorge, F. Pellandini, N. Blanc, “Low-power micro-cameras for mobile multimedia devices”, in Towards Intelligent Integrated Media Communication Techniques, J. Tasic (Ed.), 42 pages, Kluwer, 2002 (in print).
[9] J. Huppertz, R. Hauschild, B.J. Hosticka, T. Kneip, S. Müller, M. Schwarz, “Fast CMOS Imaging with High Dynamic Range”, Proc. IEEE Workshop on Charge-Coupled Devices and Advanced Image Sensors, pp. R7-1 to R7-4, Bruges, Belgium, June 5-7, 1997.
[10] M. Waeny, “High Dynamic CMOS Image Sensors”, G.I.T. Imaging & Microscopy 03/2001, pp. 26-28, G.I.T. Verlag, 64220 Darmstadt, Germany, 2001.
[11] S. Decker, R. McGrath, K. Brehmer, C. Sodini, “A 256 x 256 CMOS imaging array with wide dynamic range pixels and column-parallel digital output”, IEEE ISSCC Digest of Technical Papers, pp. 176-177, San Francisco, CA, USA, Feb. 1998.
[12] B. Wandell, P. Catrysse, J. DiCarlo, D. Yang, A. El Gamal, “Multiple Capture Single Image Architecture with a CMOS Sensor”, Proc. Int’l. Symp. on Multispectral Imaging and Color Reproduction for Digital Archives, pp. 11-17, Chiba, Japan, Oct. 21-22, 1999.
[13] O. Yadid-Pecht, E.R. Fossum, “Wide Intra-Scene Dynamic Range CMOS APS Using Dual Sampling”, IEEE Trans. on Electron Devices, vol. 44, no. 10, pp. 1721-1723, Oct. 1997.
[14] D.X. Yang, A. El Gamal, B. Fowler, H. Tian, “A 640 x 512 CMOS Image Sensor with Ultrawide Dynamic Range Floating-Point Pixel-Level ADC”, IEEE J. of Solid-State Circuits, vol. 34, no. 12, pp. 1821-1834, Dec. 1999.
[15] M. Schwarz, A. Bussmann, T. Heinmann, B.J. Hosticka, J. Huppertz, O. Schrey, “High Dynamic Range CMOS Image Sensors Featuring Multiple Integration and Auto-Calibration”, Proc. European Conf. on Circuit Theory and Design, ECCTD’01, vol. 2, pp. II 49-52, Espoo, Finland, Aug. 2001.
[16] H. Tian, X. Liu, S. Lim, S. Kleinfelder, A. El Gamal, “Active Pixel Sensors Fabricated in a Standard 0.18 µm CMOS Technology”, Proc. SPIE, vol. 4306, pp. 441-449, 2001.
[17] K.-B. Cho, A. Krymski, E.R. Fossum, “A 1.2V Micropower CMOS Active Pixel Image Sensor for Portable Applications”, IEEE ISSCC Digest of Technical Papers, pp. 114-115, San Francisco, CA, USA, Feb. 7-10, 2000.
[18] B. Pain, G. Yang, C. Ho, “A Single-Chip Programmable Digital CMOS Imager with Enhanced Low-Light Detection Capability”, Proc. 13th Int’l. Conf. on VLSI Design, Calcutta, India, Jan. 4-7, 2000.
[19] Patent pending, 2002.
[20] M. Willemin, N. Blanc, G.K. Lang, S. Lauxtermann, P. Schwider, P. Seitz, M. Wäny, “Optical characterization methods for solid-state image sensors”, Optics and Lasers in Engineering, vol. 36, pp. 185-194, 2001.
[21] P. Seitz, “Solid state image sensing”, in Handbook of Computer Vision and Applications, B. Jähne, H. Haussecker, P. Geissler (Eds.), pp. 165-222, Academic Press, 2000.
[22] R. Kimmel, “Demosaicing: Image Reconstruction from CCD Samples”, IEEE Trans. on Image Processing, vol. 8, no. 9, pp. 1221-1228, Sept. 1999.
[23] T. Sakamoto, C. Nakanishi, T. Hase, “Software Pixel Interpolation for Digital Still Cameras Suitable for a 32-Bit MCU”, IEEE Trans. on Consumer Electronics, vol. 44, no. 4, pp. 1342-1352, Nov. 1998.
[24] P. Longère, X. Zhang, P.B. Delahunt, D.H. Brainard, “Perceptual Assessment of Demosaicing Algorithm Performance”, Proceedings of the IEEE, vol. 90, no. 1, pp. 123-132, Jan. 2002.
[25] W.B. Pennebaker, J.L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, USA, 1993.
[26] Web site of the JPEG Committee: http://www.jpeg.org/, March 2002.
[27] C. Christopoulos, A. Skodras, T. Ebrahimi, “The JPEG 2000 Still Image Coding System: An Overview”, IEEE Trans. on Consumer Electronics, vol. 46, no. 4, pp. 1103-1127, Nov. 2000.
[28] J. Bracamonte, P. Stadelmann, M. Ansorge, F. Pellandini, “A Multiplierless Implementation Scheme for the JPEG Image Coding Algorithm”, Proc. IEEE Nordic Signal Processing Symposium, NORSIG 2000, pp. 17-20, Kolmarden, Sweden, June 13-15, 2000.
[29] D. Santa-Cruz, T. Ebrahimi, “A Study of JPEG 2000 Still Image Coding Versus Other Standards”, Proc. 10th European Signal Processing Conf., EUSIPCO 2000, pp. 673-676, Tampere, Finland, Sept. 2000.
[30] J. Bracamonte, M. Ansorge, F. Pellandini, Bit-Rate Control Process for Digital Images (in French), PCT/CH98/00413, Patent pending.
[31] J. Bracamonte, M. Ansorge, F. Pellandini, “Bit-Rate Control for the JPEG Algorithm”, in Towards Intelligent Integrated Media Communication Techniques, J. Tasic (Ed.), 42 pages, Kluwer, 2002 (in print).
[32] ITU-T Recommendation T.81, Digital Compression and Coding of Continuous-Tone Still Images, Sept. 1992.
[33] J. Bracamonte, P. Stadelmann, M. Ansorge, F. Pellandini, “A Fully Multiplierless Implementation Scheme for the JPEG Image Coding Algorithm”, to be submitted to IEEE Trans. on Circuits and Systems for Video Technology, 2002.
[34] J.-L. Nagel, P. Stadelmann, M. Ansorge, F. Pellandini, “A low-power VLSI architecture for face verification using elastic graph matching”, accepted for publication at the XIth European Signal Processing Conf., EUSIPCO 2002, Toulouse, France, Sept. 3-6, 2002.
[35] P. Stadelmann, J.-L. Nagel, M. Ansorge, F. Pellandini, “A multiscale morphological coprocessor for low-power face authentication”, accepted for publication at the XIth European Signal Processing Conf., EUSIPCO 2002, Toulouse, France, Sept. 3-6, 2002.
[36] M. Lades, J.C. Vorbrüggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Würtz, W. Konen, “Distortion Invariant Object Recognition in the Dynamic Link Architecture”, IEEE Trans. on Computers, vol. 42, no. 3, pp. 300-311, March 1993.
[37] C. Kotropoulos, A. Tefas, I. Pitas, “Frontal Face Authentication Using Discriminating Grids with Morphological Feature Vectors”, IEEE Trans. on Multimedia, vol. 2, no. 1, pp. 14-26, March 2000.
[38] D.L. Swets, J. Weng, “Using Discriminant Eigenfeatures for Image Retrieval”, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, Aug. 1996.
